Strange Long ParNew GC Pauses (Sample Code Included)

James Lent jlent at tivo.com
Thu Apr 30 15:01:08 UTC 2015


Here is a more frightening example:

Bad Case - 500M Static Object:

java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx6G -Xms6G LongParNewPause $((500*1024*1024)) 100 100
0.309: [GC0.309: [ParNew: 272640K->3028K(306688K), 0.0287780 secs] 784640K->515028K(6257408K), 0.0288700 secs] [Times: user=0.08 sys=0.01, real=0.03 secs]
0.372: [GC0.372: [ParNew: 275668K->7062K(306688K), 0.0228070 secs] 787668K->519062K(6257408K), 0.0228580 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
0.430: [GC0.430: [ParNew: 279702K->11314K(306688K), 0.0327930 secs] 791702K->523314K(6257408K), 0.0328510 secs] [Times: user=0.08 sys=0.01, real=0.03 secs]
0.497: [GC0.497: [ParNew: 283954K->15383K(306688K), 0.0336020 secs] 795954K->527383K(6257408K), 0.0336550 secs] [Times: user=0.08 sys=0.00, real=0.03 secs]
0.565: [GC0.565: [ParNew: 288023K->21006K(306688K), 0.0282110 secs] 800023K->533006K(6257408K), 0.0282740 secs] [Times: user=0.08 sys=0.00, real=0.03 secs]
0.627: [GC0.627: [ParNew: 293646K->26805K(306688K), 0.0265270 secs] 805646K->538805K(6257408K), 0.0266220 secs] [Times: user=0.07 sys=0.01, real=0.03 secs]
0.688: [GC0.688: [ParNew: 299445K->20215K(306688K), 1.3657150 secs] 811445K->535105K(6257408K), 1.3657830 secs] [Times: user=3.97 sys=0.01, real=1.36 secs]
2.087: [GC2.087: [ParNew: 292855K->17914K(306688K), 6.6188870 secs] 807745K->535501K(6257408K), 6.6189490 secs] [Times: user=19.71 sys=0.03, real=6.61 secs]
8.741: [GC8.741: [ParNew: 290554K->17433K(306688K), 14.2495190 secs] 808141K->537744K(6257408K), 14.2495830 secs] [Times: user=42.34 sys=0.10, real=14.25 secs]
23.025: [GC23.025: [ParNew: 290073K->17315K(306688K), 21.1579920 secs] 810384K->540348K(6257408K), 21.1580510 secs] [Times: user=70.10 sys=0.08, real=21.16 secs]
44.216: [GC44.216: [ParNew: 289955K->17758K(306688K), 27.6932380 secs] 812988K->543511K(6257408K), 27.6933060 secs] [Times: user=103.91 sys=0.16, real=27.69 secs]
71.941: [GC71.941: [ParNew: 290398K->17745K(306688K), 35.1077720 secs] 816151K->546225K(6257408K), 35.1078600 secs] [Times: user=130.86 sys=0.10, real=35.11 secs]
107.081: [GC107.081: [ParNew: 290385K->21826K(306688K), 41.4425020 secs] 818865K->553022K(6257408K), 41.4425720 secs] [Times: user=158.25 sys=0.31, real=41.44 secs]
148.555: [GC148.555: [ParNew: 294466K->21834K(306688K), 45.9826660 secs] 825662K->555757K(6257408K), 45.9827260 secs] [Times: user=180.91 sys=0.14, real=45.98 secs]
194.570: [GC194.570: [ParNew: 294474K->21836K(306688K), 51.5779770 secs] 828397K->558485K(6257408K), 51.5780450 secs] [Times: user=204.05 sys=0.20, real=51.58 secs]
246.180: [GC246.180: [ParNew^C: 294476K->18454K(306688K), 58.9307800 secs] 831125K->557829K(6257408K), 58.9308660 secs] [Times: user=232.31 sys=0.23, real=58.93 secs]
Heap
 par new generation   total 306688K, used 40308K [0x000000067ae00000, 0x000000068fac0000, 0x000000068fac0000)
  eden space 272640K,   8% used [0x000000067ae00000, 0x000000067c357980, 0x000000068b840000)
  from space 34048K,  54% used [0x000000068b840000, 0x000000068ca458f8, 0x000000068d980000)
  to   space 34048K,   0% used [0x000000068d980000, 0x000000068d980000, 0x000000068fac0000)
 concurrent mark-sweep generation total 5950720K, used 539375K [0x000000068fac0000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 2435K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)

Good Case - 479M Static Object:

java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx6G -Xms6G LongParNewPause $((479*1024*1024)) 100 100
0.298: [GC0.298: [ParNew: 272640K->3036K(306688K), 0.0152390 secs] 763136K->493532K(6257408K), 0.0153450 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
0.346: [GC0.346: [ParNew: 275676K->7769K(306688K), 0.0193840 secs] 766172K->498265K(6257408K), 0.0194570 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
0.398: [GC0.398: [ParNew: 280409K->11314K(306688K), 0.0203460 secs] 770905K->501810K(6257408K), 0.0204080 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
0.450: [GC0.450: [ParNew: 283954K->17306K(306688K), 0.0222390 secs] 774450K->507802K(6257408K), 0.0223070 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
0.504: [GC0.504: [ParNew: 289946K->18380K(306688K), 0.0169000 secs] 780442K->508876K(6257408K), 0.0169630 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
0.552: [GC0.552: [ParNew: 291020K->26805K(306688K), 0.0203990 secs] 781516K->517301K(6257408K), 0.0204620 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
0.604: [GC0.604: [ParNew: 299445K->21153K(306688K), 0.0230980 secs] 789941K->514539K(6257408K), 0.0231610 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
0.659: [GC0.659: [ParNew: 293793K->29415K(306688K), 0.0170240 secs] 787179K->525498K(6257408K), 0.0170970 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
0.708: [GC0.708: [ParNew: 302055K->23874K(306688K), 0.0202970 secs] 798138K->522681K(6257408K), 0.0203600 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
0.759: [GC0.760: [ParNew: 296514K->26842K(306688K), 0.0238600 secs] 795321K->528371K(6257408K), 0.0239390 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
0.815: [GC0.815: [ParNew: 299482K->25343K(306688K), 0.0237580 secs] 801011K->529592K(6257408K), 0.0238030 secs] [Times: user=0.06 sys=0.01, real=0.02 secs]
0.870: [GC0.870: [ParNew: 297983K->25767K(306688K), 0.0195800 secs] 802232K->532743K(6257408K), 0.0196290 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
0.921: [GC0.921: [ParNew: 298407K->21795K(306688K), 0.0196310 secs] 805383K->531488K(6257408K), 0.0196960 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
0.972: [GC0.972: [ParNew: 294435K->25910K(306688K), 0.0242780 secs] 804128K->538329K(6257408K), 0.0243440 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
1.028: [GC1.028: [ParNew: 298550K->21834K(306688K), 0.0235000 secs] 810969K->536979K(6257408K), 0.0235600 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
1.083: [GC1.083: [ParNew: 294474K->26625K(306688K), 0.0188330 secs] 809619K->544497K(6257408K), 0.0188950 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
1.133: [GC1.133: [ParNew: 299265K->26602K(306688K), 0.0210780 secs] 817137K->547186K(6257408K), 0.0211380 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
1.185: [GC1.185: [ParNew: 299242K->26612K(306688K), 0.0236720 secs] 819826K->549922K(6257408K), 0.0237230 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
1.240: [GC1.241: [ParNew: 299252K->26615K(306688K), 0.0188560 secs] 822562K->552651K(6257408K), 0.0189150 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
1.291: [GC1.291: [ParNew: 299255K->26615K(306688K), 0.0195090 secs] 825291K->555378K(6257408K), 0.0195870 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
1.342: [GC1.342: [ParNew: 299255K->22531K(306688K), 0.0229010 secs] 828018K->554021K(6257408K), 0.0229610 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
1.396: [GC1.396: [ParNew: 295171K->24505K(306688K), 0.0265920 secs] 826661K->560810K(6257408K), 0.0266360 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
1.453: [GC1.453: [ParNew: 297145K->24529K(306688K), 0.0296490 secs] 833450K->563560K(6257408K), 0.0297070 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
1.514: [GC1.514: [ParNew: 297169K->27700K(306688K), 0.0259820 secs] 836200K->569458K(6257408K), 0.0260310 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
1.571: [GC1.572: [ParNew: 300340K->27666K(306688K), 0.0199210 secs] 842098K->572150K(6257408K), 0.0199650 secs] [Times: user=0.07 sys=0.01, real=0.02 secs]
1.623: [GC1.623: [ParNew: 300306K->27658K(306688K), 0.0237020 secs] 844790K->574868K(6257408K), 0.0237630 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
1.678: [GC1.678: [ParNew: 300298K->31737K(306688K), 0.0237820 secs] 847508K->581674K(6257408K), 0.0238530 secs] [Times: user=0.08 sys=0.00, real=0.03 secs]
1.733: [GC1.733: [ParNew: 304377K->21022K(306688K), 0.0265400 secs] 854314K->573685K(6257408K), 0.0265980 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
1.791: [GC1.791: [ParNew: 293662K->25359K(306688K), 0.0249520 secs] 846325K->580748K(6257408K), 0.0250050 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
1.847: [GC1.847: [ParNew: 297999K->19930K(306688K), 0.0195120 secs] 853388K->581179K(6257408K), 0.0195650 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
1.898: [GC1.898: [ParNew: 292570K->20318K(306688K), 0.0233960 secs] 853819K->584294K(6257408K), 0.0234650 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
1.953: [GC1.953: [ParNew: 292958K->20415K(306688K), 0.0233530 secs] 856934K->587117K(6257408K), 0.0234130 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
2.007: [GC2.007: [ParNew: 293055K->20439K(306688K), 0.0301410 secs] 859757K->589868K(6257408K), 0.0302070 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
2.068: [GC2.068: [ParNew: 293079K->20445K(306688K), 0.0289190 secs] 862508K->592600K(6257408K), 0.0289690 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
^C2.129: [GC2.129: [ParNew: 293085K->29284K(306688K), 0.0218880 secs] 865240K->604166K(6257408K), 0.0219350 secs] [Times: user=0.09 sys=0.00, real=0.02 secs]
Heap
 par new generation   total 306688K, used 40135K [0x000000067ae00000, 0x000000068fac0000, 0x000000068fac0000)
  eden space 272640K,   3% used [0x000000067ae00000, 0x000000067b898a78, 0x000000068b840000)
  from space 34048K,  86% used [0x000000068d980000, 0x000000068f619320, 0x000000068fac0000)
  to   space 34048K,   0% used [0x000000068b840000, 0x000000068b840000, 0x000000068d980000)
 concurrent mark-sweep generation total 5950720K, used 574881K [0x000000068fac0000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 2435K [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)


Recently we started seeing long ParNewGC pauses when starting up Kafka
that were causing session timeouts:

[2015-04-24 13:26:23,244] INFO 0 successfully elected as leader
(kafka.server.ZookeeperLeaderElector)
2.111: [GC (Allocation Failure) 2.111: [ParNew:
136320K->10236K(153344K), 0.0235777 secs] 648320K->522236K(2080128K),
0.0237092 secs] [Times: user=0.03 sys=0.01, real=0.02 secs]
2.599: [GC (Allocation Failure) 2.599: [ParNew: 146556K->3201K(153344K),
9.1514626 secs] 658556K->519191K(2080128K), 9.1515757 secs] [Times:
user=18.25 sys=0.01, real=9.15 secs]
[2015-04-24 13:26:33,443] INFO New leader is 0
(kafka.server.ZookeeperLeaderElector$LeaderChangeListener)

After much investigation I found that the trigger was the allocation of
a 500M static object early in the Kafka startup code.  It of course
makes no sense (to me) that the allocation of a single large static
object in Old memory would impact ParNew collections, but, testing here
indicates that it does.  I have created an Oracle bug report, but, it is
still under investigation.

I have reproduced the problem with a simple application on several Linux
platforms including an EC2 instance (and last night a Win 7 laptop) and
the following JREs:

OpenJDK: 6, 7, and 8
Oracle: 7 and 8

Oracle 6 does not seem to have an issue.  All its ParNewGC times are small.

Here is the simple program that demonstrates the issue (at least
everywhere I have tried it):

import java.util.ArrayList;

public class LongParNewPause {

    static byte[] bigStaticObject;

    public static void main(String[] args) throws Exception {
        int bigObjSize   = args.length > 0 ? Integer.parseInt(args[0])
: 500 * 1024 * 1024;
        int littleObjSize = args.length > 1 ? Integer.parseInt(args[1])
: 100;
        int saveFraction  = args.length > 2 ? Integer.parseInt(args[2])
: 10;

        bigStaticObject = new byte[bigObjSize];

        ArrayList<byte[]> holder = new ArrayList<byte[]>();

        int i = 0;
        while (true) {
            byte[] local = new byte[littleObjSize];
            if (i++ % saveFraction == 0) {
                holder.add(local);
            }
        }
    }
}

I run it with the following options:

    -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx2G -Xms2G

Note that I have not seen the issue with 1G heaps.  4G heaps also
exhibit the issue (as do heaps as small as 1.2G).  Bigger heaps seem to
aggrevate the problem.

Here is some sample output:

0.321: [GC (Allocation Failure) 0.321: [ParNew:
272640K->27329K(306688K), 0.0140537 secs] 784640K->539329K(2063104K),
0.0141584 secs] [Times: user=0.05 sys=0.02, real=0.02 secs]
0.368: [GC (Allocation Failure) 0.368: [ParNew:
299969K->34048K(306688K), 0.7655383 secs] 811969K->572321K(2063104K),
0.7656172 secs] [Times: user=2.89 sys=0.02, real=0.77 secs]
1.165: [GC (Allocation Failure) 1.165: [ParNew:
306688K->34048K(306688K), 13.8395969 secs] 844961K->599389K(2063104K),
13.8396650 secs] [Times: user=54.38 sys=0.05, real=13.84 secs]
15.036: [GC (Allocation Failure) 15.036: [ParNew:
306688K->34048K(306688K), 0.0287254 secs] 872029K->628028K(2063104K),
0.0287876 secs] [Times: user=0.08 sys=0.01, real=0.03 secs]
15.096: [GC (Allocation Failure) 15.096: [ParNew:
306688K->34048K(306688K), 0.0340727 secs] 900668K->657717K(2063104K),
0.0341386 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
...

Even stranger is the fact that the problem seems to be limited to
objects in the range of about 480M to 512M.  Specifically:

 [ 503316465 , 536870384 ]

Values outside this range appear to be OK.  Playing with the other input
parameters (size of the small objects and the fraction that are
retained) can modify the system behavior:

- Increase the length of the pauses - I have seen pauses of over a
minute my testing.
- Cause the pause times to steadily increase over many ParNewGC cycles.

During the pauses the CPUs seem to peg 100% (no waiting on resources).

Anyone have any thoughts?  I would be very interested to know if anyone
else can reproduce the issue?  Other engineers here have been able to.

________________________________

This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.

James Lent
Senior Software Engineer

Digitalsmiths
A TiVo Company

www.digitalsmiths.com<http://www.digitalsmiths.com/>
jlent at digitalsmiths.com<mailto:jlent at digitalsmiths.com>  | office 919.460.4747

________________________________

This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20150430/fedab00a/attachment.html>


More information about the hotspot-gc-dev mailing list