Strange Long ParNew GC Pauses (Sample Code Included)

Mikael Gerdin mikael.gerdin at oracle.com
Thu Apr 30 15:51:12 UTC 2015


Hi James,

I did a quick perf inspection and saw a lot of time spent scanning the 
card table, so on a hunch I ran your sample program with
-XX:+BlockOffsetArrayUseUnallocatedBlock
and that appears to make the extreme pause times go away.

See https://bugs.openjdk.java.net/browse/JDK-8079128
and https://bugs.openjdk.java.net/browse/JDK-7040068

I don't have time to dig further into this just at this moment, and 
running with that flag can cause random crashes and is not recommended.

/Mikael

On 2015-04-30 17:01, James Lent wrote:
> Here is a more frightening example:
>
> Bad Case - 500M Static Object:
>
> java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx6G -Xms6G LongParNewPause
> $((500*1024*1024)) 100 100
> 0.309: [GC0.309: [ParNew: 272640K->3028K(306688K), 0.0287780 secs]
> 784640K->515028K(6257408K), 0.0288700 secs] [Times: user=0.08 sys=0.01,
> real=0.03 secs]
> 0.372: [GC0.372: [ParNew: 275668K->7062K(306688K), 0.0228070 secs]
> 787668K->519062K(6257408K), 0.0228580 secs] [Times: user=0.07 sys=0.00,
> real=0.03 secs]
> 0.430: [GC0.430: [ParNew: 279702K->11314K(306688K), 0.0327930 secs]
> 791702K->523314K(6257408K), 0.0328510 secs] [Times: user=0.08 sys=0.01,
> real=0.03 secs]
> 0.497: [GC0.497: [ParNew: 283954K->15383K(306688K), 0.0336020 secs]
> 795954K->527383K(6257408K), 0.0336550 secs] [Times: user=0.08 sys=0.00,
> real=0.03 secs]
> 0.565: [GC0.565: [ParNew: 288023K->21006K(306688K), 0.0282110 secs]
> 800023K->533006K(6257408K), 0.0282740 secs] [Times: user=0.08 sys=0.00,
> real=0.03 secs]
> 0.627: [GC0.627: [ParNew: 293646K->26805K(306688K), 0.0265270 secs]
> 805646K->538805K(6257408K), 0.0266220 secs] [Times: user=0.07 sys=0.01,
> real=0.03 secs]
> 0.688: [GC0.688: [ParNew: 299445K->20215K(306688K), 1.3657150 secs]
> 811445K->535105K(6257408K), 1.3657830 secs] [Times: user=3.97 sys=0.01,
> real=1.36 secs]
> 2.087: [GC2.087: [ParNew: 292855K->17914K(306688K), 6.6188870 secs]
> 807745K->535501K(6257408K), 6.6189490 secs] [Times: user=19.71 sys=0.03,
> real=6.61 secs]
> 8.741: [GC8.741: [ParNew: 290554K->17433K(306688K), 14.2495190 secs]
> 808141K->537744K(6257408K), 14.2495830 secs] [Times: user=42.34
> sys=0.10, real=14.25 secs]
> 23.025: [GC23.025: [ParNew: 290073K->17315K(306688K), 21.1579920 secs]
> 810384K->540348K(6257408K), 21.1580510 secs] [Times: user=70.10
> sys=0.08, real=21.16 secs]
> 44.216: [GC44.216: [ParNew: 289955K->17758K(306688K), 27.6932380 secs]
> 812988K->543511K(6257408K), 27.6933060 secs] [Times: user=103.91
> sys=0.16, real=27.69 secs]
> 71.941: [GC71.941: [ParNew: 290398K->17745K(306688K), 35.1077720 secs]
> 816151K->546225K(6257408K), 35.1078600 secs] [Times: user=130.86
> sys=0.10, real=35.11 secs]
> 107.081: [GC107.081: [ParNew: 290385K->21826K(306688K), 41.4425020 secs]
> 818865K->553022K(6257408K), 41.4425720 secs] [Times: user=158.25
> sys=0.31, real=41.44 secs]
> 148.555: [GC148.555: [ParNew: 294466K->21834K(306688K), 45.9826660 secs]
> 825662K->555757K(6257408K), 45.9827260 secs] [Times: user=180.91
> sys=0.14, real=45.98 secs]
> 194.570: [GC194.570: [ParNew: 294474K->21836K(306688K), 51.5779770 secs]
> 828397K->558485K(6257408K), 51.5780450 secs] [Times: user=204.05
> sys=0.20, real=51.58 secs]
> 246.180: [GC246.180: [ParNew^C: 294476K->18454K(306688K), 58.9307800
> secs] 831125K->557829K(6257408K), 58.9308660 secs] [Times: user=232.31
> sys=0.23, real=58.93 secs]
> Heap
>   par new generation   total 306688K, used 40308K [0x000000067ae00000,
> 0x000000068fac0000, 0x000000068fac0000)
>    eden space 272640K,   8% used [0x000000067ae00000,
> 0x000000067c357980, 0x000000068b840000)
>    from space 34048K,  54% used [0x000000068b840000, 0x000000068ca458f8,
> 0x000000068d980000)
>    to   space 34048K,   0% used [0x000000068d980000, 0x000000068d980000,
> 0x000000068fac0000)
>   concurrent mark-sweep generation total 5950720K, used 539375K
> [0x000000068fac0000, 0x00000007fae00000, 0x00000007fae00000)
>   concurrent-mark-sweep perm gen total 21248K, used 2435K
> [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
>
> Good Case - 479M Static Object:
>
> java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx6G -Xms6G LongParNewPause
> $((479*1024*1024)) 100 100
> 0.298: [GC0.298: [ParNew: 272640K->3036K(306688K), 0.0152390 secs]
> 763136K->493532K(6257408K), 0.0153450 secs] [Times: user=0.06 sys=0.00,
> real=0.02 secs]
> 0.346: [GC0.346: [ParNew: 275676K->7769K(306688K), 0.0193840 secs]
> 766172K->498265K(6257408K), 0.0194570 secs] [Times: user=0.07 sys=0.00,
> real=0.02 secs]
> 0.398: [GC0.398: [ParNew: 280409K->11314K(306688K), 0.0203460 secs]
> 770905K->501810K(6257408K), 0.0204080 secs] [Times: user=0.06 sys=0.00,
> real=0.02 secs]
> 0.450: [GC0.450: [ParNew: 283954K->17306K(306688K), 0.0222390 secs]
> 774450K->507802K(6257408K), 0.0223070 secs] [Times: user=0.06 sys=0.00,
> real=0.02 secs]
> 0.504: [GC0.504: [ParNew: 289946K->18380K(306688K), 0.0169000 secs]
> 780442K->508876K(6257408K), 0.0169630 secs] [Times: user=0.07 sys=0.01,
> real=0.02 secs]
> 0.552: [GC0.552: [ParNew: 291020K->26805K(306688K), 0.0203990 secs]
> 781516K->517301K(6257408K), 0.0204620 secs] [Times: user=0.06 sys=0.00,
> real=0.02 secs]
> 0.604: [GC0.604: [ParNew: 299445K->21153K(306688K), 0.0230980 secs]
> 789941K->514539K(6257408K), 0.0231610 secs] [Times: user=0.06 sys=0.00,
> real=0.02 secs]
> 0.659: [GC0.659: [ParNew: 293793K->29415K(306688K), 0.0170240 secs]
> 787179K->525498K(6257408K), 0.0170970 secs] [Times: user=0.07 sys=0.01,
> real=0.02 secs]
> 0.708: [GC0.708: [ParNew: 302055K->23874K(306688K), 0.0202970 secs]
> 798138K->522681K(6257408K), 0.0203600 secs] [Times: user=0.06 sys=0.00,
> real=0.02 secs]
> 0.759: [GC0.760: [ParNew: 296514K->26842K(306688K), 0.0238600 secs]
> 795321K->528371K(6257408K), 0.0239390 secs] [Times: user=0.07 sys=0.00,
> real=0.03 secs]
> 0.815: [GC0.815: [ParNew: 299482K->25343K(306688K), 0.0237580 secs]
> 801011K->529592K(6257408K), 0.0238030 secs] [Times: user=0.06 sys=0.01,
> real=0.02 secs]
> 0.870: [GC0.870: [ParNew: 297983K->25767K(306688K), 0.0195800 secs]
> 802232K->532743K(6257408K), 0.0196290 secs] [Times: user=0.07 sys=0.00,
> real=0.02 secs]
> 0.921: [GC0.921: [ParNew: 298407K->21795K(306688K), 0.0196310 secs]
> 805383K->531488K(6257408K), 0.0196960 secs] [Times: user=0.06 sys=0.00,
> real=0.02 secs]
> 0.972: [GC0.972: [ParNew: 294435K->25910K(306688K), 0.0242780 secs]
> 804128K->538329K(6257408K), 0.0243440 secs] [Times: user=0.06 sys=0.00,
> real=0.02 secs]
> 1.028: [GC1.028: [ParNew: 298550K->21834K(306688K), 0.0235000 secs]
> 810969K->536979K(6257408K), 0.0235600 secs] [Times: user=0.06 sys=0.00,
> real=0.03 secs]
> 1.083: [GC1.083: [ParNew: 294474K->26625K(306688K), 0.0188330 secs]
> 809619K->544497K(6257408K), 0.0188950 secs] [Times: user=0.06 sys=0.00,
> real=0.02 secs]
> 1.133: [GC1.133: [ParNew: 299265K->26602K(306688K), 0.0210780 secs]
> 817137K->547186K(6257408K), 0.0211380 secs] [Times: user=0.06 sys=0.00,
> real=0.02 secs]
> 1.185: [GC1.185: [ParNew: 299242K->26612K(306688K), 0.0236720 secs]
> 819826K->549922K(6257408K), 0.0237230 secs] [Times: user=0.07 sys=0.00,
> real=0.03 secs]
> 1.240: [GC1.241: [ParNew: 299252K->26615K(306688K), 0.0188560 secs]
> 822562K->552651K(6257408K), 0.0189150 secs] [Times: user=0.07 sys=0.00,
> real=0.02 secs]
> 1.291: [GC1.291: [ParNew: 299255K->26615K(306688K), 0.0195090 secs]
> 825291K->555378K(6257408K), 0.0195870 secs] [Times: user=0.07 sys=0.00,
> real=0.02 secs]
> 1.342: [GC1.342: [ParNew: 299255K->22531K(306688K), 0.0229010 secs]
> 828018K->554021K(6257408K), 0.0229610 secs] [Times: user=0.07 sys=0.00,
> real=0.02 secs]
> 1.396: [GC1.396: [ParNew: 295171K->24505K(306688K), 0.0265920 secs]
> 826661K->560810K(6257408K), 0.0266360 secs] [Times: user=0.07 sys=0.00,
> real=0.03 secs]
> 1.453: [GC1.453: [ParNew: 297145K->24529K(306688K), 0.0296490 secs]
> 833450K->563560K(6257408K), 0.0297070 secs] [Times: user=0.09 sys=0.00,
> real=0.03 secs]
> 1.514: [GC1.514: [ParNew: 297169K->27700K(306688K), 0.0259820 secs]
> 836200K->569458K(6257408K), 0.0260310 secs] [Times: user=0.07 sys=0.00,
> real=0.02 secs]
> 1.571: [GC1.572: [ParNew: 300340K->27666K(306688K), 0.0199210 secs]
> 842098K->572150K(6257408K), 0.0199650 secs] [Times: user=0.07 sys=0.01,
> real=0.02 secs]
> 1.623: [GC1.623: [ParNew: 300306K->27658K(306688K), 0.0237020 secs]
> 844790K->574868K(6257408K), 0.0237630 secs] [Times: user=0.08 sys=0.00,
> real=0.02 secs]
> 1.678: [GC1.678: [ParNew: 300298K->31737K(306688K), 0.0237820 secs]
> 847508K->581674K(6257408K), 0.0238530 secs] [Times: user=0.08 sys=0.00,
> real=0.03 secs]
> 1.733: [GC1.733: [ParNew: 304377K->21022K(306688K), 0.0265400 secs]
> 854314K->573685K(6257408K), 0.0265980 secs] [Times: user=0.08 sys=0.00,
> real=0.02 secs]
> 1.791: [GC1.791: [ParNew: 293662K->25359K(306688K), 0.0249520 secs]
> 846325K->580748K(6257408K), 0.0250050 secs] [Times: user=0.07 sys=0.00,
> real=0.02 secs]
> 1.847: [GC1.847: [ParNew: 297999K->19930K(306688K), 0.0195120 secs]
> 853388K->581179K(6257408K), 0.0195650 secs] [Times: user=0.07 sys=0.00,
> real=0.02 secs]
> 1.898: [GC1.898: [ParNew: 292570K->20318K(306688K), 0.0233960 secs]
> 853819K->584294K(6257408K), 0.0234650 secs] [Times: user=0.07 sys=0.00,
> real=0.03 secs]
> 1.953: [GC1.953: [ParNew: 292958K->20415K(306688K), 0.0233530 secs]
> 856934K->587117K(6257408K), 0.0234130 secs] [Times: user=0.07 sys=0.00,
> real=0.02 secs]
> 2.007: [GC2.007: [ParNew: 293055K->20439K(306688K), 0.0301410 secs]
> 859757K->589868K(6257408K), 0.0302070 secs] [Times: user=0.09 sys=0.00,
> real=0.03 secs]
> 2.068: [GC2.068: [ParNew: 293079K->20445K(306688K), 0.0289190 secs]
> 862508K->592600K(6257408K), 0.0289690 secs] [Times: user=0.09 sys=0.00,
> real=0.03 secs]
> ^C2.129: [GC2.129: [ParNew: 293085K->29284K(306688K), 0.0218880 secs]
> 865240K->604166K(6257408K), 0.0219350 secs] [Times: user=0.09 sys=0.00,
> real=0.02 secs]
> Heap
>   par new generation   total 306688K, used 40135K [0x000000067ae00000,
> 0x000000068fac0000, 0x000000068fac0000)
>    eden space 272640K,   3% used [0x000000067ae00000,
> 0x000000067b898a78, 0x000000068b840000)
>    from space 34048K,  86% used [0x000000068d980000, 0x000000068f619320,
> 0x000000068fac0000)
>    to   space 34048K,   0% used [0x000000068b840000, 0x000000068b840000,
> 0x000000068d980000)
>   concurrent mark-sweep generation total 5950720K, used 574881K
> [0x000000068fac0000, 0x00000007fae00000, 0x00000007fae00000)
>   concurrent-mark-sweep perm gen total 21248K, used 2435K
> [0x00000007fae00000, 0x00000007fc2c0000, 0x0000000800000000)
>
>
>> Recently we started seeing long ParNewGC pauses when starting up Kafka
>> that were causing session timeouts:
>>
>> [2015-04-24 13:26:23,244] INFO 0 successfully elected as leader
>> (kafka.server.ZookeeperLeaderElector)
>> 2.111: [GC (Allocation Failure) 2.111: [ParNew:
>> 136320K->10236K(153344K), 0.0235777 secs] 648320K->522236K(2080128K),
>> 0.0237092 secs] [Times: user=0.03 sys=0.01, real=0.02 secs]
>> 2.599: [GC (Allocation Failure) 2.599: [ParNew: 146556K->3201K(153344K),
>> 9.1514626 secs] 658556K->519191K(2080128K), 9.1515757 secs] [Times:
>> user=18.25 sys=0.01, real=9.15 secs]
>> [2015-04-24 13:26:33,443] INFO New leader is 0
>> (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
>>
>> After much investigation I found that the trigger was the allocation of
>> a 500M static object early in the Kafka startup code.  It of course
>> makes no sense (to me) that the allocation of a single large static
>> object in Old memory would impact ParNew collections, but, testing here
>> indicates that it does.  I have created an Oracle bug report, but, it is
>> still under investigation.
>>
>> I have reproduced the problem with a simple application on several Linux
>> platforms including an EC2 instance (and last night a Win 7 laptop) and
>> the following JREs:
>>
>> OpenJDK: 6, 7, and 8
>> Oracle: 7 and 8
>>
>> Oracle 6 does not seem to have an issue.  All its ParNewGC times are
>> small.
>>
>> Here is the simple program that demonstrates the issue (at least
>> everywhere I have tried it):
>>
>> import java.util.ArrayList;
>>
>> public class LongParNewPause {
>>
>>     static byte[] bigStaticObject;
>>
>>     public static void main(String[] args) throws Exception {
>>         int bigObjSize   = args.length > 0 ? Integer.parseInt(args[0])
>> : 500 * 1024 * 1024;
>>         int littleObjSize = args.length > 1 ? Integer.parseInt(args[1])
>> : 100;
>>         int saveFraction  = args.length > 2 ? Integer.parseInt(args[2])
>> : 10;
>>
>>         bigStaticObject = new byte[bigObjSize];
>>
>>         ArrayList<byte[]> holder = new ArrayList<byte[]>();
>>
>>         int i = 0;
>>         while (true) {
>>             byte[] local = new byte[littleObjSize];
>>             if (i++ % saveFraction == 0) {
>>                 holder.add(local);
>>             }
>>         }
>>     }
>> }
>>
>> I run it with the following options:
>>
>>     -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx2G -Xms2G
>>
>> Note that I have not seen the issue with 1G heaps.  4G heaps also
>> exhibit the issue (as do heaps as small as 1.2G).  Bigger heaps seem to
>> aggrevate the problem.
>>
>> Here is some sample output:
>>
>> 0.321: [GC (Allocation Failure) 0.321: [ParNew:
>> 272640K->27329K(306688K), 0.0140537 secs] 784640K->539329K(2063104K),
>> 0.0141584 secs] [Times: user=0.05 sys=0.02, real=0.02 secs]
>> 0.368: [GC (Allocation Failure) 0.368: [ParNew:
>> 299969K->34048K(306688K), 0.7655383 secs] 811969K->572321K(2063104K),
>> 0.7656172 secs] [Times: user=2.89 sys=0.02, real=0.77 secs]
>> 1.165: [GC (Allocation Failure) 1.165: [ParNew:
>> 306688K->34048K(306688K), 13.8395969 secs] 844961K->599389K(2063104K),
>> 13.8396650 secs] [Times: user=54.38 sys=0.05, real=13.84 secs]
>> 15.036: [GC (Allocation Failure) 15.036: [ParNew:
>> 306688K->34048K(306688K), 0.0287254 secs] 872029K->628028K(2063104K),
>> 0.0287876 secs] [Times: user=0.08 sys=0.01, real=0.03 secs]
>> 15.096: [GC (Allocation Failure) 15.096: [ParNew:
>> 306688K->34048K(306688K), 0.0340727 secs] 900668K->657717K(2063104K),
>> 0.0341386 secs] [Times: user=0.09 sys=0.00, real=0.03 secs]
>> ...
>>
>> Even stranger is the fact that the problem seems to be limited to
>> objects in the range of about 480M to 512M.  Specifically:
>>
>>  [ 503316465 , 536870384 ]
>>
>> Values outside this range appear to be OK.  Playing with the other input
>> parameters (size of the small objects and the fraction that are
>> retained) can modify the system behavior:
>>
>> - Increase the length of the pauses - I have seen pauses of over a
>> minute my testing.
>> - Cause the pause times to steadily increase over many ParNewGC cycles.
>>
>> During the pauses the CPUs seem to peg 100% (no waiting on resources).
>>
>> Anyone have any thoughts?  I would be very interested to know if anyone
>> else can reproduce the issue?  Other engineers here have been able to.
>>
>> ________________________________
>>
>> This email and any attachments may contain confidential and privileged
>> material for the sole use of the intended recipient. Any review,
>> copying, or distribution of this email (or any attachments) by others
>> is prohibited. If you are not the intended recipient, please contact
>> the sender immediately and permanently delete this email and any
>> attachments. No employee or agent of TiVo Inc. is authorized to
>> conclude any binding agreement on behalf of TiVo Inc. by email.
>> Binding agreements with TiVo Inc. may only be made by a signed written
>> agreement.
>
> James Lent
> Senior Software Engineer
>
> *Digitalsmiths*
> A TiVo Company
>
> www.digitalsmiths.com <http://www.digitalsmiths.com/>
> jlent at digitalsmiths.com <mailto:jlent at digitalsmiths.com>  | office
> 919.460.4747
>
> ------------------------------------------------------------------------
>
> This email and any attachments may contain confidential and privileged
> material for the sole use of the intended recipient. Any review,
> copying, or distribution of this email (or any attachments) by others is
> prohibited. If you are not the intended recipient, please contact the
> sender immediately and permanently delete this email and any
> attachments. No employee or agent of TiVo Inc. is authorized to conclude
> any binding agreement on behalf of TiVo Inc. by email. Binding
> agreements with TiVo Inc. may only be made by a signed written agreement.


More information about the hotspot-gc-dev mailing list