Strange Long ParNew GC Pauses (Sample Code Included)

James Lent jlent at tivo.com
Fri May 8 13:40:50 UTC 2015


The explanation of the problem added by Jesper Wilhelmsson on 2015-05-06 08:59 to JDK-8079274 is interesting, but, not very satisfying.

1) The problem affects more than this one specific breakpoint.  It impacts objects in at least the range of about 480M to 512M.  The comment "A larger object probably triggers some higher level code that are able to skip scanning of this part of the memory" does not inspire confidence. It makes even more concerned that we do not understand this issue and that there could be other triggers.  It seems like "the higher level code" has a 32M blind spot.  Further I can create scenarios where the GC code takes progressively longer to clear the issue.  Presumably it completes the lower block scan during each GC cycle.  Why does it take longer and longer each time (my guess is it has something to do with the number of objects moved to survivor space).  If I cache 1 in 100 instead of 1 in 10 then there are 11 progressively slower ParNewGC (after several fast one) and the issue takes minutes to clear:

$ /usr/lib/jvm/java-7-oracle/bin/java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G LongParNewPause 536870384 100 100
0.308: [GC0.308: [ParNew: 272640K->3029K(306688K), 0.0113180 secs] 796927K->527316K(4160256K), 0.0114040 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
0.367: [GC0.367: [ParNew: 275669K->7766K(306688K), 0.0140840 secs] 799956K->532053K(4160256K), 0.0141630 secs] [Times: user=0.04 sys=0.01, real=0.02 secs]
0.430: [GC0.430: [ParNew: 280406K->10057K(306688K), 0.0120810 secs] 804693K->534344K(4160256K), 0.0121690 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
0.490: [GC0.490: [ParNew: 282697K->15383K(306688K), 0.0128540 secs] 806984K->539670K(4160256K), 0.0129390 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
0.551: [GC0.551: [ParNew: 288023K->21006K(306688K), 0.0146480 secs] 812310K->545294K(4160256K), 0.0147650 secs] [Times: user=0.05 sys=0.01, real=0.01 secs]
0.613: [GC0.613: [ParNew: 293646K->26805K(306688K), 0.0141250 secs] 817934K->551092K(4160256K), 0.0142050 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
0.675: [GC0.675: [ParNew: 299445K->24258K(306688K), 0.4816840 secs] 823732K->551436K(4160256K), 0.4817620 secs] [Times: user=1.90 sys=0.00, real=0.48 secs]
1.205: [GC1.205: [ParNew: 296898K->17914K(306688K), 3.4216550 secs] 824076K->547788K(4160256K), 3.4217180 secs] [Times: user=13.50 sys=0.00, real=3.42 secs]
4.674: [GC4.674: [ParNew: 290554K->17433K(306688K), 6.3406820 secs] 820428K->550031K(4160256K), 6.3407410 secs] [Times: user=25.00 sys=0.02, real=6.34 secs]
11.062: [GC11.062: [ParNew: 290073K->17315K(306688K), 10.9493130 secs] 822671K->552636K(4160256K), 10.9493890 secs] [Times: user=42.98 sys=0.03, real=10.95 secs]
22.059: [GC22.059: [ParNew: 289955K->17758K(306688K), 16.0539650 secs] 825276K->555798K(4160256K), 16.0540250 secs] [Times: user=62.44 sys=0.09, real=16.05 secs]
38.161: [GC38.161: [ParNew: 290398K->17745K(306688K), 20.6900820 secs] 828438K->558512K(4160256K), 20.6901420 secs] [Times: user=80.79 sys=0.08, real=20.69 secs]
58.898: [GC58.898: [ParNew: 290385K->21826K(306688K), 24.6781700 secs] 831152K->565310K(4160256K), 24.6782710 secs] [Times: user=95.58 sys=0.05, real=24.68 secs]
83.625: [GC83.625: [ParNew: 294466K->21834K(306688K), 30.3501800 secs] 837950K->568044K(4160256K), 30.3502610 secs] [Times: user=115.80 sys=0.11, real=30.35 secs]
114.022: [GC114.022: [ParNew: 294474K->21836K(306688K), 33.6648800 secs] 840684K->570772K(4160256K), 33.6649390 secs] [Times: user=127.38 sys=0.14, real=33.67 secs]
147.736: [GC147.736: [ParNew: 294476K->18454K(306688K), 38.9828430 secs] 843412K->570117K(4160256K), 38.9829040 secs] [Times: user=149.14 sys=0.15, real=38.99 secs]
186.766: [GC186.766: [ParNew: 291094K->26602K(306688K), 39.8726730 secs] 842757K->580977K(4160256K), 39.8727330 secs] [Times: user=156.35 sys=0.11, real=39.88 secs]
226.686: [GC226.686: [ParNew: 299242K->26612K(306688K), 0.0174470 secs] 853617K->583714K(4160256K), 0.0175300 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]

2) The suggestion to use only 1 thread is not acceptable to me.  Use parallel GC with one thread?  Further it did not work for me (in fact it made things worse).  "user" time does now equal "real" time (and the new size is 1/4 as big) so I think the parameter took effect:

$ /usr/lib/jvm/java-8-oracle/bin/java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G -XX:ParallelGCThreads=1 LongParNewPause 536870384 100 10
0.226: [GC (Allocation Failure) 0.226: [ParNew: 68160K->6980K(76672K), 0.0489756 secs] 592447K->531267K(4185792K), 0.0490718 secs] [Times: user=0.05 sys=0.00, real=0.05 secs]
0.291: [GC (Allocation Failure) 0.291: [ParNew: 75140K->8512K(76672K), 29.8707281 secs] 599427K->539479K(4185792K), 29.8707954 secs] [Times: user=29.80 sys=0.00, real=29.88 secs]
30.180: [GC (Allocation Failure) 30.180: [ParNew: 76672K->8093K(76672K), 30.6357699 secs] 607639K->545772K(4185792K), 30.6358257 secs] [Times: user=30.57 sys=0.00, real=30.63 secs]
60.837: [GC (Allocation Failure) 60.837: [ParNew: 76253K->6830K(76672K), 69.2115255 secs] 613932K->552169K(4185792K), 69.2115783 secs] [Times: user=69.05 sys=0.01, real=69.22 secs]
130.068: [GC (Allocation Failure) 130.068: [ParNew: 74990K->8226K(76672K), 132.5118903 secs] 620329K->560381K(4185792K), 132.5119459 secs] [Times: user=132.21 sys=0.00, real=132.52 secs]
262.603: [GC (Allocation Failure) 262.603: [ParNew: 76386K->8512K(76672K), 0.0400138 secs] 628541K->568748K(4185792K), 0.0400687 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]

3) Applications should not be expected to tune the GC settings to account for the size of one static object.  The connection between the size of this object and the GC performance was not easy to even find.  Further the parameter ParGCCardsPerStrideChunk is pretty low level and not one I think many application developers are familiar with.

Error: VM option 'ParGCCardsPerStrideChunk' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

4) I can not reproduce this issue in Oracle 6.  This appears to be a regression.  How can fixing a regression be called an enhancement request?







________________________________

This email and any attachments may contain confidential and privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments) by others is prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete this email and any attachments. No employee or agent of TiVo Inc. is authorized to conclude any binding agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo Inc. may only be made by a signed written agreement.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20150508/baf75c7d/attachment-0001.html>


More information about the hotspot-gc-dev mailing list