Bug in G1 logging? incorrect stop times?

Thomas Viessmann thomas.viessmann at oracle.com
Mon Oct 6 08:50:33 UTC 2014


Hi,

thanks to all of you. This really helps a lot.

regards

Thomas




On 10/03/14 11:03, Mikael Gerdin wrote:
> On Thursday 02 October 2014 16.34.54 Bengt Rutisson wrote:
>> On 2014-10-02 15:07, charlie hunt wrote:
>>> Hi Thomas,
>>>
>>> Don't mean to jump in the middle of your discussion with Bengt ...
>>>
>>> Since this is M5, have you tried setting the FX Solaris CPU scheduler,
>>> (based on the hypothesis this may be thread scheduling related)?
>> Good point, Charlie. My guess is that there is some kind of scheduling
>> issue. 2200 threads are quite a few threads to start and stop...
>>
>> Thomas, have you tried with ParallelGC to see if you happen to get the
>> same issue there. If it is a scheduling issue for safepoints I don't
>> think it should be GC dependent.
> Part of that information should be available in the logs since Thomas was
> running with -XX:+PrintSafepointStatistics which provides a breakdown of the
> timings of different phases of a safepoint operation.
> Unfortunately, the default behavior for the flag is to summarize a number of
> safepoint ops regularly, so you may need to look through a large part of the
> log to find the safepoint corresponding to the problematic GC cycle.
>
> The output should look something like:
> $ java -XX:+PrintSafepointStatistics
>           vmop                    [threads: total initially_running
> wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
> 0.117: ParallelGCFailedAllocation       [       8          0              0
> ]      [     0     0     0     0     1    ]  0
> 0.144: no vm operation                  [       6          0              0
> ]      [     0     0     0     0     0    ]  0
>
> /Mikael
>
>> Bengt
>>
>>> hths,
>>>
>>> Charlie
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>> On Oct 2, 2014, at 5:35 AM, Thomas Viessmann
>>>
>>> <thomas.viessmann at oracle.com <mailto:thomas.viessmann at oracle.com>> wrote:
>>>> Hi Bengt,
>>>>
>>>> Better late than never :-) .
>>>> There are 2200 threads, about 10% are runnable.
>>>> The application is the only one running on this M5.
>>>>
>>>> We observe this effect quite often with G1. Relatively short
>>>> GC times and long application thread stop times.
>>>> Interestingly I've never seen this with CMS. But
>>>> CMS is not an option here due to fragmentation.
>>>>
>>>> Thanks and Regards
>>>>
>>>> Thomas
>>>>
>>>> On 10/02/14 11:15, Bengt Rutisson wrote:
>>>>> Hi Thomas,
>>>>>
>>>>> A very late reply. I happened to find this in my inbox and have some
>>>>> comments.
>>>>>
>>>>> On 2014-08-29 14:15, Thomas Viessmann wrote:
>>>>>> Hi all,
>>>>>>
>>>>>>
>>>>>> I suspect a bug in G1 logging. Before I file a (possibly incorrect)
>>>>>> bug I would like to ask here.
>>>>>> Is my understanding correct that the stw evacuation phase starts at
>>>>>> *48448.931* and ends at *48454**.034* ?
>>>>>> This would mean a stop time of 5.103 s, which is clearly a mismatch
>>>>>> to the reported *9.3426141* s
>>>>>> If the stw phase in fact started at a later time e.g. at *48449.288
>>>>>> *then the stop pause would have been even shorter.
>>>>>>
>>>>>> In addition I do not understand how all the relatively short
>>>>>> evacuation phases sum up to several seconds.
>>>>>> If I add all the MAX values *69.1 + 22.2 + 45+ 150.8 + 0.1 + 155.1
>>>>>> + 0.9 + 0.5**= 443.7**ms*
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 48443.380: Total time for which application threads were stopped:
>>>>>> 0.0084701 seconds
>>>>>> *48448.931*: [GC pause (mixed) 48448.931: [G1Ergonomics (CSet
>>>>>> Construction) start choosing CSet, _pending_cards: 89458, predicted
>>>>>> base time: 73.58 ms, remaining time: 176.42 ms, target pause time:
>>>>>> 250.00 ms]
>>>>>>
>>>>>>   48448.931: [G1Ergonomics (CSet Construction) add young regions to
>>>>>>
>>>>>> CSet, eden: 33 regions, survivors: 5 regions, predicted young
>>>>>> region time: 35.73 ms]
>>>>>>
>>>>>>   48448.933: [G1Ergonomics (CSet Construction) finish adding old
>>>>>>
>>>>>> regions to CSet, reason: reclaimable percentage not over threshold,
>>>>>> old: 19 regions, max: 77 regions, reclaimable: 2570649712 bytes
>>>>>> (9.98 %), threshold: 10.00 %]
>>>>>>
>>>>>>   48448.933: [G1Ergonomics (CSet Construction) added expensive
>>>>>>
>>>>>> regions to CSet, reason: old CSet region num not reached min, old:
>>>>>> 19 regions, expensive: 7 regions, min: 26 regions, remaining time:
>>>>>> 0.00 ms]
>>>>>>
>>>>>>   48448.933: [G1Ergonomics (CSet Construction) finish choosing CSet,
>>>>>>
>>>>>> eden: 33 regions, survivors: 5 regions, old: 19 regions, predicted
>>>>>> pause time: 340.58 ms, target pause time: 250.00 ms]
>>>>>> 48449.252: [SoftReference, 962 refs, 0.0064679 secs]48449.259:
>>>>>> [WeakReference, 7411 refs, 0.0027722 secs]48449.261:
>>>>>> [FinalReference, 272 refs, 0.0022754 secs]48449.264:
>>>>>> [PhantomReference, 6 refs, 0.0024348 secs]48449.266: [JNI Weak
>>>>>> Reference, 0.0000337 secs] 48449.288: [G1Ergonomics (Concurrent
>>>>>> Cycles) do not request concurrent cycle initiation, reason: still
>>>>>> doing mixed collections, occupancy: 14898167808 bytes, allocation
>>>>>> request: 0 bytes, threshold: 11596411665 bytes (45.00 %), source:
>>>>>> end of GC]
>>>>>> *48449.288*: [G1Ergonomics (Mixed GCs) do not continue mixed GCs,
>>>>>> reason: reclaimable percentage not over threshold, candidate old
>>>>>> regions: 156 regions, reclaimable: 2570649712 bytes (9.98 %),
>>>>>> threshold: 10.00 %]
>>>>>> , 0.3573636 secs]
>>>>>>
>>>>>>     [Parallel Time: 314.9 ms, GC Workers: 40]
>>>>>>     
>>>>>>        [GC Worker Start (ms): Min: 48448933.6, Avg: 48448934.2, Max:
>>>>>> 48448934.7, Diff: 1.1]
>>>>>>
>>>>>>        [Ext Root Scanning (ms): Min: 15.1, Avg: 17.7, Max: *69.1*,
>>>>>>
>>>>>> Diff: 54.0, Sum: 708.0]
>>>>>>
>>>>>>        [Update RS (ms): Min: 0.0, Avg: 14.5, Max: *22.2*, Diff:
>>>>>> 22.2, Sum: 580.3]
>>>>>>
>>>>>>           [Processed Buffers: Min: 0, Avg: 22.0, Max: *45*, Diff:
>>>>>> 45, Sum: 879]
>>>>>>
>>>>>>        [Scan RS (ms): Min: 88.7, Avg: 142.6, Max: *150.8*, Diff:
>>>>>> 62.1, Sum: 5705.2]
>>>>>>
>>>>>>        [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: *0.1*,
>>>>>>
>>>>>> Diff: 0.1, Sum: 0.1]
>>>>>>
>>>>>>        [Object Copy (ms): Min: 131.4, Avg: 138.2, Max: *155.1*,
>>>>>>
>>>>>> Diff: 23.8, Sum: 5526.6]
>>>>>>
>>>>>>        [Termination (ms): Min: 0.0, Avg: 0.7, Max: *0.9*, Diff: 0.9,
>>>>>>
>>>>>> Sum: 27.2]
>>>>>>
>>>>>>        [GC Worker Other (ms): Min: 0.0, Avg: 0.3, Max: *0.5*, Diff:
>>>>>> 0.5, Sum: 10.0]
>>>>>>
>>>>>>        [GC Worker Total (ms): Min: 313.2, Avg: 313.9, Max: *314.7*,
>>>>>>
>>>>>> Diff: 1.5, Sum: 12557.5]
>>>>>>
>>>>>>        [GC Worker End (ms): Min: 48449247.9, Avg: 48449248.1, Max:
>>>>>> 48449248.4, Diff: 0.5]
>>>>>>
>>>>>>     [Code Root Fixup: 2.2 ms]
>>>>>>     [Code Root Migration: 0.2 ms]
>>>>>>     [Clear CT: 2.0 ms]
>>>>>>     [Other: 38.2 ms]
>>>>>>     
>>>>>>        [Choose CSet: 1.7 ms]
>>>>>>        [Ref Proc: 15.6 ms]
>>>>>>        [Ref Enq: 1.2 ms]
>>>>>>        [Free CSet: 2.4 ms]
>>>>>>     
>>>>>>     [Eden: 1056.0M(1056.0M)->0.0B(1088.0M) Survivors: 160.0M->128.0M
>>>>>>
>>>>>> Heap: 14.5G(24.0G)->13.9G(24.0G)]
>>>>>>
>>>>>>   [Times: user=12.45 sys=0.03, real=0.36 secs]
>>>>>>
>>>>>> *48454**.034*: Total time for which application threads were
>>>>>> stopped: *9.3426141* seconds
>>>>>>
>>>>>> For completeness; here are the cmd line options:
>>>>>>
>>>>>>
>>>>>> /usr/jdk/instances/jdk1.7.0/bin/sparcv9/java -Xms24g -Xmx24g
>>>>>> -server  -XX:StringTableSize=27001 -XX:PermSize=512m
>>>>>> -XX:MaxPermSize=512m  -Xss512k
>>>>>> -XX:+UseG1GC  -XX:SoftRefLRUPolicyMSPerMB=500 -XX:+PrintGCDetails
>>>>>> -XX:+PrintGCTimeStamps
>>>>>> -Xloggc:/var/cacao/instances/oem-ec/logs/gc.log
>>>>>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
>>>>>> -XX:GCLogFileSize=10m  -XX:+PrintGCApplicationStoppedTime
>>>>>> -XX:MaxGCPauseMillis=250-XX:InitiatingHeapOccupancyPercent=45
>>>>>> -XX:+ParallelRefProcEnabled  -XX:ParallelGCThreads=40
>>>>>> -XX:+PrintAdaptiveSizePolicy -XX:+PrintSafepointStatistics
>>>>>> -XX:+PrintReferenceGC -XX:G1HeapRegionSize=32m
>>>>>> -XX:HeapBaseMinAddress=2G
>>>>> The GC starts at 48448.931 and reports that it lasts for 0.36
>>>>> seconds, so it should have ended at 48449.291. The last log line
>>>>> that states that the threads were stopped for 9 seconds is from the
>>>>> PrintGCApplicationStoppedTime flag, which logs how long the
>>>>> safepoint was. i.e. how long the Java threads where stopped.
>>>>>
>>>>> This means that the safepoint lasted for 9 seconds but the GC only
>>>>> lasted for 0.36 seconds. So the rest of the time must be attributed
>>>>> to stopping and starting the Java threads. The safepoint was started
>>>>> at 48444.691 (48454.034 - 9.3426141), which means that it took 4.240
>>>>> seconds (48448.931 - 48444.691) to stop all Java threads and 4.743
>>>>> seconds (9.3426141 - 4.240 - 0.36) to start them again.
>>>>>
>>>>> I don't have a good explanation for why it takes so long to stop and
>>>>> start the threads. Are there many Java threads in the application?
>>>>> Are there many other processes running that could cause the Java
>>>>> threads to be scheduled out?
>>>>>
>>>>> Thanks,
>>>>> Bengt
>>>>>
>>>>>> Thanks and Regards
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>>
>>>>>> ORACLE Deutschland B.V. & Co. KG | Riesstr.25 | D-80992 Muenchen
>>>>>>
>>>>>> ORACLE Deutschland B.V. & Co. KG
>>>>>> Hauptverwaltung: Riesstr. 25, D-80992 Muenchen
>>>>>> Registergericht: Amtsgericht Muenchen, HRA 95603
>>>>>> Geschäftsführere: Juergen Kunz
>>>>>>
>>>>>> Komplementärin: ORACLE Deutschland Verwaltung B.V.
>>>>>> Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
>>>>>> Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
>>>>>> Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher
>>>>>>
>>>>>> -----------------------------------------------------------------------
>>>>>> -
>>>>>> -----------------------------------------------------------------------
>>>>>> -
>>>>>> <mime-attachment.gif> <http://www.oracle.com/commitment> Oracle is
>>>>>> committed to developing practices and products that help protect
>>>>>> the environment
>>>> ORACLE Deutschland B.V. & Co. KG | Riesstr.25 | D-80992 Muenchen
>>>>
>>>> ORACLE Deutschland B.V. & Co. KG
>>>> Hauptverwaltung: Riesstr. 25, D-80992 Muenchen
>>>> Registergericht: Amtsgericht Muenchen, HRA 95603
>>>> Geschäftsführere: Juergen Kunz
>>>>
>>>> Komplementärin: ORACLE Deutschland Verwaltung B.V.
>>>> Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
>>>> Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
>>>> Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher
>>>>
>>>> ------------------------------------------------------------------------
>>>> ------------------------------------------------------------------------
>>>> <green-for-email-sig_0.gif> <http://www.oracle.com/commitment> Oracle
>>>> is committed to developing practices and products that help protect
>>>> the environment

-- 
Oracle <http://www.oracle.com>
THOMAS VIESSMANN | Senior Principal Technical Support Engineer - Java
Phone: +498914302496 <tel:+49814302496> | Mobile: +491743005467 
<tel:+491743005467>
Oracle Customer Technical Support - Java

ORACLE Deutschland B.V. & Co. KG | Riesstr.25 | D-80992 Muenchen

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstr. 25, D-80992 Muenchen
Registergericht: Amtsgericht Muenchen, HRA 95603
Geschäftsführere: Juergen Kunz

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Alexander van der Ven, Astrid Kepper, Val Maher

------------------------------------------------------------------------
------------------------------------------------------------------------
Green Oracle <http://www.oracle.com/commitment> Oracle is committed to 
developing practices and products that help protect the environment
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20141006/3d99cf00/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: oracle_sig_logo.gif
Type: image/gif
Size: 658 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20141006/3d99cf00/oracle_sig_logo.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: green-for-email-sig_0.gif
Type: image/gif
Size: 356 bytes
Desc: not available
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20141006/3d99cf00/green-for-email-sig_0.gif>


More information about the hotspot-gc-dev mailing list