unexplained CMS pauses

Paul Hohensee Paul.Hohensee at Sun.COM
Mon Sep 28 16:24:30 PDT 2009


Might have nothing to do with concurrent mark running, rather with minor 
collections
happening during a java->native call.

Minor collections are stop-the-world, and if one occurs during a 
java->native call, the
java thread making the call to native will block on return from the call 
to native until
the minor collection is over.  If the outliers are outliers in the time 
it takes to execute
the native call, and the minor gc duration and timing are exactly right, 
you could end
up spending most of the minor gc time blocked.  Pause time would then be 
100ms
+ minor gc time.

Paul

Shane Cox wrote:
> Ramki,
> We're running Solaris 10 on 8-core Intel Xeon's.  1 Java instance per 
> box.  JRE 1.6, update 14.  64-bit JVM.
>
> Our app is making JNI calls.  Each call requires approximately the 
> same amount of work (so we expect them to perform similarly).  
> Internally, we measure how long it takes to perform these calls. +99% 
> of these calls complete in less than 100 micros.  However, we have 
> outliers in the 300-1200ms range.  After some research, we have found 
> that these extreme outliers coincide with the Concurrent Mark phase of 
> CMS (based on timestamps), and ONLY if there is a minor GC during that 
> phase.
>
> In a given day, our app will execute CMS collections 500-1000 times.  
> Of these, less than 10 will have a minor collection execute during the 
> Concurrent Mark.  For each of these, our app reports a large pause in 
> the 300-1200ms range.  In fact, all of the large pauses reported by 
> our app correlate with a  minor GC during a Concurrent Mark, without 
> exception.
>
> Thanks
>
> On Mon, Sep 28, 2009 at 1:13 PM, <Y.S.Ramakrishna at sun.com 
> <mailto:Y.S.Ramakrishna at sun.com>> wrote:
>
>     What platform are you on (#cpu's etc.)
>     and when you say "app reports a pause of 300 ms"
>     is it that the odd transaction sees a latency
>     of 300 ms (coincident with concurrent mark),
>     whereas most transactions complete much more
>     quickly?
>
>     I am trying to first understand how you determine
>     that the application is seeing "long pauses"
>     when a minor gc occurs during concurrent mark.
>
>     PS: for example, if two gc pauses (say a scavenge of 30 ms
>     and an initial mark of 13 ms occur in quick succession,
>     your application might notice a  pause of 43 ms, etc.)
>
>     -- ramki
>
>
>     On 09/28/09 10:07, Shane Cox wrote:
>
>         Our application is reporting long pauses when a minor GC
>         occurs during the Concurrent Mark phase of CMS.  The output
>         below is a specific example.  All of the GC pauses are less
>         than 30ms (initial mark, remark, minor GC).  However, our app
>         reported a 300ms pause.
>
>         56750.934: [GC [1 CMS-initial-mark: 702464K(1402496K)]
>         719045K(1551616K), 0.0131859 secs]
>         56750.947: [CMS-concurrent-mark-start]
>         56752.133: [GC 56752.133: [ParNew: 144393K->12122K(149120K),
>         0.0237615 secs] 846857K->719330K(1551616K), 0.0239988 secs]
>         56752.162: [CMS-concurrent-mark: 1.188/1.215 secs]
>         56752.162: [CMS-concurrent-preclean-start]
>         56752.243: [CMS-concurrent-preclean: 0.070/0.081 secs]
>         56752.243: [CMS-concurrent-abortable-preclean-start]
>         56752.765: [CMS-concurrent-abortable-preclean: 0.143/0.522 secs]
>         56752.766: [GC[YG occupancy: 77423 K (149120 K)]56752.766:
>         [Rescan (parallel) , 0.0065730 secs]56752.773: [weak refs
>         processing, 0.0001983 secs] [1 CMS-remark: 707208K(1402496K)]
>         784631K(1551616K), 0.0068908 secs]
>         56752.773: [CMS-concurrent-sweep-start]
>         56753.209: [CMS-concurrent-sweep: 0.436/0.436 secs]
>         56753.209: [CMS-concurrent-reset-start]
>         56753.219: [CMS-concurrent-reset: 0.010/0.010 secs]
>
>
>         We only observe this behavior when a minor GC occurs during
>         the Concurrent Mark (which is rare).  Our app has reported
>         pauses up to 1.2 seconds ... which is generally the time it
>         takes to perform a Concurrent Mark.
>
>
>         Any insight/help that you could provide would be much appreciated.
>
>         Thanks
>
>
>


More information about the hotspot-gc-dev mailing list