unexplained CMS pauses
shane.cox at gmail.com
Mon Sep 28 11:06:19 PDT 2009
We're running Solaris 10 on 8-core Intel Xeon's. 1 Java instance per box.
JRE 1.6, update 14. 64-bit JVM.
Our app is making JNI calls. Each call requires approximately the same
amount of work (so we expect them to perform similarly). Internally, we
measure how long it takes to perform these calls. +99% of these calls
complete in less than 100 micros. However, we have outliers in the
300-1200ms range. After some research, we have found that these extreme
outliers coincide with the Concurrent Mark phase of CMS (based on
timestamps), and ONLY if there is a minor GC during that phase.
In a given day, our app will execute CMS collections 500-1000 times. Of
these, less than 10 will have a minor collection execute during the
Concurrent Mark. For each of these, our app reports a large pause in the
300-1200ms range. In fact, all of the large pauses reported by our app
correlate with a minor GC during a Concurrent Mark, without exception.
On Mon, Sep 28, 2009 at 1:13 PM, <Y.S.Ramakrishna at sun.com> wrote:
> What platform are you on (#cpu's etc.)
> and when you say "app reports a pause of 300 ms"
> is it that the odd transaction sees a latency
> of 300 ms (coincident with concurrent mark),
> whereas most transactions complete much more
> I am trying to first understand how you determine
> that the application is seeing "long pauses"
> when a minor gc occurs during concurrent mark.
> PS: for example, if two gc pauses (say a scavenge of 30 ms
> and an initial mark of 13 ms occur in quick succession,
> your application might notice a pause of 43 ms, etc.)
> -- ramki
> On 09/28/09 10:07, Shane Cox wrote:
>> Our application is reporting long pauses when a minor GC occurs during the
>> Concurrent Mark phase of CMS. The output below is a specific example. All
>> of the GC pauses are less than 30ms (initial mark, remark, minor GC).
>> However, our app reported a 300ms pause.
>> 56750.934: [GC [1 CMS-initial-mark: 702464K(1402496K)] 719045K(1551616K),
>> 0.0131859 secs]
>> 56750.947: [CMS-concurrent-mark-start]
>> 56752.133: [GC 56752.133: [ParNew: 144393K->12122K(149120K), 0.0237615
>> secs] 846857K->719330K(1551616K), 0.0239988 secs]
>> 56752.162: [CMS-concurrent-mark: 1.188/1.215 secs]
>> 56752.162: [CMS-concurrent-preclean-start]
>> 56752.243: [CMS-concurrent-preclean: 0.070/0.081 secs]
>> 56752.243: [CMS-concurrent-abortable-preclean-start]
>> 56752.765: [CMS-concurrent-abortable-preclean: 0.143/0.522 secs]
>> 56752.766: [GC[YG occupancy: 77423 K (149120 K)]56752.766: [Rescan
>> (parallel) , 0.0065730 secs]56752.773: [weak refs processing, 0.0001983
>> secs] [1 CMS-remark: 707208K(1402496K)] 784631K(1551616K), 0.0068908 secs]
>> 56752.773: [CMS-concurrent-sweep-start]
>> 56753.209: [CMS-concurrent-sweep: 0.436/0.436 secs]
>> 56753.209: [CMS-concurrent-reset-start]
>> 56753.219: [CMS-concurrent-reset: 0.010/0.010 secs]
>> We only observe this behavior when a minor GC occurs during the Concurrent
>> Mark (which is rare). Our app has reported pauses up to 1.2 seconds ...
>> which is generally the time it takes to perform a Concurrent Mark.
>> Any insight/help that you could provide would be much appreciated.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the hotspot-gc-dev