Young GC pause time definitions
chsu79 at gmail.com
Tue Jul 12 05:42:35 PDT 2011
Hi gc devs,
I'm investigating young gc times that increased 10 fold after our
software was upgraded on a customer system. I've been unable to
reproduce something similar with a local test machine so I am unable
to experiment. For the customer, it used to be ~20ms young GCs and now
it is ~200ms long stops.
I get the stop time by looking at the increment in the YGCT column of
"jstat -gc" output.
After upgrade, there is more objects that survive into the CMS space.
It used to be 300 bytes (first i see something that low) and now it is
up to 600kb, on each young gc. This would explain more work done
during young gc. But can cpu bound work really explain 180ms extra to
get 600kb moved into CMS space? Especially as I don't see CPU usage
being a big difference before vs after.
I'm on thin ice with this: failure to allocate an object causes a vm
safepoint, at which the vm runs a young gc, is that right? Could it
happen that it takes times to get all mutator threads to stop in this
safepoint before young gc can proceed?
If this is so, is there any way to learn how much time that is spent
getting threads to get into the safepoint, vs time to do gc work? And
what is measured by the young gc time from jstat or verbose gc?
The biggest change in our software is that we have reimplemented a big
part of our main execution to rely on blocking queues (from java util
concurrent) instead of an in-house queue implementation based on java
monitors. I've had some wild guesses that synchronization on blocking
queues are less aware of having to get into a safepoint, because they
are a less-overhead synchronization mechanism.
By scanning through this mailing list and other sites I have learned
about PrintSafepointStatistics, PrintGCApplicationStoppedTime &
PrintGCApplicationConcurrentTime. But I haven't found any detailed
explaination of what these values are. Potentially I could get the
customer to re-run tests with these, but as I don't know exactly what
the output means, I can't say for sure how I would use the information
gained from it.
The customer site is running an old jdk 1.6.0_14, with
-XX:+UseParNewGC and -XX:UseConcMarkSweepGC. Uses a 12 G heap, a
relatively small 512Mb new size.
If I'm completely wrong in above guesses about how things work I will
suggest to them to increase the NewSize as it seems our new version of
the software fills upp newspace too quickly and short-lived objects
are promoted into the CMS heap too soon. But if the problem would be
that mutator threads dont enter the safepoint as quickly as with our
old software, then their long young gc pauses would remain. I really
want them to upgrade to a more recent jdk as I know 1.6.0_18 fixed
some CMS fragmentation bugs.
More information about the hotspot-gc-dev