G1GC Full GCs
Y. S. Ramakrishna
y.s.ramakrishna at oracle.com
Tue Jul 6 16:57:32 PDT 2010
(1) do you enable class unloading with CMS? (I do not see that
below in yr option list, but wondered.)
(2) does your application load classes, or intern a lot of strings?
If i am reading the logs right, G1 appears to reclaim less
and less of the heap in each cycle until a full collection
intervenes, and I have no real explanation for this behaviour
except that perhaps there's something in the perm gen that
keeps stuff in the remainder of the heap artificially live.
G1 does not incrementally collect the young gen, so this is
plausible. But CMS does not either by default and I do not see
that option in the CMS options list you gave below. It would
be instructive to see what the comparable CMS logs look like.
May be then you could start with the same heap shapes for the
two and see if you can get to the bottom of the full gc (which
as i understand you get to more quickly w/G1 than you did
On 07/06/10 14:24, Todd Lipcon wrote:
> On Tue, Jul 6, 2010 at 2:09 PM, Jon Masamitsu <jon.masamitsu at oracle.com
> <mailto:jon.masamitsu at oracle.com>> wrote:
> Could you send a segment of the GC logs from the beginning
> through the first dozen or so full GC's?
> Sure, I just put it online at:
> Exactly which version of the JVM are you using?
> java -version
> will tell us.
> Latest as of last night:
> [todd at monster01 ~]$ ./jdk1.7.0/jre/bin/java -version
> java version "1.7.0-ea"
> Java(TM) SE Runtime Environment (build 1.7.0-ea-b99)
> Java HotSpot(TM) 64-Bit Server VM (build 19.0-b03, mixed mode)
> Do you have a test setup where you could do some experiments?
> Sure, I have a five node cluster here where I do lots of testing, happy
> to try different builds/options/etc (though I probably don't have time
> to apply patches and rebuild the JDK myself)
> Can you send the set of CMS flags you use? It might tell
> us something about the GC behavior of you application.
> Might not tell us anything but it's worth a look.
> Different customers have found different flags to work well for them.
> One user uses the following:
> -Xmx12000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC \
> -XX:NewSize=64m -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=40 \
> Another uses:
> -XX:+DoEscapeAnalysis -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC
> -XX:NewSize=64m -XX:MaxNewSize=64m
> -XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails
> The particular tuning options probably depend on the actual cache
> workload of the user. I tend to recommend CMSInitiatingOccupancyFraction
> around 75 or so, since the software maintains about 60% heap usage. I
> also think a NewSize slightly larger would improve things a bit, but if
> it gets more than 256m or so, the ParNew pauses start to be too long for
> a lot of use cases.
> Regarding CMS logs, I can probably restart this test later this
> afternoon on CMS and run it for a couple hours, but it isn't likely to
> hit the multi-minute compaction that quickly. It happens more in the wild.
> On 07/06/10 13:27, Todd Lipcon wrote:
>> Hi all,
>> I work on HBase, a distributed database written in Java. We
>> generally run on large heaps (8GB+), and our object lifetime
>> distribution has proven pretty problematic for garbage collection
>> (we manage a multi-GB LRU cache inside the process, so in CMS we
>> tenure a lot of byte arrays which later get collected).
>> In Java6, we generally run with the CMS collector, tuning down
>> CMSInitiatingOccupancyFraction and constraining MaxNewSize to
>> achieve fairly low pause GC, but after a week or two of uptime we
>> often run into full heap compaction which takes several minutes
>> and wreaks havoc on the system.
>> Needless to say, we've been watching the development of the G1 GC
>> with anticipation for the last year or two. Finally it seems in
>> the latest build of JDK7 it's stable enough for actual use
>> (previously it would segfault within an hour or so). However, in
>> my testing I'm seeing a fair amount of 8-10 second Full GC pauses.
>> The flags I'm passing are:
>> -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20
>> Most of the pauses I see in the GC log are around 10-20ms as expected:
>> 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial),
>> 0.01209000 secs]
>> [Parallel Time: 10.5 ms]
>> [Update RS (Start) (ms): 1680080.2 1680080.1 1680080.2
>> 1680079.9 1680080.0 1680080.2 1680080.1 1680080.1 1680080.0
>> 1680080.1 1680080.0 1680079.9 1680081.5]
>> [Update RS (ms): 1.4 2.0 2.2 1.8 1.7 1.4 2.5 2.2
>> 1.9 2.5 1.7 1.7 0.1
>> Avg: 1.8, Min: 0.1, Max: 2.5]
>> [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3
>> Sum: 52, Avg: 4, Min: 1, Max: 8]
>> [Ext Root Scanning (ms): 0.7 0.5 0.5 0.3 0.5 0.7 0.4
>> 0.5 0.4 0.5 0.3 0.3 0.0
>> Avg: 0.4, Min: 0.0, Max: 0.7]
>> [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0
>> 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>> Avg: 0.0, Min: 0.0, Max: 0.0]
>> [Scan RS (ms): 0.9 0.4 0.1 0.7 0.7 0.9 0.0 0.1 0.6
>> 0.0 0.8 0.8 0.9
>> Avg: 0.5, Min: 0.0, Max: 0.9]
>> [Object Copy (ms): 7.2 7.2 7.3 7.3 7.1 7.1 7.0 7.2
>> 7.1 6.9 7.1 7.1 7.0
>> Avg: 7.1, Min: 6.9, Max: 7.3]
>> [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
>> 0.0 0.0 0.0 0.0 0.0
>> Avg: 0.0, Min: 0.0, Max: 0.0]
>> [Other: 0.6 ms]
>> [Clear CT: 0.5 ms]
>> [Other: 1.1 ms]
>> [Choose CSet: 0.0 ms]
>> [ 7677M->7636M(8000M)]
>> [Times: user=0.12 sys=0.00, real=0.01 secs]
>> But every 5-10 minutes I see a GC pause that lasts 10-15 seconds:
>> [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail
>> 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC
>> 7934M->4865M(8000M), 9.8907800 secs]
>> 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC
>> 7930M->4964M(8000M), 9.9025520 secs]
>> 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC
>> 7934M->4882M(8000M), 10.1232190 secs]
>> 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC
>> 7938M->5002M(8000M), 10.4997760 secs]
>> 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC
>> 7938M->4962M(8000M), 11.0497380 secs]
>> These pauses are pretty unacceptable for soft real time operation.
>> Am I missing some tuning that should be done for G1GC for
>> applications like this? Is 20ms out of 80ms too aggressive a
>> target for the garbage rates we're generating?
>> My actual live heap usage should be very stable around 5GB - the
>> application very carefully accounts its live object set at around
>> 60% of the max heap (as you can see in the logs above).
>> At this point we are considering doing crazy things like ripping
>> out our main memory consumers into a custom slab allocator, and
>> manually reference count the byte array slices. But, if we can get
>> G1GC to work for us, it will save a lot of engineering on the
>> application side!
>> Todd Lipcon
>> Software Engineer, Cloudera
>> hotspot-gc-use mailing list
>> hotspot-gc-use at openjdk.java.net <mailto:hotspot-gc-use at openjdk.java.net>
> Todd Lipcon
> Software Engineer, Cloudera
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
More information about the hotspot-gc-use