G1GC Full GCs
todd at cloudera.com
Tue Jul 6 14:24:27 PDT 2010
On Tue, Jul 6, 2010 at 2:09 PM, Jon Masamitsu <jon.masamitsu at oracle.com>wrote:
> Could you send a segment of the GC logs from the beginning
> through the first dozen or so full GC's?
Sure, I just put it online at:
> Exactly which version of the JVM are you using?
> java -version
> will tell us.
Latest as of last night:
[todd at monster01 ~]$ ./jdk1.7.0/jre/bin/java -version
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b99)
Java HotSpot(TM) 64-Bit Server VM (build 19.0-b03, mixed mode)
> Do you have a test setup where you could do some experiments?
Sure, I have a five-node cluster here where I do lots of testing, and I'm happy
to try different builds/options/etc. (though I probably don't have time to
apply patches and rebuild the JDK myself).
> Can you send the set of CMS flags you use? It might tell
> us something about the GC behavior of you application.
> Might not tell us anything but it's worth a look.
Different customers have found different flags to work well for them. One
user uses the following:
-Xmx12000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC \
-XX:NewSize=64m -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=40
Another uses:
-XX:+DoEscapeAnalysis -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails
The particular tuning options probably depend on the actual cache workload of
the user. I tend to recommend a CMSInitiatingOccupancyFraction around 75 or
so, since the software maintains about 60% heap usage. I also think a slightly
larger NewSize would improve things a bit, but once it gets past 256m or so,
the ParNew pauses start to be too long for a lot of use cases.
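Put concretely, that recommendation would look something like the launch flags
below. The NewSize/MaxNewSize values are illustrative picks from the range
discussed above, not a tested configuration, and the application jar is a
placeholder:

```
java -Xmx8000m -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=75 \
     -XX:NewSize=128m -XX:MaxNewSize=256m \
     -verbose:gc -XX:+PrintGCDetails \
     -jar app.jar
```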
Regarding CMS logs, I can probably restart this test later this afternoon on
CMS and run it for a couple hours, but it isn't likely to hit the
multi-minute compaction that quickly. It happens more in the wild.
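For context on what that cache workload looks like, here is a minimal sketch
(class and method names are illustrative, not HBase's actual code) of the
allocation pattern: a bounded, access-ordered LRU cache of byte[] blocks.
Entries survive long enough to be tenured, then get evicted, which under CMS
frees scattered chunks in the old generation and fragments it over time:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class LruBlockCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder = true makes iteration order least-recently-used first
    private final LinkedHashMap<String, byte[]> map =
            new LinkedHashMap<>(16, 0.75f, true);

    public LruBlockCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    public void put(String key, byte[] block) {
        usedBytes += block.length;
        map.put(key, block);
        // Evict least-recently-used blocks until we fit again; each eviction
        // drops a (likely tenured) byte[] for the collector to reclaim.
        Iterator<Map.Entry<String, byte[]>> it = map.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    public byte[] get(String key) {
        return map.get(key);
    }

    public long used() {
        return usedBytes;
    }
}
```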
> On 07/06/10 13:27, Todd Lipcon wrote:
> Hi all,
> I work on HBase, a distributed database written in Java. We generally run
> on large heaps (8GB+), and our object lifetime distribution has proven
> pretty problematic for garbage collection (we manage a multi-GB LRU cache
> inside the process, so in CMS we tenure a lot of byte arrays which later get
> freed).
> In Java6, we generally run with the CMS collector, tuning down
> CMSInitiatingOccupancyFraction and constraining MaxNewSize to achieve fairly
> low pause GC, but after a week or two of uptime we often run into full heap
> compaction which takes several minutes and wreaks havoc on the system.
> Needless to say, we've been watching the development of the G1 GC with
> anticipation for the last year or two. Finally it seems in the latest build
> of JDK7 it's stable enough for actual use (previously it would segfault
> within an hour or so). However, in my testing I'm seeing a fair amount of
> 8-10 second Full GC pauses.
> The flags I'm passing are:
> -Xmx8000m -XX:+UseG1GC -XX:MaxGCPauseMillis=20
> Most of the pauses I see in the GC log are around 10-20ms as expected:
> 2010-07-05T22:43:19.849-0700: 1680.079: [GC pause (partial), 0.01209000 secs]
> [Parallel Time: 10.5 ms]
> [Update RS (Start) (ms): 1680080.2 1680080.1 1680080.2 1680079.9
> 1680080.0 1680080.2 1680080.1 1680080.1 1680080.0 1680080.1 1680080.0
> 1680079.9 1680081.5]
> [Update RS (ms): 1.4 2.0 2.2 1.8 1.7 1.4 2.5 2.2 1.9 2.5
> 1.7 1.7 0.1
> Avg: 1.8, Min: 0.1, Max: 2.5]
> [Processed Buffers : 8 1 3 1 1 7 3 2 6 2 7 8 3
> Sum: 52, Avg: 4, Min: 1, Max: 8]
> [Ext Root Scanning (ms): 0.7 0.5 0.5 0.3 0.5 0.7 0.4 0.5 0.4
> 0.5 0.3 0.3 0.0
> Avg: 0.4, Min: 0.0, Max: 0.7]
> [Mark Stack Scanning (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
> 0.0 0.0 0.0 0.0 0.0
> Avg: 0.0, Min: 0.0, Max: 0.0]
> [Scan RS (ms): 0.9 0.4 0.1 0.7 0.7 0.9 0.0 0.1 0.6 0.0 0.8
> 0.8 0.9
> Avg: 0.5, Min: 0.0, Max: 0.9]
> [Object Copy (ms): 7.2 7.2 7.3 7.3 7.1 7.1 7.0 7.2 7.1 6.9
> 7.1 7.1 7.0
> Avg: 7.1, Min: 6.9, Max: 7.3]
> [Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
> 0.0 0.0 0.0
> Avg: 0.0, Min: 0.0, Max: 0.0]
> [Other: 0.6 ms]
> [Clear CT: 0.5 ms]
> [Other: 1.1 ms]
> [Choose CSet: 0.0 ms]
> [ 7677M->7636M(8000M)]
> [Times: user=0.12 sys=0.00, real=0.01 secs]
> But every 5-10 minutes I see a GC pause that lasts 10-15 seconds:
> [todd at monster01 logs]$ grep 'Full GC' gc-hbase.log | tail
> 2010-07-06T12:50:41.216-0700: 52521.446: [Full GC 7934M->4865M(8000M),
> 9.8907800 secs]
> 2010-07-06T12:55:39.802-0700: 52820.032: [Full GC 7930M->4964M(8000M),
> 9.9025520 secs]
> 2010-07-06T13:02:26.872-0700: 53227.102: [Full GC 7934M->4882M(8000M),
> 10.1232190 secs]
> 2010-07-06T13:09:41.049-0700: 53661.279: [Full GC 7938M->5002M(8000M),
> 10.4997760 secs]
> 2010-07-06T13:18:51.531-0700: 54211.761: [Full GC 7938M->4962M(8000M),
> 11.0497380 secs]
> These pauses are pretty unacceptable for soft real time operation.
> Am I missing some tuning that should be done for G1GC for applications
> like this? Is 20ms out of 80ms too aggressive a target for the garbage rates
> we're generating?
> My actual live heap usage should be very stable around 5GB - the
> application very carefully accounts its live object set at around 60% of the
> max heap (as you can see in the logs above).
> At this point we are considering doing crazy things like ripping out our
> main memory consumers into a custom slab allocator, and manually reference
> count the byte array slices. But, if we can get G1GC to work for us, it will
> save a lot of engineering on the application side!
> Todd Lipcon
> Software Engineer, Cloudera
> hotspot-gc-use mailing list
> hotspot-gc-use at openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use
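The "slab allocator with manually reference-counted byte array slices"
mentioned above could look roughly like the sketch below. All names here are
hypothetical, not an actual HBase design: the idea is to carve fixed-size
slices out of a few large, permanently live byte[] slabs, so the GC only ever
sees a handful of long-lived arrays instead of millions of cache entries:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Slab {
    private final byte[] data;
    private final int sliceSize;
    private int next = 0;

    public Slab(int slabSize, int sliceSize) {
        this.data = new byte[slabSize];
        this.sliceSize = sliceSize;
    }

    // Hand out the next free slice, or null if the slab is exhausted.
    public Slice allocate() {
        if (next + sliceSize > data.length) {
            return null;
        }
        Slice s = new Slice(data, next, sliceSize);
        next += sliceSize;
        return s;
    }

    public static class Slice {
        public final byte[] backing;
        public final int offset;
        public final int length;
        private final AtomicInteger refCount = new AtomicInteger(1);

        Slice(byte[] backing, int offset, int length) {
            this.backing = backing;
            this.offset = offset;
            this.length = length;
        }

        public void retain() {
            refCount.incrementAndGet();
        }

        // Returns true when the last reference is gone; a real allocator
        // would put the slice back on a free list at that point.
        public boolean release() {
            return refCount.decrementAndGet() == 0;
        }
    }
}
```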
--
Todd Lipcon
Software Engineer, Cloudera