g1 not doing partial aggressively enough -> fallback to full gc
peter.schuller at infidyne.com
Thu Jun 10 22:51:18 UTC 2010
> regions. Based on on the heap size after the full GC, the live working
> set if roughly ~ 230 MB, and the maximum heap size is 4 GB. This means
> that during the GC:s right before the full GC:s, g1 is choosing to do
> young generation GC:s even though the average live ratio in the older
> regions should be roughly 5% (low-hanging fruit for partial
I went back thinking that perhaps the liveness set was being
calculated with a granularity of card size. Because the test by design
ages data psuedo-randomly but uniformly (rather than in "chunks" of
related data allocated close to each-other), I thought that card
granularity may explain the behavior by causing G1 to severely
over-estimate the live data in old regions.
Trying to understand by source inspection, I'm still not sure but I
*think* this is not what's happening but I have definitely not wrapped
my head around the heap region/bit masks/marking stuff.
In any case, I re-ran the test with G1PrintRegionLiveness=5000 (and
with the patch I sent to -dev) to try to see whether the problem was
the estimated live set or not. I ran the test with a 3 GB maximum heap
size instead of 4 GB for practical reasons (lack of memory).
The full GC log is here:
But highlights are as follows. Firstly, the last marking phase (that
also completed + its count phase) prior to the full gc starts here:
89.252: [GC pause (young) (initial-mark)89.264: [GC concurrent-mark-start]
2137M->2131M(3072M), 0.0118340 secs]
89.281: [GC pause (young) 2142M->2133M(3072M), 0.0186420 secs]
So there we have a ~ 2130 MB heap.
The last count phase reports:
Overall (among printed): 3003514880 used, 1049827840 max live, 34.95%.
And the final full GC:
122.897: [Full GC 3071M->281M(937M), 3.3331940 secs]
Note that due to the nature of this test, the test has reached steady
state in terms of the actual live set a significant time before the
last initial mark. This means that the live set at the full GC is
roughly the same as the live set at the initial mark (the actual live
set, not the measured).
Firstly, I'm not sure I get the numbers. The total used translates to
2864 MB; almost as high as the maximum heap size, but decidedly higher
than the supposed heap space used. I'm not sure where the discrepancy
is coming from.
In any case, the mark+count results in these partials:
111.054: [GC pause (partial) 2896M->2850M(3072M), 0.0464190 secs]
111.144: [GC pause (partial) 2862M->2811M(3072M), 0.0458500 secs]
111.215: [GC pause (partial) 2822M->2773M(3072M), 0.0529640 secs]
111.294: [GC pause (partial) 2784M->2738M(3072M), 0.0502390 secs]
111.374: [GC pause (partial) 2749M->2707M(3072M), 0.0487850 secs]
111.459: [GC pause (partial) 2718M->2677M(3072M), 0.0478110 secs]
111.544: [GC pause (partial) 2687M->2650M(3072M), 0.0495010 secs]
111.640: [GC pause (partial) 2660M->2626M(3072M), 0.0498750 secs]
111.745: [GC pause (partial) 2635M->2601M(3072M), 0.0500490 secs]
111.857: [GC pause (partial) 2609M->2581M(3072M), 0.0471700 secs]
111.943: [GC pause (partial) 2588M->2577M(3072M), 0.0509000 secs]
Those numbers will of course be affected by both the young and the old
generation, but if I'm reading and interpreting it right, the maximum
amount of old-gen data that was collected was roughly 319 MB, assuming
no young-gen data was generated (not true, but it's an upper bound).
Given that the heap size was reported to be 2130 MB at the start of
the initial mark, that leaves >= 2130-319=1811 MB of heap "used", with
an actual live set of ~ 281 MB (based on full GC).
So, this still leaves me with an actual live set that is <= 15.5%
(281/1811) after partial GC:s stopped.
(1) The 15.5%, upper bound is not consistent with the 34.95% figure.
But then those overall stats were not consistent with the reported
heap size anyway. Either I blew it again with the patch (wouldn't be
surprised), or there is a discrepancy between the used/max_live
produced by the g1 mark/count and the heap stats printed. If the
latter is the case, is this understood?
(2) With an <= 15.5% average utilization you'd definitely want, in an
ideal world, a lot more partial GC:ing going on.
(3) If I ignore the region usage count which is suspect, the
observations still seem consistent with the live set being calculated
at card size granularity. Can anyone confirm/deny?
/ Peter Schuller
hotspot-gc-use mailing list
hotspot-gc-use at openjdk.java.net
More information about the hotspot-gc-dev