CRR (M): 7132029: G1: mixed GC phase lasts for longer than it should

Tony Printezis tony.printezis at
Wed Feb 15 13:47:56 UTC 2012


Thanks for looking at this at such short notice! Inline.

On 02/15/2012 03:46 AM, Bengt Rutisson wrote:
> Hi Tony,
> I really like this cleanup! Thanks for fixing it.
> Some comments:
> collectionSetChooser.hpp:
> _numMarkedRegions is kind of a confusing name to me. It does not 
> contain all the marked regions. 

Well, I left it as it was. And it's also not called 
_numAllMarkedRegions. :-) How about I just call it _length (and 
_curMarkedIndex -> _curr_index)?

> It contains all the old regions that we would like to include in the 
> collection set. How about _num_selected_regions or even 
> _num_selected_old_regions?
> I think the remove(HeapRegion* hr) method is kind of strange. I 
> understand the need for peek() from finalize_cset(), but I think it is 
> strange that the remove() dependes on the code first calling peek(). 
> I'd prefer if we could make this more iterator style or at least 
> change remove() to someting like pop() that will return the next 
> HeapRegion instead of taking it as a parameter.

Well, peek() and remove() are only called from the loop in 
finalize_cset(), so they are tailored for that. I originally had 
remove() with no parameters that just removed the current region. I then 
thought of passing the region as a parameter mainly to do sanity testing 
(i.e., the region that it's being removed is the one we think it should 
be removed). And I then realized that, given we already have the region, 
there's no point in actually re-reading it from the array. Si we . I 
could generalize remove() to accept hr optionally, but given that we 
only use it when we know what the region to be removed is, I don't see 
the point. Also, I'm not sure that using HeapRegion* pop() is 
appropriate either, given that the caller doesn't need to be given the 
region that just got removed (it already knows which one it is).

Also note that the peek() / remove() pattern is similar to the 
ListIterator class in Java which has a E next() (equivalent to peek()) 
and void remove(). I just added extra information to the remove() 
method. (What I'm trying to say is: there's precedence to such a pattern.)

> collectionSetChooser.cpp:
> Why is the method CollectionSetChooser::updateAfterFullCollection() 
> needed? Can't G1CollectorPolicy::record_full_collection_end() do 
> _collectionSetChooser->clearMarkedHeapRegions() just like 
> G1CollectorPolicy::record_concurrent_mark_cleanup_end() does?

I'm OK either way. Maybe we used to do more in 
updateAfterFullCollection() once upon the time but we don't do any more. 
I'll remove it and replace it with a direct call to 
clearMarkedHeapRegions() as you suggested.

> g1_globals.hpp:
> It seems to me that at least the G1MaxMixedGCNum and the 
> G1OldCSetRegionThresholdPercent can be things that end users might 
> need to tweak until we have the heuristics more thoroughly tested out. 
> Would it be ok to make them available in product builds? Maybe as 
> experiemental flags?

Actually, it might be helpful to expose all four develop flags I added 
to the user so that they can control different aspects of the heuristics:

a) G1OldCSetRegionLiveThresholdPercent and 
G1OldReclaimableThresholdPercent: control how much free space to 
"sacrifice" in order to decrease mixed GC times (BTW, it'd be nice if we 
could somehow combine these two.)

b) G1MaxMixedGCNum: control how promptly G1 will collect the candidate 
old regions: more promptly -> more free space is made available quicker, 
less promptly -> each mixed GC might be shorter.

c) G1OldCSetRegionThresholdPercent: control how aggressive G1 will be in 
collecting old regions during each mixed GC: more aggressive -> more 
free space is made available quicker, less promptly -> each mixed GC 
might be shorter.

(come to think about it, maybe we should think of combining the last two 

But before we expose these we need to a) get a better understanding of 
what effect each has on G1's performance and b) find better names for 
them. :-) (I didn't try very hard, as I assumed that they will be 
develop parameters.)

> g1CollectorPolicy.cpp:
>       if (next_gc_should_be_mixed("end of young GC")) {
>         ergo_verbose0(ErgoMixedGCs,
>                       "start mixed GCs",
>                       ergo_format_reason("appropriate old regions 
> available"));
>         set_gcs_are_young(false);
>       } else {
>         ergo_verbose0(ErgoMixedGCs,
>                       "do not start mixed GCs",
>                       ergo_format_reason("appropriate old regions 
> available"));
>       }
> I think the ergo_verbose message in the else case at lines 1490-1492 
> should say "appropriate old regions not available", right?

Correct, thanks. John also caught that.

> We log the ergo decisions in next_gc_should_be_mixed(). Will it not be 
> duplicate information to log it in 
> G1CollectorPolicy::record_collection_pause_end() as well?

That's a good point. This is what the output looks like right now:

  3.708: [G1Ergonomics (Mixed GCs) next GC should be mixed, reason: 
appropriate old regions available, phase: end of young GC, remaining old 
regions: 16 regions, reclaimable: 9803640 bytes (29.22 %), threshold: 
1.00 %]
  3.708: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: appropriate 
old regions available]
  23M->18M(32M), 0.0411940 secs]


  3.938: [G1Ergonomics (Mixed GCs) next GC should be mixed, reason: 
appropriate old regions available, phase: end of mixed GC, remaining old 
regions: 12 regions, reclaimable: 5823976 bytes (17.36 %), threshold: 
1.00 %]
  3.938: [G1Ergonomics (Mixed GCs) do not end mixed GCs, reason: 
appropriate old regions available]
  23M->15M(32M), 0.0322850 secs]

I'll see how easy will be to parametarize next_gc_should_be_mixed() with 
the "do" / "do not" action strings (instead of the phase_str).

I'll send out a new webrev in a bit.


> Bengt
> On 2012-02-14 16:18, Tony Printezis wrote:
>> Hi all,
>> I'd like a couple of (quick please!) code reviews for this change:
>> The policy that was choosing old regions for collection during mixed 
>> GCs was buggy and it could exhibit many strange behaviors, one of 
>> which was to go into mixed GC mode, do "mixed GCs" (which they did 
>> not actually collect any old regions), and not get out of it for a 
>> while while preventing subsequent marking cycles not to start. This 
>> frequently caused evac failures and Full GCs.
>> The logic on which old regions to add to the CSet was spread to many 
>> places. I simplified that and put it mainly in the loop in the 
>> finalize_cset() method (renamed from choose_collection_set(), 
>> "finalize" is more descriptive on what the method does). I think the 
>> new version is easier to follow.
>> Description of the changes:
>> * I have introduced min and max number of old regions to be added to 
>> the CSet of each mixed GC. Max number is calculated as a percentage 
>> over the heap size (default: 10% - thanks to Jesper for suggesting to 
>> use a heap percentage for this) and ensures that collections will not 
>> get super long. Min number is calculated based on a desired mixed GC 
>> num after a marking cycle (default: 4) and ensures that each mixed GC 
>> will make some progress in collecting old regions (so that the 
>> candidate old regions are collected in a timely fashion).
>> * We now don't add any regions with live percentage over a threshold 
>> (default: 95%) to the CSet chooser and we do not consider them for 
>> collection.
>> * I stopped using the cache in the CSet chooser class (it was used to 
>> resort regions according to their latest GC efficiency), since it's 
>> not necessary any more: we'll now go through all the candidate old 
>> regions in the CSet chooser class and we don't have a heuristic of 
>> when to stop mixed GCs based on GC efficiency.
>> * I introduced the notion of "reclaimable bytes" on HeapRegions, 
>> which not only includes the predicted garbage bytes on each region, 
>> but also the unused space in the area [top,end) which will also be 
>> reclaimed. The CSet chooser class now keeps track of the total 
>> reclaimable bytes of all the regions that it tracks. If that falls 
>> under a certain threshold (default: 1% of the heap) we stop doing 
>> mixed GCs as we'll reclaim very little out of the remaining candidate 
>> old regions.
>> * I eliminated the case where a mixed GC starts and picks no old 
>> regions to collect (I hope!). Now the information on the CSet chooser 
>> (remaining region num / remaining reclaimable bytes) can tell us 
>> whether we want to do more mixed GCs or not. If we do, it's 
>> guaranteed that there will be old regions to collect. Because of 
>> that, I removed the _should_revert_to_young_gcs flag as it's not 
>> needed any more. It was used so that the CSet choice code could flag 
>> that we should stop doing mixed GCs at the end of the GC. Now, we can 
>> decide that by just looking at the information on the CSet chooser 
>> (the heuristic is encapsulated in the next_gc_should_be_mixed()).
>> * I also changed the policy in the case where a fixed young gen is 
>> used. Before, we were a bit arbitrary: during mixed GCs we'd cut the 
>> young gen size in half and fill up the rest with old regions. I 
>> thought that trying to re-use the "desired mixed GC number" heuristic 
>> for this made sense. So, I now add the min number of old regions to 
>> each mixed GC (as long as we don't go over the max). I think this is 
>> a more reasonable and less arbitrary heuristic to what we had before 
>> and it's more consistent with the non-fixed young gen policy. I 
>> didn't know whether I should decrease the young gen size for each 
>> mixed GC, like how we did before, and by how much. So, I now leave it 
>> unchanged (the user decided they want a fixed young gen, they will 
>> now always get it!).
>> * Updated all related ergo output to print the new information.
>> Like the marking changes, this change leaves a fair amount of unused 
>> code that we can remove. I had to draw the line somewhere on how much 
>> I should remove now and how much I should leave for later. I opened a 
>> new CR for the remaining cleanup:
>> 7145441: G1: collection set chooser-related cleanup
>> Many thanks to Charlie for doing some last-minute performance testing 
>> with my workspace. He was the one who discovered the problem with the 
>> never-ending mixed GCs and this fix eliminated the problem. 
>> Correctness-testing-wise, I've run overnight with the GC test suite.
>> Thanks,
>> Tony

More information about the hotspot-gc-dev mailing list