RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1

Liang Mao maoliang.ml at alibaba-inc.com
Thu Jun 18 09:01:49 UTC 2020

Hi Thomas,

Sorry for the late reply. It's great to see the good progress on the approach
we've discussed for a while. Resizing at any GC is definitely the right way. I have
some questions in the inline comments below.

BTW, I want to answer some questions in advance:
1) We may not be able to test this approach on our workloads soon, since the versions
are quite different. However, we plan to merge this, together with the upcoming concurrent
uncommit work, into our JDK11 later.
2) JEP 346 has been backported to our JDK11 and works as expected on several workloads. I expect
the new elastic solution to be even better in the future :)
3) The earlier humongous proposal, which aborts the initial mark, solved some problems but still had
the issue of frequent GCs. We are now tuning it and verifying it on our workloads.

> ------------------------------------------------------------------
> From:Thomas Schatzl <thomas.schatzl at oracle.com>
> Send Time:2020 Jun. 10 (Wed.) 17:31
> To:hotspot-gc-dev at openjdk.java.net <hotspot-gc-dev at openjdk.java.net>
> Subject:RFC: Adaptively resize heap at any GC/SoftMaxHeapSize for G1

> Hi all, Liang,

>   after a few months of busy work in the area of G1 heap resizing
> and ultimately SoftMaxHeapSize support, I am fairly comfortable with a first
> preview of these associated changes. So I would like to ask for feedback
> on the current changes, which I intend to complete in the (early)
> jdk16 timeframe.

> This is not a request for review of the changes for pushing, although 
> feedback on the code is also appreciated.

> From my point of view, only some heuristics tuning and code polishing
> is left to do, as the change seems to do what it is intended to do.

> In particular it would be nice if Liang Mao, the original requestor of 
> all this functionality, could help with feedback on his loads. :)

> Just to recap: around the end of last year, Liang posted review(s)
> with functionality to:
> - concurrent uncommit of memory
> - implement SoftMaxHeapSize by uncommitting free memory

> That did not work well in some cases, so we agreed that we at Oracle
> would take over. Today I would like to talk about the progress on the second
> part :)

> The original proposal did not work well because it did not really change 
> how G1 resized the heap - i.e. SoftMaxHeapSize related changes to the 
> heap were quickly undone by regular heap expansion because it was too 
> aggressive for several reasons (e.g. bugs like JDK-8243672, 
> JDK-8244603), uncooperative (JDK-8238686) and never actually helped
> shrink the heap or keep it at a particular size.

> This resulted in lots of unnecessary heap changes even on known constant
> loads.

> After fixing these issues (at least internally ;)) and some further
> analysis, I concluded that, to keep a particular heap size, G1 needs an
> element in its heap sizing control loop that pushes back on (excessive)
> heap expansion.

> The best approach I have come up with is to introduce a *lower*
> GCTimeRatio that G1 tries to stay *above* by resizing the heap.
> Effectively, G1 then tries to keep its actual GC time ratio within
> ["LowerGCTimeRatio", GCTimeRatio].
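To make the band concrete: GCTimeRatio=N asks for GC to take about 1/(1+N) of total time, which means N units of application time per unit of GC time. The Python sketch below is a hypothetical model, not G1 code, and its helper names are invented for illustration. It converts measured times into those units and checks whether the result falls inside ["LowerGCTimeRatio", GCTimeRatio].

```python
# Hypothetical model of the GCTimeRatio band; not G1 source code.

def actual_gc_time_ratio(gc_time_s: float, mutator_time_s: float) -> float:
    """Measured time in GCTimeRatio units: application time per unit of GC time."""
    if gc_time_s <= 0.0:
        # No GC time at all: treat as arbitrarily far above any bound.
        return float("inf")
    return mutator_time_s / gc_time_s


def within_band(actual_ratio: float, lower_ratio: float, upper_ratio: float) -> bool:
    """True if the actual ratio lies in [LowerGCTimeRatio, GCTimeRatio]."""
    return lower_ratio <= actual_ratio <= upper_ratio


# 1 s of GC per 9 s of application time is inside [6, 12]; 1 s per 4 s is not.
print(within_band(actual_gc_time_ratio(1.0, 9.0), 6.0, 12.0))  # True
print(within_band(actual_gc_time_ratio(1.0, 4.0), 6.0, 12.0))  # False
```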

> That actually works out fairly well, and I think the code, while still
> under heavy development (and it does still look like it :)), is now in a
> state where it can be shared to gather feedback on more loads from you.

> First, how to try it out, before going into the details and the questions I
> have:

> This is a series of patches, which I put up on cr.openjdk.java.net, that need
> to be applied on top of recent trunk:

> These are the ones already out for review:

> 1) JDK-8243672: http://cr.openjdk.java.net/~tschatzl/8243672/webrev.1/
> 2) JDK-8244603: http://cr.openjdk.java.net/~tschatzl/8244603/webrev/

> These are in the pipeline and not "fully complete" yet:

> 3) JDK-8238163: http://cr.openjdk.java.net/~tschatzl/8238163/webrev/ 
> (optional)
> 4) JDK-8238686: http://cr.openjdk.java.net/~tschatzl/8238686/webrev/
> 5) JDK-8238687: http://cr.openjdk.java.net/~tschatzl/8238687/webrev/
> 6) JDK-8236073: http://cr.openjdk.java.net/~tschatzl/8236073/webrev/

> All of the above: 
> http://cr.openjdk.java.net/~tschatzl/8236073/webrev.preview/

> What these do:

> (1) and (2) make the input variables to the control loop more 
> consistent. Since they are out for review, I would defer to the review 
> threads for them.

> (3) stabilizes the IHOP calculation a bit, trying to improve behavior in
> uncommon situations. This change is optional.

> (4) fixes the issue of resizing at Remark being totally disconnected
> from the actual load, causing some erratic expansions and shrinks.
> After some time tinkering with that, I decided to remove resizing at
> Remark: since we check the heap size at every GC anyway, it is not
> required any more (at the cost of delaying uncommit to the next GC).

> (5) is the main change that implements what has been mentioned above:
> G1 tries to keep the actual GC time ratio within the range of
> LowerGCTimeRatio and GCTimeRatio. As long as the actual GC time ratio is
> within this range, no action occurs. As soon as it finds a trend of
> being outside, it tries to correct for that, internally aiming for an
> actual GC time ratio in the middle of that range.
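Here is a minimal sketch of the per-sample decision in (5). It uses the same hypothetical ratio model (application time per unit of GC time). `resize_decision` is an invented name, and the sketch leaves out the trend detection that is described further down.

```python
# Hypothetical model of the per-sample decision in (5); not G1 source code.
from typing import Optional, Tuple


def resize_decision(actual_ratio: float, lower_ratio: float,
                    upper_ratio: float) -> Tuple[str, Optional[float]]:
    """Return the action and the ratio to aim for (middle of the band)."""
    target = (lower_ratio + upper_ratio) / 2.0
    if actual_ratio < lower_ratio:
        # Too much time spent in GC: a larger heap should mean fewer GCs.
        return "expand", target
    if actual_ratio > upper_ratio:
        # Less GC time than allowed: the heap can give memory back.
        return "shrink", target
    # Inside the band: no action.
    return "none", None


print(resize_decision(4.0, 6.0, 12.0))   # ('expand', 9.0)
print(resize_decision(20.0, 6.0, 12.0))  # ('shrink', 9.0)
print(resize_decision(8.0, 6.0, 12.0))   # ('none', None)
```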

My main concerns about this change are:
a) I didn't see you change the default GCTimeRatio in G1. Do you think the lower bound
of 6 would be too low? I don't have a precise number, but intuitively a lower bound of at least
around 10 seems safer for online interactive applications. That would mean a default
GCTimeRatio of 20 for G1.
b) Mixed GC is a known issue: mixed GCs severely degrade the GC time ratio. (I have no
test results for abortable mixed GC after JDK12.) I'm not sure whether workloads with heavy
mixed GC activity would easily decrease the heap size. Or can abortable mixed GC roughly
ensure that the GC time ratio during mixed GC phases stays above 50% of that of normal young GCs?

> (6) implements SoftMaxHeapSize on top of that, trying to steer IHOP so 
> that G1 does not use more than that. (I.e. a complete mess of 
> potentially conflicting goals ;)

> What I would like to ask you is to try out these changes on your load(s)
> and, if possible, report back with at least

> gc*,gc=debug,gc+ergo+heap=trace

> logging.

> Of course more feedback about how it works for you is even better, and 
> if you are adventurous, maybe try tuning (internal) knobs a bit, which 
> I'll describe in a minute :)

> As mentioned, the changes are not complete; here is what I think should
> still be tuned a bit, and what I expect to help. The interesting method is
> G1HeapSizingPolicy::resize_amount_after_young_gc().

> - determining the desired GC time ratio range: there is a new (likely
> temporary) option G1MinimumPercentOfGCTimeRatio that determines the
> lower GC time ratio described above as a percentage of the original
> GCTimeRatio. It is currently set to 50%, which seems a good value: too
> tight a range will cause lots of resizing (which might be good), and too
> large a range will effectively disable shrinking (which also might be
> desired).
> Either way, this value works fairly well in my tests so far. Suggestions
> are very much appreciated.
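The following worked example shows how G1MinimumPercentOfGCTimeRatio might translate into a band. It assumes the lower bound is simply GCTimeRatio * percent / 100. Only the option name comes from the text; the formula and the helper names are assumptions. Under this assumption, GCTimeRatio=12 gives the lower bound of 6 discussed in this thread, and the suggested default of 20 gives 10.

```python
# Hypothetical helpers; only the option name G1MinimumPercentOfGCTimeRatio is from the text.

def lower_gc_time_ratio(gc_time_ratio: float, min_percent: float = 50.0) -> float:
    """Assumed formula: LowerGCTimeRatio = GCTimeRatio * percent / 100."""
    return gc_time_ratio * min_percent / 100.0


def gc_time_percent(ratio: float) -> float:
    """GCTimeRatio=N corresponds to GC taking 100 / (1 + N) percent of total time."""
    return 100.0 / (1.0 + ratio)


for gc_time_ratio in (12.0, 20.0):
    lower = lower_gc_time_ratio(gc_time_ratio)
    print(f"GCTimeRatio={gc_time_ratio:g}: band [{lower:g}, {gc_time_ratio:g}], "
          f"GC time {gc_time_percent(gc_time_ratio):.1f}%..{gc_time_percent(lower):.1f}%")
# GCTimeRatio=12: band [6, 12], GC time 7.7%..14.3%
# GCTimeRatio=20: band [10, 20], GC time 4.8%..9.1%
```

A smaller percentage widens the band toward more GC time. That matches the trade-off described above: a narrow band causes more resizing, and a wide band makes shrinking rare.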

> - detection of being outside of the expected gc time ratio range: this 
> somewhat works as before, separating short term and long term behavior.

> Long term behavior: every X GCs without a heap resize, G1 checks whether
> the long term GC time ratio is outside of the bounds and, if so, reacts.
> I think this is fairly straightforward.
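Here is a sketch of the long-term check. It assumes X = 10 (the "long" value mentioned below) and a window that restarts after every check or heap resize. `LongTermCheck` and its methods are hypothetical, and G1's actual bookkeeping may differ.

```python
# Hypothetical model of the long-term check; not G1 source code.

class LongTermCheck:
    def __init__(self, lower_ratio: float, upper_ratio: float, interval: int = 10):
        self.lower_ratio = lower_ratio
        self.upper_ratio = upper_ratio
        self.interval = interval  # "X" in the text; 10 is an assumption
        self.reset()

    def reset(self) -> None:
        """Start a new window, e.g. after a check or after any heap resize."""
        self.gc_time_s = 0.0
        self.mutator_time_s = 0.0
        self.gcs_since_reset = 0

    def record_gc(self, gc_time_s: float, mutator_time_s: float) -> str:
        self.gc_time_s += gc_time_s
        self.mutator_time_s += mutator_time_s
        self.gcs_since_reset += 1
        if self.gcs_since_reset < self.interval:
            return "none"
        # Long-term ratio over the whole window, in GCTimeRatio units.
        if self.gc_time_s > 0.0:
            ratio = self.mutator_time_s / self.gc_time_s
        else:
            ratio = float("inf")
        self.reset()
        if ratio < self.lower_ratio:
            return "expand"
        if ratio > self.upper_ratio:
            return "shrink"
        return "none"


check = LongTermCheck(6.0, 12.0)
results = [check.record_gc(1.0, 3.0) for _ in range(10)]
print(results[-1])  # expand: ratio 3 is below the lower bound 6
```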

> Short term behavior: tracks the number of times the short term GC time
> ratio exceeds the bounds in a single variable, incrementing or decrementing
> it depending on whether the current GC time ratio is above or below the GC
> time ratio bounds. If that value exceeds certain thresholds, G1 acts.

> There is a new bias towards expansion at startup to make G1 react faster
> at that time, and some decay towards "no action to be taken" if nothing
> happens for a "long" time.

> I reused the same values for "short" time (+/-4) and "long" (10) as 
> before, they seem to be okay.
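Here is a sketch of the short-term counter, using the +/-4 thresholds mentioned above. Two parts are guesses at the described behavior, not the patch's actual logic. The startup bias is modeled as expanding after a single bad sample during the first few GCs. The decay is modeled as one step towards zero every 10 in-band GCs without a resize. `ShortTermTrend` and its parameters are hypothetical.

```python
# Hypothetical model of the short-term trend counter; not G1 source code.

class ShortTermTrend:
    """count > 0 means pressure to expand, count < 0 pressure to shrink."""

    def __init__(self, lower_ratio: float, upper_ratio: float, threshold: int = 4,
                 startup_gcs: int = 5, decay_interval: int = 10):
        self.lower_ratio = lower_ratio
        self.upper_ratio = upper_ratio
        self.threshold = threshold            # the "+/-4" from the text
        self.startup_gcs = startup_gcs        # assumed length of the startup phase
        self.decay_interval = decay_interval  # assumed "long" quiet period
        self.count = 0
        self.total_gcs = 0
        self.gcs_since_action = 0

    def record_gc(self, actual_ratio: float) -> str:
        self.total_gcs += 1
        self.gcs_since_action += 1
        if actual_ratio < self.lower_ratio:    # too much GC time
            self.count += 1
        elif actual_ratio > self.upper_ratio:  # less GC time than allowed
            self.count -= 1
        elif self.count != 0 and self.gcs_since_action % self.decay_interval == 0:
            # Quiet for a "long" time: decay one step towards "no action".
            self.count += -1 if self.count > 0 else 1

        # Startup bias: a single bad sample is enough to expand early on.
        expand_at = 1 if self.total_gcs <= self.startup_gcs else self.threshold
        if self.count >= expand_at:
            return self._act("expand")
        if self.count <= -self.threshold:
            return self._act("shrink")
        return "none"

    def _act(self, action: str) -> str:
        self.count = 0
        self.gcs_since_action = 0
        return action


trend = ShortTermTrend(6.0, 12.0)
print(trend.record_gc(3.0))  # expand (startup bias)
```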

> - actual resizing: expansion is supposed to be the same as before, 
> relatively aggressive, which I intend to keep.

> Shrinking is based on the number of free regions at the moment. This is 
> not optimal because e.g. you do not want to shrink below what is needed 
> for current eden (and the survivors of the next gc).

> Other than that, it is bounded by a percentage of the number of free
> regions (G1ShrinkByPercentOfAvailable). This results in some heap size
> undershoot in some cases (i.e. temporarily uncommitting a bit too much),
> but in my tests it hasn't been too bad.
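Here is a sketch of the shrink amount. It assumes G1ShrinkByPercentOfAvailable is applied as a plain percentage of free regions. It also adds the floor that the text says is still missing: keep room for the current eden plus the survivors of the next GC. It is therefore a possible refinement rather than what the patch currently does. `shrink_amount_regions` is a hypothetical helper.

```python
# Hypothetical helper; only G1ShrinkByPercentOfAvailable is named in the text.

def shrink_amount_regions(free_regions: int, eden_regions: int,
                          survivor_regions_next_gc: int, shrink_percent: int) -> int:
    """Regions to uncommit: a percentage of free regions, but never eating into
    the room needed for the current eden and the next GC's survivors."""
    by_percent = free_regions * shrink_percent // 100
    reserve = eden_regions + survivor_regions_next_gc
    room = max(0, free_regions - reserve)
    return min(by_percent, room)


print(shrink_amount_regions(200, 50, 10, 50))  # 100: the percentage bound applies
print(shrink_amount_regions(100, 50, 10, 50))  # 40: the eden/survivor floor applies
```

Taking the minimum of both limits is meant to avoid the undershoot mentioned above. With the floor, uncommitting can never leave fewer free regions than the next young collection is expected to need.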

> Still rather (too) simple; expect some tuning and changes particularly
> here, deviating a bit more from the expansion code.

> Comments and ideas in this area, especially ones based on your
> workloads, are particularly appreciated.

> Another big area not yet really tested is interaction with JEP 346: 
> Promptly Return Unused Committed Memory from G1, but I am certain that 
> with it you can reduce heap usage a lot (too much?).

> My performance (throughput) tests so far almost always look encouraging:
> 20-30% less heap size with statistically insignificant throughput
> changes. There are some exceptions; in these cases you lose 10% of
> throughput in exchange for around 90% less heap usage.

> The only really bad results come from tests that try to find the maximum
> throughput of G1 by incrementally increasing the load until it no longer
> keeps up, backing off slightly, and then increasing the load again to find
> an "equilibrium". From what I can tell, the heap sizing follows the
> application (i.e. does what it's supposed to do), making the application
> think it's already done while there is still more heap available to
> potentially increase performance (looking at you, specjbb2015
> out-of-box performance!).

> Not yet sure how to counter that, but some decrease in default 
> GCTimeRatio to decrease the shrinking aggressiveness (and keeping more 
> heap longer) might fix this.

> Of course, if you disable this adaptive heap sizing by fixing the heap
> min/max in your benchmarks, there is no difference from before.

> One interesting upcoming change is to make MinHeapSize manageable 
> (JDK-8224879) to help the algorithm a bit.

> As closing words, given that the email is quite long already, thanks for 
> your attention and looking forward to feedback :)
> If you have questions, please chime in too, I am happy to answer them.

> Thanks,
>   Thomas
