RFR: 8236073: G1: Use SoftMaxHeapSize to guide GC heuristics

Thomas Schatzl thomas.schatzl at oracle.com
Wed Jan 15 08:37:21 UTC 2020


Hi,

On Wed, 2020-01-15 at 11:52 +0800, Liang Mao wrote:
> Hi Thomas,
> 
> I summarize the issues as follows:
> 
> 1. Criterion of SoftMaxHeapSize
> I agree to keep the policy of SoftMaxHeapSize similar with ZGC to
> make it unified. So "expand_heap_after_young_collection" is used for
> meeting the basic GCTimeRatio and expand heap immediately which
> cannot be blocked by any
> reasons. "adjust_heap_after_young_collection" cannot change the
> logic
> and I will take both expansion and shrink into consideration. Is my 
> understanding correct here?

Yes, ideally we would be close to ZGC in behavior with SoftMaxHeapSize.
If for some reason this does not work we may need to reconsider - but
we need a reason, if possible backed by numbers/graphs of actual
behavior.

> 
> 2. Full GC with SoftMaxHeapSize
> In my thought non-explicit Full GC probably means the insufficiency 
> of heap capacity and we may not keep shrinking within SoftMaxHeapSize
> but explicit FGC don't have that issue. That's the only reason why I 

People run explicit FGC for many reasons, and the one you describe is
just one of them.

E.g. explicit FGC can be converted to a concurrent cycle or disabled
for other reasons, so having special behavior for this particular case
may just not work as intended in many cases. Users may then need to
decide whether they want this behavior or the
System.gc()-starts-concurrent-cycle one they might also rely on.

The lone "System.gc()" call is insufficient to transport the actual
intent of the user - but that is a different issue.

> checked if it is explicit. But we will have the same determine logic
> to check if the heap can be shrunk so "explicit" check could be
> meaningless and I will remove that.

Exactly. 

> 
> 3. SoftMaxHeapSizeConstraintFunc doesn't check Xms
> The constraint function didn't make sure the SoftMaxHeapSize should
> less than Xms. Do we need to add the checking? It will not only
> affect G1...

I will check again later, but from what I remember from yesterday it
does check it at VM start (-Xms sets both minimum and initial heap
size). The constraint func does not check when the user changes the
value during runtime. So code using it must still maintain this
invariant in behavior.
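
To illustrate what "maintaining the invariant" could look like for code
that reads the value at runtime, here is a minimal sketch. It is not
HotSpot code: the class, the method name and the assumed invariant
(minimum heap size <= soft max <= maximum heap size) are hypothetical and
only for illustration.

```java
public class SoftMaxClamp {
    // Hypothetical helper, not part of HotSpot. Assumes the invariant is
    // minHeapBytes <= effective soft max <= maxHeapBytes.
    static long effectiveSoftMax(long requestedSoftMax, long minHeapBytes, long maxHeapBytes) {
        if (minHeapBytes > maxHeapBytes) {
            throw new IllegalArgumentException("min heap size exceeds max heap size");
        }
        // Clamp the (possibly runtime-changed) value into the valid range.
        return Math.max(minHeapBytes, Math.min(requestedSoftMax, maxHeapBytes));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A runtime-set value below the minimum is raised to the minimum.
        System.out.println(effectiveSoftMax(256 * mb, 512 * mb, 4096 * mb) / mb);
        // A value inside the range is used unchanged.
        System.out.println(effectiveSoftMax(1024 * mb, 512 * mb, 4096 * mb) / mb);
    }
}
```

Clamping at the point of use, rather than rejecting the value, keeps
SoftMaxHeapSize a hint: a value outside the range is tolerated, but the
heuristics never act on it directly.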

> 4. commit/uncommit parallelism
> The concurrent uncommit will work with VMThread doing GC and GC may
> request to expand heap if not enough empty regions. So the
> parallelism is possible and immediate uncommit is a solution.

There may be others, but immediate uncommit seems easiest: blocking such
a request seems harder to implement, or at least less localized in the
code. Completely *dropping* the request seems to go against the
"SoftMaxHeapSize is a hint" guideline and may have other unforeseen
consequences too. Like I said, since G1 does not expand then, there
will be more GCs with the small heap, increasing the current GC
overhead (relative to the GCTimeRatio goal) more than it should. This
means that when the request ultimately comes through, as G1 will
certainly try again, the increase may be huge. (The increase is
proportional to the difference between actual and requested
GCTimeRatio, iirc.)

Again, if there are good reasons to do otherwise I am open to
discussion, but it would be nice to have numbers to base decisions on.

> 4. More heap expansion/shrink heuristics further
> We have some data and experience in dynamic heap adjustment in our
> workloads.
> The default GCTimeRatio 12 is a really well tuned number that we found
> applications will have obvious timeout errors if it is less than ~12. 

It is actually *very* interesting to hear that the default G1
GCTimeRatio fits you well. Given the improvements in G1 GC performance
over time, I have from time to time been privately asking myself whether
to decrease the default GC time percentage, i.e. increase this value (I
hope I got the directions right ;)), and similarly adjust the default
MaxGCPauseMillis down to reflect that.
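
To double-check the directions: GCTimeRatio=N asks for a GC time share
of at most 1/(1+N) of total time, so a larger value means a smaller GC
time percentage. A minimal sketch of that arithmetic (the class and
method names are made up for illustration, this is not G1 code):

```java
public class GcTimeRatioDemo {
    // Target upper bound on GC time, as a percentage of total time,
    // using the documented relation 1 / (1 + GCTimeRatio).
    static double gcTimePercent(int gcTimeRatio) {
        return 100.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 12 is the G1 default discussed here, 99 the value mentioned
        // in the quoted mail.
        for (int ratio : new int[] {12, 99}) {
            System.out.printf("GCTimeRatio=%d -> GC time target <= %.2f%%%n",
                    ratio, gcTimePercent(ratio));
        }
    }
}
```

So the default of 12 corresponds to roughly 7.7% GC time, and 99 to 1%.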

> So it is kind of *hard* limit and we need to expand immediately if
> GCTimeRatio drops below 12. The difference in our workloads is that
> we will keep a GCTimeRatio nearly the original value 99 to make GC in

I.e. you set it to 99 at startup?

> a healthy state because allocation rate and outside input can vary
> violently that we don't want frequent adjustment. You know that in
> our 8u  implementation we just keep a conservative GC interval to
> achieve that. Comparing to the current code in JDK15, keeping
> GCTimeRatio as 99 is a different behavior which might have more
> memory footprint. 

As mentioned above, given that we have both been thinking about this,
we might actually evaluate changing the defaults.

> I propose if we can still use the original option
> "-XX:+G1ElasticHeap" to keep the GCTimeRatio around 99 or a specified
> number. The default flow will make sure the GCTimeRatio is above the
> threshold 12 and concurrent commit/uncommit will adjust the heap to
> keep GCTimeRatio in a proper number that the adjustment is not
> urgent.

I am not completely sure what you want to achieve here or what the
problem is. I probably need to understand more about the problem;
potentially other solutions can be found.

As for a new -XX:+G1ElasticHeap option, it does not seem to make a
difference to set this or -XX:GCTimeRatio in this case (both are single
options). But I do not completely know the details here.

Thanks,
  Thomas



