Status of JEP-8204088/JDK-8236073

Jonathan Joo jonathanjoo at google.com
Sat Jun 12 05:23:00 UTC 2021


Hi Thomas and Man,

Thanks for the feedback.

> I am still not completely sure why current CPU overhead should influence
> SoftMaxHeapSize directly and probably I'm just misunderstanding or this
> is a terminology problem:
>

I think this is just a bit of confusion in terminology. Man and I have been
using SoftMaxHeapSize interchangeably with the idea of a maximum heap size
allowed during a heap resizing event. The way you describe it below seems
like a better mental model of the idea:

> What is currently suggested (well, with all the mentioned patches :)) is
> that:
>
> - there are two completely separate soft heap size limits in play:
>    1) the one from GCTimeRatio, by cpu usage/pause time, whatever
>    2) SoftMaxHeapSize, by some external component
>
> - from those, the collector chooses the minimum as "actual limit". I
> believe this separation should be kept. Potentially reporting for all of
> these goals/limits could be improved, but I think some of my changes add
> some (more debug-like) log messages.
>
> - both are always enabled, although 1) is known broken, and 2) not
> implemented ;)
>

This summary sounds correct to me. Without getting into implementation
details, I agree that there is only one effective heap size limit, and it
should be the minimum of the value derived from GCTimeRatio/GCCpuRatio and
the one suggested by the external components (container limit, current usage).
Man, what do you think about framing it in this manner?
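
To make this framing concrete, here is a minimal sketch of the "minimum of
the two limits" idea. It is not HotSpot code: the class, method, and
parameter names are hypothetical, and clamping the result between a minimum
and a maximum heap size is an assumption about how such a value would be
bounded in practice.

```java
// Hypothetical sketch, not HotSpot code: all names are illustrative.
final class EffectiveHeapLimit {
    /**
     * Picks the heap size limit the collector would act on.
     *
     * @param gcOverheadLimitBytes limit derived from GCTimeRatio/GCCpuRatio
     * @param softMaxHeapSizeBytes limit suggested by an external component
     * @param minHeapBytes         assumed lower bound (think -Xms)
     * @param maxHeapBytes         assumed upper bound (think -Xmx)
     */
    static long effectiveLimit(long gcOverheadLimitBytes,
                               long softMaxHeapSizeBytes,
                               long minHeapBytes,
                               long maxHeapBytes) {
        if (minHeapBytes > maxHeapBytes) {
            throw new IllegalArgumentException("min heap exceeds max heap");
        }
        // The two soft limits stay separate; only their minimum is used.
        long chosen = Math.min(gcOverheadLimitBytes, softMaxHeapSizeBytes);
        // Assumption: the chosen value is clamped to [min, max].
        return Math.max(minHeapBytes, Math.min(chosen, maxHeapBytes));
    }
}
```

With a GC-overhead-derived limit of 6 GB and an external limit of 4 GB, this
would pick 4 GB; if the GC-overhead-derived limit were 2 GB instead, it
would pick 2 GB.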

> Note that SoftMaxHeapSize is just a soft goal, potentially some hard
> goal would be more useful. Also, there is a long way from limiting the
> Java heap size to limiting VM memory usage :)
>

We were toying with the idea of using SoftMaxHeapSize due to the fact that
it already exists in other GC types. But admittedly for the purposes of
this project, having a hard heap limit would be better to guarantee no
OutOfMemoryErrors. Maybe a better solution to consider is to have a hard
heap limit, but then set SoftMaxHeapSize to some value less than this hard
limit? This would give us the best of both worlds: the hard limit guarantees
a maximum heap usage, while the lower SoftMaxHeapSize makes hitting the hard
limit less likely. (I imagine hitting the hard heap limit would be a costly
operation, whereas hitting the SoftMaxHeapSize limit is less so.)
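
As a rough illustration of the hard-limit-plus-soft-limit idea, the sketch
below derives a soft limit as a fixed fraction below a hard limit and
classifies current usage against both. Everything here is hypothetical: the
names, the headroom fraction, and the three pressure levels are assumptions
for illustration, not an existing JVM API or policy.

```java
// Hypothetical sketch, not HotSpot code: all names are illustrative.
final class SoftBelowHardLimit {
    enum Pressure { WITHIN_SOFT_LIMIT, ABOVE_SOFT_LIMIT, AT_HARD_LIMIT }

    /** Soft limit placed a given fraction below the hard limit. */
    static long softLimit(long hardLimitBytes, double headroomFraction) {
        if (hardLimitBytes <= 0) {
            throw new IllegalArgumentException("hard limit must be positive");
        }
        if (headroomFraction < 0.0 || headroomFraction >= 1.0) {
            throw new IllegalArgumentException("headroom must be in [0, 1)");
        }
        return (long) (hardLimitBytes * (1.0 - headroomFraction));
    }

    /** Classifies heap usage relative to the soft and hard limits. */
    static Pressure classify(long usedBytes, long softLimitBytes, long hardLimitBytes) {
        if (usedBytes >= hardLimitBytes) {
            return Pressure.AT_HARD_LIMIT;     // assumed to be the costly case
        }
        if (usedBytes > softLimitBytes) {
            return Pressure.ABOVE_SOFT_LIMIT;  // assumed to trigger cheaper shrinking effort
        }
        return Pressure.WITHIN_SOFT_LIMIT;
    }
}
```

The point of the gap between the two limits is that the collector would
start reacting at the soft limit, well before the hard limit is reached.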

> Summing it up, the current available patches are:
> JDK-8238687 and JDK-8253413: improves (re-)sizing policy and acts on
> that at any young gc:
>
> https://github.com/tschatzl/jdk/tree/8238687-investigate-memory-uncommit-during-young-gc2
>
> JDK-8248324 <https://bugs.openjdk.java.net/browse/JDK-8248324>: removes
> heap resizing at remark, which used a completely
> different policy anyway. Full gc is still an issue, but "it should not
> happen". Patch attached to CR.
> JDK-8236073: implements SoftMaxHeapSize, patch attached to CR.


Just to make sure I understand correctly: when you say "patch attached to
CR" above, do you mean that the patches are included in your github repo
<https://github.com/tschatzl/jdk/tree/8238687-investigate-memory-uncommit-during-young-gc2>?
Or are they available elsewhere, so that I would need to apply them myself
to test these changes? (I would check myself, but I can't seem to find the
actual code snippets in the bugs.openjdk.java.net links. Do I need an
OpenJDK account to access them?)

These discussions have been immensely valuable to us, so I appreciate your
prompt and thorough feedback. I will look further into your GCTimeRatio
changes/suggestions to see if they may be sufficient for our needs.

Thank you!

~ Jonathan

On Fri, Jun 11, 2021 at 4:30 AM Thomas Schatzl <thomas.schatzl at oracle.com>
wrote:

> Hi Man,
>
> On 09.06.21 19:56, Man Cao wrote:
> > Hi Thomas,
> >
> > Thanks for the feedback!
> >
> >  > Fwiw, in my opinion the intention of SoftMaxHeapSize has been more to
> >  > account for external user requirements not caught by the internal gc
> >  > load, not that gc load should guide SoftMaxHeapSize (and override it)
> >  > directly. I.e. as an orthogonal consideration for heap sizing.
> >
> > Yes. This should be the case if the user has set SoftMaxHeapSize
> > explicitly.
> > We are actually considering two use cases that will be built on top of
> > the work of SoftMaxHeapSize and GCTimeRatio (or GCCpuRatio), and they
> > both relieve users from setting SoftMaxHeapSize (and/or Xmx) by
> > themselves.
> >
> > 1. Container RAM limit is fixed. In this case, the goal is to keep total
> > container usage within the limit. If the usage is reaching the limit,
> > the JVM could observe the current CPU overhead. If the overhead is not
> > too high, it can automatically set a lower SoftMaxHeapSize to keep total
> > container usage within the limit.
>
> I am still not completely sure why current CPU overhead should influence
> SoftMaxHeapSize directly and probably I'm just misunderstanding or this
> is a terminology problem:
>
> What is currently suggested (well, with all the mentioned patches :)) is
> that:
>
> - there are two completely separate soft heap size limits in play:
>    1) the one from GCTimeRatio, by cpu usage/pause time, whatever
>    2) SoftMaxHeapSize, by some external component
>
> - from those, the collector chooses the minimum as "actual limit". I
> believe this separation should be kept. Potentially reporting for all of
> these goals/limits could be improved, but I think some of my changes add
> some (more debug-like) log messages.
>
> - both are always enabled, although 1) is known broken, and 2) not
> implemented ;)
>
> >
> > 2. Container RAM limit can grow or shrink automatically. This is
> > probably unique in our production environment. The goal in this case is
> > to make the JVM use as much RAM as it needs, but not so much that it
> > leads to memory waste. Ideally this can be achieved by allowing
> > Xmx=unlimited (JDK-4408373), then making the JVM respect GCTimeRatio or
> > GCCpuRatio better, so it does not grow the heap too much.
> > In practice, this can be achieved by setting a very large Xmx, and
> > making the JVM respect GCTimeRatio or GCCpuRatio better. (This use case
> > may not require SoftMaxHeapSize after all.)
>
> Both use cases can be covered by these mechanisms.
>
> Note that SoftMaxHeapSize is just a soft goal, potentially some hard
> goal would be more useful. Also, there is a long way from limiting the
> Java heap size to limiting VM memory usage :)
>
> >
> >  > To a large degree I think that pause time has (historically) been just
> >  > a more convenient to calculate (cross OS and everything) and a fairly
> >  > accurate substitute for GC cpu overhead.
> >
> > In my experience in JDK 11+G1, pause overhead could diverge significantly
> > from CPU overhead. I've seen cases where pause overhead is ~2%, but CPU
> > overhead is >50%, e.g., due to problems with humongous allocations
> > (perhaps already fixed by JDK-8245511 and JDK-8240556).
>
> I've seen that too sometimes, but mostly because of some exceptional
> condition as you mentioned that seem fixable.
>
> The problem is that apart from internal prototypes we never got around
> to adding that. There's JDK-8027759 (and one more I think) though, even
> with a very, very old patch.
>
> Another issue related to getting cpu usage I remember is support on some
> systems, and it may be spotty on others (i.e. granularity wise).
>
> Do you have any experience on that outside of Linux?
>
> >
> >  > Although I agreed above, there may be value in adding a new flag
> >  > anyway: GCTimeRatio is fairly clumsy to use (i.e. GCCpuRatio = 1 / (1 +
> >  > GCTimeRatio)). At least we should make it a floating point value....
> >
> > Regarding whether to change the meaning of GCTimeRatio or adding a
> > GCCpuRatio, I was a bit concerned about what happens if the user has
> > already set some value for GCTimeRatio for G1.
>
> I do not think many people intentionally set GCTimeRatio for G1, as it
> has never been promoted; people use MaxGCPauseMillis instead (and
> GCPauseTimeInterval, but I have never seen anyone using the latter in
> the wild - but I may be wrong).
>
> > I searched our repo and found less than 10 jobs setting GCTimeRatio, and
> > most of them are for non-G1 collectors. The cases for setting it with G1
> > seem unnecessary and can be removed.
>
> Yes.
>
> > So now I think we can make significant changes to the meaning of
> > GCTimeRatio for G1,
> > as it is not that effective with all the unresolved issues anyway.
> > Also agree that GCTimeRatio is clumsy to use.
> >
> > How about we introduce a new flag like GCCpuPercentage similar to
> > MaxRAMPercentage
> > from JDK-8186248? Then we can make GCTimeRatio a no-op flag for G1.
>
> Others may have an opinion too about adding new options, and in
> particular making GCTimeRatio a no-op (but e.g. ZGC and Shenandoah don't
> use them either).
> A conversion of GCTimeRatio to that GCCpuPercentage (or whatever) seems
> trivial though and could be made.
>
> Thanks,
>    Thomas
>
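
On the conversion between GCTimeRatio and a percentage-style flag discussed
in the quoted message: a minimal sketch is below. It relies only on the
relation stated in the thread, GCCpuRatio = 1 / (1 + GCTimeRatio).
GCCpuPercentage is only a proposed flag name from this thread, and the class
and method names are hypothetical.

```java
// Hypothetical sketch, not HotSpot code: all names are illustrative.
final class GcRatioConversion {
    /** GC share of total time, in percent, implied by a GCTimeRatio value. */
    static double gcTimeRatioToCpuPercentage(int gcTimeRatio) {
        if (gcTimeRatio < 0) {
            throw new IllegalArgumentException("GCTimeRatio must be >= 0");
        }
        // GCTimeRatio = N means application time : GC time is N : 1,
        // so the GC fraction is 1 / (1 + N).
        return 100.0 / (1.0 + gcTimeRatio);
    }

    /** Inverse mapping; the result is fractional, hence the double. */
    static double cpuPercentageToGcTimeRatio(double cpuPercentage) {
        if (cpuPercentage <= 0.0 || cpuPercentage > 100.0) {
            throw new IllegalArgumentException("percentage must be in (0, 100]");
        }
        return 100.0 / cpuPercentage - 1.0;
    }
}
```

For example, a GCTimeRatio of 99 corresponds to 1% and a GCTimeRatio of 12
to roughly 7.7%. Most percentages do not map back to an integer ratio, which
is one reason a floating point value would be easier to work with.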


More information about the hotspot-gc-dev mailing list