Status of JEP-8204088/JDK-8236073
thomas.schatzl at oracle.com
Fri Jun 11 08:29:58 UTC 2021
On 09.06.21 19:56, Man Cao wrote:
> Hi Thomas,
> Thanks for the feedback!
> > Fwiw, in my opinion the intention of SoftMaxHeapSize has been more to
> > account for external user requirements not caught by the internal gc
> > load, not that gc load should guide SoftMaxHeapSize (and override it)
> > directly. I.e. as an orthogonal consideration for heap sizing.
> Yes. This should be the case if the user has set SoftMaxHeapSize explicitly.
> We are actually considering two use cases that will be built on top of
> the work of SoftMaxHeapSize and GCTimeRatio (or GCCpuRatio), and they both
> relieve users from setting SoftMaxHeapSize (and/or Xmx) by themselves.
> 1. Container RAM limit is fixed. In this case, the goal is to keep total
> container usage within the limit. If the usage is reaching the limit, the JVM could
> observe the current CPU overhead. If the overhead is not too high, it can automatically set
> a lower SoftMaxHeapSize to keep total container usage within the limit.
I am still not completely sure why the current CPU overhead should influence
SoftMaxHeapSize directly; probably I'm just misunderstanding, or this
is a terminology problem:
What is currently suggested (well, with all the mentioned patches :)) is
- there are two completely separate soft heap size limits in play:
1) the one from GCTimeRatio, by cpu usage/pause time, whatever
2) SoftMaxHeapSize, by some external component
- from those, the collector chooses the minimum as "actual limit". I
believe this separation should be kept. Potentially reporting for all of
these goals/limits could be improved, but I think some of my changes add
some (more debug-like) log messages.
- both are always enabled, although 1) is known broken, and 2) not
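
As a rough illustration of the separation described in the list above: the two soft limits are computed independently, and the collector only combines them at the end by taking the minimum. This is a minimal sketch, not G1 code; the class, method, and parameter names are hypothetical and exist only for this example.

```java
// Hypothetical sketch: NOT actual HotSpot/G1 code.
// Models two independent soft heap limits and how they are combined.
final class SoftHeapLimits {
    private SoftHeapLimits() {}

    /**
     * Combines the two soft limits by taking the smaller one.
     *
     * @param gcLoadLimitBytes    soft limit derived from GC load (the GCTimeRatio side)
     * @param softMaxHeapSizeBytes soft limit supplied externally (the SoftMaxHeapSize side)
     * @return the limit the collector would treat as the "actual" soft limit
     */
    static long effectiveSoftLimit(long gcLoadLimitBytes, long softMaxHeapSizeBytes) {
        if (gcLoadLimitBytes < 0 || softMaxHeapSizeBytes < 0) {
            throw new IllegalArgumentException("limits must be non-negative");
        }
        // Neither input overrides the other; the tighter one simply wins.
        return Math.min(gcLoadLimitBytes, softMaxHeapSizeBytes);
    }
}
```

With this shape, an external component lowering SoftMaxHeapSize never needs to know about GC load, and the GC load heuristic never needs to write to SoftMaxHeapSize.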
> 2. Container RAM limit can grow or shrink automatically. This is
> probably unique in our production environment. The goal in this case is to make the JVM use as
> much RAM as it needs, but not so much that it leads to memory waste. Ideally
> this can be achieved by allowing Xmx=unlimited (JDK-4408373), then making the JVM respect
> GCTimeRatio or GCCpuRatio better, so it does not grow the heap too much.
> In practice, this can be achieved by setting a very large Xmx, and
> making the JVM respect GCTimeRatio or GCCpuRatio better. (This use case may not require
> SoftMaxHeapSize after all.)
Both use cases can be covered by these mechanisms.
Note that SoftMaxHeapSize is just a soft goal; potentially some hard
goal would be more useful. Also, there is a long way from limiting the
Java heap size to limiting VM memory usage :)
> > To a large degree I think that pause time has (historically) just been a
> > substitute for GC cpu overhead that is more convenient to calculate
> > (cross OS and everything) and fairly accurate.
> In my experience in JDK 11+G1, pause overhead could diverge significantly
> from CPU overhead. I've seen cases where pause overhead is ~2%, but CPU
> overhead is >50%, e.g., due to problems with humongous allocations
> (perhaps already fixed by JDK-8245511 and JDK-8240556).
I've seen that too sometimes, but mostly because of some exceptional
conditions, as you mentioned, that seem fixable.
The problem is that apart from internal prototypes we never got around
to adding that. There's JDK-8027759 (and one more I think) though, even
with a very old patch.
Another issue with getting cpu usage that I remember is whether it is
supported at all on some systems, and it may be spotty on others (i.e.
granularity-wise).
Do you have any experience with that outside of Linux?
> > Although I agreed above, there may be value in adding a new flag anyway:
> > GCTimeRatio is fairly clumsy to use (i.e. GCCpuRatio = 1 / (1 +
> > GCTimeRatio)). At least we should make it a floating point value....
> Regarding whether to change the meaning of GCTimeRatio or add a
> GCCpuRatio, I was a bit concerned about what happens if the user has already
> set some value for GCTimeRatio for G1.
I do not think many people intentionally set GCTimeRatio for G1, as it
has never been promoted; using MaxGCPauseMillis has (and
GCPauseIntervalMillis, but I have never seen anyone using the
latter in the wild - I may be wrong).
> I searched our repo and found fewer than 10 jobs setting GCTimeRatio, and
> most of them are for non-G1 collectors. The cases for setting it with G1 seem
> unnecessary and can be removed.
> So now I think we can make significant changes to the meaning of
> GCTimeRatio for G1,
> as it is not that effective with all the unresolved issues anyway.
> Also agree that GCTimeRatio is clumsy to use.
> How about we introduce a new flag like GCCpuPercentage, similar to
> what JDK-8186248 introduced? Then we can make GCTimeRatio a no-op flag for G1.
Others may have an opinion too about adding new options, and in
particular about making GCTimeRatio a no-op (but e.g. ZGC and Shenandoah
don't use it either).
A conversion of GCTimeRatio to that GCCpuPercentage (or whatever) seems
trivial though and could be made.
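
For reference, such a conversion could look roughly like the sketch below. It assumes that the proposed GCCpuPercentage would mean "percentage of total time spent in GC", which matches the GCCpuRatio = 1 / (1 + GCTimeRatio) relation quoted earlier; the flag does not exist today, and the class and method names are made up for this example.

```java
// Hypothetical helper: GCCpuPercentage is only a proposed flag name in this
// thread, and this class is not part of HotSpot.
final class GcRatioConversion {
    private GcRatioConversion() {}

    /**
     * GCTimeRatio expresses the goal as application time : GC time = ratio : 1,
     * so the GC share of total time is 1 / (1 + ratio), here scaled to percent.
     */
    static double toGcCpuPercentage(long gcTimeRatio) {
        if (gcTimeRatio < 0) {
            throw new IllegalArgumentException("GCTimeRatio must be non-negative");
        }
        return 100.0 / (1.0 + gcTimeRatio);
    }

    /** Inverse mapping; returns a fractional ratio, which GCTimeRatio cannot express today. */
    static double toGcTimeRatio(double gcCpuPercentage) {
        if (gcCpuPercentage <= 0.0 || gcCpuPercentage > 100.0) {
            throw new IllegalArgumentException("percentage must be in (0, 100]");
        }
        return 100.0 / gcCpuPercentage - 1.0;
    }
}
```

For example, a GCTimeRatio of 12 maps to roughly 7.7%, and a GCTimeRatio of 99 maps to 1%. The inverse direction generally yields non-integer ratios, which is one reason a percentage (or floating point) flag is easier to use.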