Status of JEP-8204088/JDK-8236073

Man Cao manc at google.com
Tue Jun 15 19:45:09 UTC 2021


Hi all,

Thank you for the feedback!

Regarding the relationship between GCTimeRatio/GCCpuPercentage and
SoftMaxHeapSize:

>> I am still not completely sure why current CPU overhead should influence
>> SoftMaxHeapSize directly and probably I'm just misunderstanding or this
>> is a terminology problem:
>>
>> What is currently suggested (well, with all the mentioned patches :)) is
>> that:
>>
>> - there are two completely separate soft heap size limits in play:
>>    1) the one from GCTimeRatio, by cpu usage/pause time, whatever
>>    2) SoftMaxHeapSize, by some external component
>>
>> - from those, the collector chooses the minimum as "actual limit". I
>> believe this separation should be kept. Potentially reporting for all of
>> these goals/limits could be improved, but I think some of my changes add
>> some (more debug-like) log messages.
>>
>> - both are always enabled, although 1) is known broken, and 2) not
>> implemented ;)

> This summary sounds correct to me. Without getting into implementation
> details, I agree that there is only one heap size value, and it should be
> the minimum of the values provided through GCTimeRatio/GCCPURatio and the
> one suggested by the external components (container limit, current usage).
> Man, what do you think about framing it in this manner?

I fully agree with the above description of their relationship.
I also agree that a harder limit such as CurrentMaxHeapSize could be
more effective for what we are trying to implement.
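For concreteness, the relationship agreed on above could be sketched as
follows. This is only an illustration, not HotSpot code: the class and
method names are hypothetical, and CurrentMaxHeapSize is the proposed hard
limit from JDK-8204088, which does not exist as a flag yet.

```java
// Hypothetical sketch, not HotSpot code: how the two independent soft
// limits and a hard limit could combine into the single "actual limit".
final class HeapLimits {
    private HeapLimits() {}

    /**
     * @param cpuGoalLimitBytes limit derived from GCTimeRatio/GCCpuPercentage
     * @param softMaxBytes      limit set externally via SoftMaxHeapSize
     * @param hardMaxBytes      hard cap: -Xmx, or the proposed CurrentMaxHeapSize
     * @return the heap size the collector would aim for
     */
    static long actualLimit(long cpuGoalLimitBytes, long softMaxBytes, long hardMaxBytes) {
        // The collector honors whichever soft goal is stricter...
        long softLimit = Math.min(cpuGoalLimitBytes, softMaxBytes);
        // ...and never plans to grow beyond the hard cap.
        return Math.min(softLimit, hardMaxBytes);
    }
}
```

E.g. with a CPU-derived limit of 3 GB, SoftMaxHeapSize at 2 GB and a 4 GB
hard cap, actualLimit returns 2 GB.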

The problem we are trying to solve for the "container RAM limit is fixed"
case is actually orthogonal to the relationship between the two flags.
Basically, we need a flag that can be adjusted dynamically (i.e.
"manageable" in HotSpot). Then we can build either a JVM feature or a
non-JVM approach, such as an agent, that automatically sets either
GCTimeRatio/GCCpuPercentage or SoftMaxHeapSize/CurrentMaxHeapSize based
on the container RAM usage/limit ratio.
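The agent-based variant could use the existing HotSpotDiagnosticMXBean
API, which only lets you change flags the JVM reports as writeable
(manageable). A minimal sketch, assuming the agent runs inside the target
JVM; the helper name is hypothetical, and since GCCpuPercentage and
CurrentMaxHeapSize are only proposed names, the helper simply reports
false for them on current JDKs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical agent-side helper: set a VM flag only if it is manageable.
final class ManageableFlagSetter {
    private ManageableFlagSetter() {}

    /** Returns true if the flag exists, is writeable, and was set to value. */
    static boolean trySetManageableFlag(String name, String value) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = bean.getVMOption(name);
            if (!option.isWriteable()) {
                return false; // e.g. a product flag fixed at startup
            }
            bean.setVMOption(name, value);
            return true;
        } catch (IllegalArgumentException e) {
            // Unknown flag (e.g. a proposed one) or a value the VM rejected.
            return false;
        }
    }
}
```

Checking isWriteable() first keeps the agent from relying on exceptions
for the common "not manageable" case.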

Assuming the flag names GCCpuPercentage and CurrentMaxHeapSize, suppose
we have a JVM feature -XX:+StriveToStayWithinContainerRAMLimit. Its
behavior could be:
- If the container RAM usage/limit ratio is below 90%, nothing needs to be
  done; just use the default values for GCCpuPercentage and
  CurrentMaxHeapSize.
- If the ratio is between 90% and 95%, it could start trying to reduce the
  heap size, either by increasing GCCpuPercentage or by shrinking
  CurrentMaxHeapSize.
- If the ratio is above 95%, it could try even harder by further
  increasing GCCpuPercentage or shrinking CurrentMaxHeapSize.
In all of these cases, there will be a limit on how far
-XX:+StriveToStayWithinContainerRAMLimit could increase GCCpuPercentage or
shrink CurrentMaxHeapSize. We don't want to cause GC thrashing: it is
better to be killed by the container manager and restart the program than
to be stuck in GC thrashing. We could also rely on UseGCOverheadLimit
(JDK-8212084) for this purpose.
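The tiered behavior above could look roughly like the following step
function, evaluated periodically (e.g. after each GC). The 90%/95%
thresholds come from the description above; the default and maximum
GCCpuPercentage values, the step sizes and the heap floor are invented
placeholders, and the flag itself is hypothetical.

```java
// Hypothetical step function for the proposed
// -XX:+StriveToStayWithinContainerRAMLimit. Thresholds follow the email;
// every other constant is an invented placeholder.
final class ContainerRamPolicy {
    static final int DEFAULT_GC_CPU_PERCENT = 5; // placeholder default
    static final int MAX_GC_CPU_PERCENT = 30;    // cap, to avoid GC thrashing

    private ContainerRamPolicy() {}

    /** Next GCCpuPercentage given the current value and RAM usage/limit. */
    static int nextGcCpuPercent(int current, double usageRatio) {
        if (usageRatio < 0.90) {
            return DEFAULT_GC_CPU_PERCENT; // enough headroom: use the default
        }
        int step = usageRatio <= 0.95 ? 1 : 5; // try harder above 95%
        return Math.min(MAX_GC_CPU_PERCENT, current + step);
    }

    /** Next CurrentMaxHeapSize in bytes, never shrinking below floorBytes. */
    static long nextCurrentMaxHeap(long current, long defaultMax, long floorBytes,
                                   double usageRatio) {
        if (usageRatio < 0.90) {
            return defaultMax; // enough headroom: use the default
        }
        long shrink = usageRatio <= 0.95 ? current / 20 : current / 10; // 5% or 10%
        return Math.max(floorBytes, current - shrink);
    }
}
```

MAX_GC_CPU_PERCENT and floorBytes play the role of the limit mentioned
above: once they are reached, the policy stops pushing and lets the
container manager kill the process rather than thrash.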

I'm not sure whether we could make GCCpuPercentage manageable, but
CurrentMaxHeapSize will definitely be manageable. If GCCpuPercentage is
manageable, then -XX:+StriveToStayWithinContainerRAMLimit could be built
solely by adjusting GCCpuPercentage, without relying on setting
CurrentMaxHeapSize.

Regarding getting CPU usage:
> The problem is that apart from internal prototypes we never got around
> to add that. There's JDK-8027759 (and one more I think) though, even
> with a very very old patch.
>
> Another issue related to getting cpu usage I remember is support on some
> systems, and it may be spotty on others (i.e. granularity wise).
>
> Do you have any experience on that outside of Linux?

Great point. I hadn't thought of this problem yet.
https://github.com/caoman/jdk/tree/G1ThreadsCPUTime contains a patch
showing how we get the CPU times. I see that
os::is_thread_cpu_time_supported() could return false on Windows and BSD.
We will dig further into how this could be implemented on these OSes.
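At the Java level, the analogous check is
ThreadMXBean.isCurrentThreadCpuTimeSupported(). Below is a sketch for
probing support and roughly gauging the clock's step size on a given OS.
It only covers Java threads: GC worker threads are VM-internal threads
rather than Java threads, so they are not reported by ThreadMXBean and
this cannot replace the HotSpot-side measurement in the patch above. The
class name and the spin-based estimate are just for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Illustrative probe of per-thread CPU time support and granularity.
final class CpuTimeProbe {
    private CpuTimeProbe() {}

    /** CPU time of the calling thread in ns, or -1 if unsupported. */
    static long currentThreadCpuNanos() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            return -1L;
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true);
        }
        return threads.getCurrentThreadCpuTime();
    }

    /**
     * Busy-spins until the reported CPU time changes and returns the first
     * observed increment: a rough upper bound on the clock's step size.
     * Returns -1 if unsupported or if no change was seen within maxSpins.
     */
    static long observedStepNanos(int maxSpins) {
        long first = currentThreadCpuNanos();
        if (first < 0) {
            return -1L;
        }
        for (int i = 0; i < maxSpins; i++) {
            long now = currentThreadCpuNanos();
            if (now != first) {
                return now - first;
            }
        }
        return -1L;
    }
}
```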

-Man


On Sat, Jun 12, 2021 at 12:57 AM Thomas Schatzl <thomas.schatzl at oracle.com> wrote:

> Hi,
>
> On 12.06.21 07:23, Jonathan Joo wrote:
> > Hi Thomas and Man,
> >
> > Thanks for the feedback.
> >
> >     I am still not completely sure why current CPU overhead should
> >     influence SoftMaxHeapSize directly and probably I'm just
> >     misunderstanding or this is a terminology problem:
> >
> >
> > I think this is just a bit of confusion in terminology. Man and I have
> > been using SoftMaxHeapSize interchangeably with the idea of a maximum
> > heap size allowed during a heap resizing event. The way you describe it
> > below seems like a better mental model of the idea:
> >
>
> Thanks for the clarification.
>
> >
> >     Note that SoftMaxHeapSize is just a soft goal, potentially some hard
> >     goal would be more useful. Also, there is a long way from limiting
> >     the Java heap size to limiting VM memory usage :)
> >
> > We were toying with the idea of using SoftMaxHeapSize due to the fact
> > that it already exists in other GC types. But admittedly for the
> > purposes of this project, having a hard heap limit would be better to
> > guarantee no OutOfMemoryErrors. Maybe a better solution to consider is
> > to have a hard heap limit, but then set SoftMaxHeapSize to some value
> > less than this hard limit? This gives us the best of both worlds by
> > allowing us to guarantee a maximum heap usage via the hard limit, but
> > may also make hitting the hard limit less likely. (I imagine hitting the
> > hard heap limit would be a costly operation, whereas hitting the
> > SoftMaxHeapSize limit is less so.)
>
> "CurrentMaxHeapSize" would act similarly to a new -Xmx. I.e. cause full
> gcs and eventually OOME. See
> https://bugs.openjdk.java.net/browse/JDK-8204088 and the related email
> thread.
>
> >
> >     Summing it up, the current available patches are:
> >     JDK-8238687 and JDK-8253413: improves (re-)sizing policy and acts on
> >     that at any young gc:
> >
> >     https://github.com/tschatzl/jdk/tree/8238687-investigate-memory-uncommit-during-young-gc2
> >
> >     JDK-8248324 <https://bugs.openjdk.java.net/browse/JDK-8248324>:
> >     removes heap resizing at remark, which used a completely
> >     different policy anyway. Full gc is still an issue, but "it should
> >     not happen". Patch attached to CR.
> >     JDK-8236073: implements SoftMaxHeapSize, patch attached to CR.
> >
> >
> > Just to make sure I understand correctly: when you say "patch attached
> > to CR" above, do you mean that they are included in your GitHub repo
> > <https://github.com/tschatzl/jdk/tree/8238687-investigate-memory-uncommit-during-young-gc2>?
> > Or are they accessible elsewhere, so that I need to apply them myself if
> > I want to test these changes? (I would check myself, but I can't seem to
> > find the actual code snippets in the bugs.openjdk.java.net links; do I
> > need to have an OpenJDK account to access them?)
>
> Attached to CR means attached to the JIRA issue. These are sometimes
> quite old, so they very likely need some major modifications.
>
> The access issues are interesting, I can see the attachments without
> logging in.
>
> E.g.
>
> https://bugs.openjdk.java.net/secure/attachment/94569/8248324-remove-resizing-during-remark
>
> https://bugs.openjdk.java.net/secure/attachment/94567/8236073-softmaxheapsize
> https://bugs.openjdk.java.net/secure/attachment/62184/cpu-time
>
> >
> > These discussions have been immensely valuable to us, so I appreciate
> > your prompt and thorough feedback. I will look further into your
> > GCTimeRatio changes/suggestions to see if they may be sufficient for our
> > needs.
>
> Looking forward to your contributions :p
>
> Thomas
>
>


More information about the hotspot-gc-dev mailing list