Elastic JVM improvements [Was: Re: OpenJDK G1 Patch]

Thomas Schatzl thomas.schatzl at oracle.com
Thu May 24 10:03:04 UTC 2018

Hi Rodrigo, Ruslan,

  first, sorry for the late reply. I have been travelling, so I have
been a bit short on time to think about and look through this.

Thanks for your contribution. I think these ideas are a very
interesting and generally useful additions to the collector and/or

While they may not be perfect for all use cases, they imho improve the
collector sufficiently. Also, during reviews we may come up with
smaller improvements that increase their value and cover more use cases.

I will help you get through the rest of the process.

So the process to get this contribution into mainline would be roughly:

- get OCAs signed. As soon as you show up in the signatories list we
can actually start accepting patches, i.e. review them and discuss them
in more detail.
Depending on the patches' size it's probably best if you give me a
webrev when your names show up there and I can make them publicly

- since these two changes seem to be very interesting for a wider
public, it would be useful to do JEPs for them. Writing them up might
also improve the understanding of the changes and their limitations by
pointing them out there, and facilitate the discussion.
This basically means describing the functionality a little more
formally using the template [0].

I can guide you through this, but in the beginning it might be useful
to just fill out the description in the form of an email.

- since we will add some command line options, we will later need to go
through the CSR process for each of them. This basically just means
letting everyone know about the options and their definitions [1].

Again, I will help you with most of the "paper"work.

Following are some initial questions and thoughts on the proposals.
They may be a bit confusing or somewhat unrelated though, please bear
with me :)

On Sat, 2018-05-19 at 19:01 +0100, Rodrigo Bruno wrote:
> Dear OpenJDK community,
> Jelastic and INESC-ID have developed a patch for OpenJDK that
> improves elasticity of JVM with variable loads. The detailed
> description of the patch can be found below. We would like share this
> patch with the community and push it to the mainstream. We believe
> this work will help Java community to make JVM even better and
> improve the memory resources usage (save money) in the modern cloud
> environments. A more complete patch description can be found in
> the paper that will be presented in ISMM 2018.

> Elastic JVM Patch Description
> Elasticity is the key feature of the cloud computing. It enables to
> scale resources according to application workloads timely. Now we
> live in the container era. Containers can be scaled vertically on the
> fly without downtime. This provides much better elasticity and
> density compared to VMs. However, JVM-based applications are not
> fully container-ready. The first issue is that HotSpot JVM doesn’t
> release unused committed Heap memory automatically, and, therefore,
> JVM can’t scale down without an explicit call of the full GC.
> Secondly, it is not possible to increase the size of JVM Heap in
> runtime. If your production application has an unpredictable traffic
> spike, the only one way to increase the Heap size is to restart the
> JVM with a new Xmx parameter. 
> To solve these 2 major issues and make JVM more container friendly,
> we have implemented the following improvements: i) timely reduce the
> amount of unused committed memory; and ii) dynamically limit how
> large the used and committed memory can grow. The patch is
> implemented for the Garbage First collector.
> Timely Reducing Unused Committed Memory
> To accomplish this goal, the HotSpot JVM was modified to periodically
> trigger a full collection. Two full collections should not be
> separated by more than GCFrequency seconds, a dynamically
> user-defined variable. The GCFrequency value is ignored, i.e., no
> full collection is triggered, if:
> GCFrequency is zero or below;

A time span seems to be different from a "frequency"; this seems to be
more of an interval (like CMSTriggerInterval). Also I do not completely
follow that this interval is the minimum time between two *full*
collections. I would expect that any collection (or gc related pause)
would reset that time.
The paper also calls this "MinTimeBetweenGCs" if I read it correctly,
which is a somewhat better name.
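
A minimal Java sketch of the "any collection resets the interval"
semantics described above. It is an illustration only: the class
GcIntervalTimer and its methods are hypothetical names, and it observes
collections from the outside through the standard GarbageCollectorMXBean
counters, whereas the patch would do this inside the VM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: tracks the time since the last observed collection of any kind. */
final class GcIntervalTimer {
    private long lastCount;
    private long lastGcNanos;

    GcIntervalTimer(long initialCount, long nowNanos) {
        this.lastCount = initialCount;
        this.lastGcNanos = nowNanos;
    }

    /** Sum of the collection counts reported by all collectors of this VM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Returns true if no collection has been seen for at least intervalNanos. */
    boolean intervalElapsed(long currentCount, long nowNanos, long intervalNanos) {
        if (currentCount != lastCount) {
            // Any collection, not only a full one, restarts the interval.
            lastCount = currentCount;
            lastGcNanos = nowNanos;
            return false;
        }
        // An interval of zero or below disables the periodic trigger.
        return intervalNanos > 0 && nowNanos - lastGcNanos >= intervalNanos;
    }
}
```

A caller would pass GcIntervalTimer.totalCollections() and
System.nanoTime() on each poll.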

> the average load on the host system is above MaxLoadGC. The MaxLoadGC
> is a dynamically user-defined variable. This check is ignored if
> MaxLoadGC is zero or below;

What is the scale for the "load", e.g. ranging from 0.0 to 1.0, and 1.0
is "full load"? Depending on that this condition makes sense.

The paper does not mention this.

> the committed memory  is above MinCommitted bytes. MinCommitted is a
> dynamically user-defined variable. This check is ignored if
> MinCommitted is zero or below;

While this is a different concern, have you ever considered using
MinHeapSize or InitialHeapSize here?

> the difference between the current heap capacity and the current heap
> usage is below MaxOverCommitted bytes. The MaxOverCommitted is a
> dynamically user-defined variable. This check is ignored if
> MaxOverCommitted is zero or below;
> The previously mentioned concepts are illustrated in the figure
> below:
> [...]
> The figure above depicts an application execution example where all
> the aforementioned variables come into play. The default value for
> all introduced variables (GCFrequency, MaxLoadGC, MaxOverCommitted,
> and, MinCommitted) is zero. In other words, by default, there are no
> periodic GCs.
> With these modifications, it is possible to periodically
> eliminate unused committed memory in HotSpot. This is very important
> for applications that do not trigger collections very frequently and
> that might hold high amounts of unused committed memory. One example
> are web servers, whose caches can timeout after some minutes and
> whose memory might be underutilized (after the caches timeout) at
> night when the amount of requests is very low.

If I understood this paragraph correctly, the intent is to uncommit if
the system is idle (has low load for a certain amount of time).

Also, while it will become obvious with the patch, it will be
interesting to see how that load is defined. One reason is basically
that we support more systems than linux (the paper only mentions linux)
and it may be useful to support more than that platform.

I have one other question here: similar functionality could have been
achieved by some external entity periodically polling the VM for heap
size (a new jcmd or e.g. some MBean, or improving jmap or some other
tool) and then forcing a System.gc() from outside. Did you ever consider

The reason is that this idea uses some mechanisms (detecting load of
system, lots of options) that may be better served and be more flexible
if mostly implemented outside the VM.
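
To illustrate the "outside the VM" variant, here is a rough Java sketch
that polls the standard platform MXBeans and requests a collection when
the host looks idle and enough committed memory is unused. The class
IdleGcPoller and the parameter names (borrowed from the proposal's
MaxLoadGC and MaxOverCommitted) are hypothetical, and only a subset of
the proposed conditions is modelled. Note that getSystemLoadAverage()
reports the one-minute load average, which is not normalized to a
0.0 to 1.0 range and is negative where the platform cannot provide it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical poller; thresholds are plain parameters, not VM options. */
final class IdleGcPoller {

    /**
     * Decides whether to request a GC. A threshold of zero or below
     * disables the corresponding check.
     */
    static boolean shouldRequestGc(long committedBytes, long usedBytes, double loadAverage,
                                   double maxLoadGc, long maxOverCommittedBytes) {
        // A negative (unavailable) load never exceeds a positive threshold,
        // so the load check is effectively skipped in that case.
        if (maxLoadGc > 0 && loadAverage > maxLoadGc) {
            return false; // host is busy
        }
        long unusedCommitted = committedBytes - usedBytes;
        if (maxOverCommittedBytes > 0 && unusedCommitted < maxOverCommittedBytes) {
            return false; // too little memory to give back
        }
        return true;
    }

    /** Reads the current heap and load figures once; returns true if a GC was requested. */
    static boolean pollOnce(double maxLoadGc, long maxOverCommittedBytes) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        boolean request = shouldRequestGc(heap.getCommitted(), heap.getUsed(),
                os.getSystemLoadAverage(), maxLoadGc, maxOverCommittedBytes);
        if (request) {
            memory.gc(); // effectively equivalent to System.gc()
        }
        return request;
    }
}
```

This version runs inside the target VM (for example from a scheduled
task). A truly external tool could obtain the same MemoryMXBean
interface over a remote JMX connection; whether the requested collection
then uncommits memory still depends on the collector.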

Having read Kirk P.'s concern about the mechanism to actually uncommit
memory being too simplistic, I kind of agree. The alternative, to
trigger a concurrent cycle plus multiple mixed collections (plus
uncommit heap at the end of that mixed phase) is a bit harder to
implement. I would certainly help you with that. :)

Also assuming that at that point the VM is idle, doing a full gc would
not hurt the application.

Also there is Michal's use case of periodically doing global reference
processing to clean out weak references regularly. This seems to be a
different use case, but would seem easy to do given that this change
probably implements something like CMSTriggerInterval for G1.

Maybe there is some way to marry these two issues somehow.

> -Xmx Dynamic Limit Update 
> To dynamically limit how large the committed memory (i.e. the heap
> size) can grow, a new dynamically user-defined variable was
> introduced: CurrentMaxHeapSize. This variable (defined in bytes)
> limits how large the heap can be expanded. It can be set at launch
> time and changed at runtime. Regardless of when it is defined, it
> must always have a value equal or below to MaxHeapSize (Xmx - the
> launch time option that limits how large the heap can grow). Unlike
> MaxHeapSize, CurrentMaxHeapSize, can be dynamically changed at
> runtime.
> For example dynamically set 1GB as the new Xmx limit
> jinfo -flag CurrentMaxHeapSize=1g <java_pid>
> Setting CurrentMaxHeapSize at runtime will trigger a full collection
> if the desired value is below the current heap size. After finishing
> the full collection, a second test is done to verify if the desired
> value is still above the heap size (note that a full collection will
> try to shrink the heap as much as possible). If the value is still
> below the current heap size, then an error is reported to the user.
> Otherwise, the operation is successful. 

One alternative here could be to use a marking cycle + mixed gcs to
reach that new CurrentMaxHeapSize again, which again is a bit more
complicated to achieve. I can help you implement that if interested.

In some cases you might even get away with just uncommitting empty
regions and doing nothing else in response to this command.

As Kirk mentioned, as another optimization, triggering a young gc could
free enough regions too.
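
The cheaper steps discussed above can be pictured as an escalation
ladder that only falls back to a full collection when nothing else
suffices. The following Java sketch runs on a toy heap model: the
interface ResizableHeap and the class MaxHeapSizeUpdater are
hypothetical and do not correspond to any HotSpot API, and the
concurrent marking cycle + mixed gcs alternative is left out because it
does not fit a simple synchronous model.

```java
/** Toy heap model with hypothetical operations; not a HotSpot API. */
interface ResizableHeap {
    long committedBytes();
    void uncommitEmptyRegions(); // cheapest: give back regions that are already free
    void youngCollection();      // may free enough regions
    void fullCollection();       // last resort; assumed to shrink as much as possible
}

final class MaxHeapSizeUpdater {

    /**
     * Tries to bring committed memory under newLimit using the cheapest
     * sufficient step. Returns false if even a full collection is not
     * enough, in which case the caller would report an error to the user.
     */
    static boolean tryApplyLimit(ResizableHeap heap, long newLimit) {
        if (newLimit <= 0) {
            return true; // zero or below disables the limit
        }
        if (heap.committedBytes() <= newLimit) {
            return true; // nothing to do
        }
        heap.uncommitEmptyRegions();
        if (heap.committedBytes() <= newLimit) {
            return true;
        }
        heap.youngCollection();
        heap.uncommitEmptyRegions(); // give back what the young gc freed
        if (heap.committedBytes() <= newLimit) {
            return true;
        }
        heap.fullCollection();
        return heap.committedBytes() <= newLimit;
    }
}
```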

> The limit imposed by the CurrentMaxHeapSize can be disabled if the
> variable is unset at launch time or if it is set to zero or below at
> runtime.
> This feature is important to cope with changes in workload demands
> and to avoid having to restart JVMs to cope with workload changes.

I have only one question about this here at this time: is this
CurrentMaxHeapSize a new "hard" heap size (causing OOME in the worst
case), or could this be temporarily exceeded and any excess memory
given back asap?
Would it be useful to have G1 more slowly adapt to that new goal

As you can see I am pretty interested in the changes... :)

So overall, if you agree, I will open two JEPs in our bug tracker and
we can start discussing and filling out the details.


[0] http://openjdk.java.net/jeps/2
[1] https://wiki.openjdk.java.net/display/csr/Main
