EpsilonGC and throughput.

Aleksey Shipilev shade at redhat.com
Wed Dec 20 19:05:54 UTC 2017

On 12/20/2017 03:46 PM, Thomas Schatzl wrote:
>> You would probably be okay with small inefficiencies within the class
>> library, if you can control the bulk of your own data either by
>> relying on particular classlib implementation, or winding up
>> your own.
> And e.g. Serial GC *by itself* has what particular dependency on
> something in the OpenJDK classlib that makes that impossible? (Maybe
> the java.lang.ref.reference stuff?)

This is not about Serial GC. Sergey's argument was that classlib allocations are outside of users'
control, and thus locality there is out of users' control as well. My counter-point is that some
locality waste might be acceptable, as long as the bulk of the work is done by locality-conscious
user code anyway.

>> Well, nobody claimed Epsilon is a silver bullet. Before you can reap any of its benefits, you
>> have to get the footprint under control [*]. After that, you can start exploring exotic memory
>> management techniques,
> Can you explain to me how you can't do that with e.g. Serial GC? Is the allocation code in Serial
> that much different? Actually I think it should be almost the same.

Concentrating on the allocation path misses the point.

The crucial point is that Epsilon *guarantees* the absence of GC, rather than relying on obscure
tuning of current GCs. In return, it trivially avoids setting up anything that would be needed
during a GC cycle. Other GCs have to set all of that up anyway, on the off chance the configuration
is wrong and a GC cycle does happen in some corner case. Examples: GC threads, task queues, card
tables and other remembered sets, barriers, special handling of Reference.get, finalizers, etc.

> If it is not, it may be useful to clean serial gc allocation code up instead of adding new stuff
> that does exactly the same (Hint: FastTLABRefill related code will go away in 11).

Epsilon already shares lots of code with gc/shared. For example, the allocation code calls into
VirtualSpace::expand_by and ContiguousSpace::par_allocate to do the work on the allocation path, and
the rest is handled by the shared TLAB machinery. We can consider making a coarser-grained API for
allocations like this, and that would save e.g. 20 lines of code in the allocation path. But I
really think that would be an over-zealous application of the DRY principle, and would go against
the "prefer duplication over the wrong abstraction" guideline.
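
For readers unfamiliar with that allocation path, here is a toy Java model of the idea: a lock-free
bump of a "top" pointer as the fast path, and expansion of the committed region as the slow path.
This is a sketch only, not HotSpot code. The class BumpSpace and all of its methods are hypothetical
names that loosely mirror the roles of ContiguousSpace::par_allocate and VirtualSpace::expand_by,
and modelling the fast path as a single compare-and-swap is a simplifying assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of a bump-pointer space. Hypothetical names, not HotSpot's code. */
public class BumpSpace {
    private final AtomicLong top = new AtomicLong(0); // next free offset
    private volatile long committed;                  // bytes usable right now
    private final long reserved;                      // hard upper bound
    private final long step;                          // preferred expansion size

    public BumpSpace(long initialCommitted, long reserved, long step) {
        this.committed = initialCommitted;
        this.reserved = reserved;
        this.step = step;
    }

    /** Fast path: CAS the top forward. Returns the old top, or -1 if it does not fit. */
    long parAllocate(long size) {
        while (true) {
            long old = top.get();
            long next = old + size;
            if (next > committed) {
                return -1;
            }
            if (top.compareAndSet(old, next)) {
                return old;
            }
        }
    }

    /** Slow path helper: commit more space, up to the reserved bound. */
    private synchronized boolean expand(long atLeast) {
        long room = reserved - committed;
        if (room <= 0) {
            return false;
        }
        committed += Math.min(room, Math.max(atLeast, step));
        return true;
    }

    /** Allocate or fail hard: there is no collection to fall back on. */
    public long allocate(long size) {
        long offset;
        while ((offset = parAllocate(size)) < 0) {
            if (!expand(size)) {
                throw new OutOfMemoryError("toy heap exhausted");
            }
        }
        return offset;
    }
}
```

The point of the sketch is how little there is to share: once expansion fails, the only remaining
option in a no-op design is the hard error.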

> It won't be called "EpsilonGC" though, and won't have an extra switch,
> but benefit openjdk probably even more.

See, this is the guarantee thing again. Having the extra configuration to mimic what Epsilon does in
an existing GC might be a way out, until you silently regress it via the interaction with some other
GC option, some other bugfix, or some other performance improvement, or because GC developers in
their wisdom changed the behavior ever so slightly. Having a GC that does not collect _by design_
makes it hard to compromise this property.

Suppose you find the configuration that prevents GC in the existing Serial code. Asserting the needed
behavior in the current GC would mean developing white- or black-box style tests that assert the
configuration setting works as expected, and those tests would also have to be revisited every time
some potentially-interacting GC feature / option is added. That is, again, because Serial *might*
collect, and you just *hope* you got the config right so that GC does (not) happen when you do (not)
need it.
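
To make the black-box option concrete, here is a minimal sketch of such a test in Java. It uses the
standard java.lang.management API to compare GC counts before and after a workload. The class name
NoGcCheck, its methods, and the sample workload are made up for illustration. It can only report
that no collection was *observed* in one particular run, which is exactly the weakness described
above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical black-box check: did any collector run during the workload? */
public class NoGcCheck {

    /** Sum of collection counts over all collectors the JVM reports. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Number of collections that started and were counted while the workload ran. */
    static long collectionsDuring(Runnable workload) {
        long before = totalCollections();
        workload.run();
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        long gcs = collectionsDuring(() -> {
            // Sample workload: a small live set plus a steady stream of garbage.
            byte[][] keep = new byte[64][];
            for (int i = 0; i < 10_000; i++) {
                keep[i % keep.length] = new byte[1024];
            }
        });
        if (gcs != 0) {
            throw new AssertionError("expected no GC, observed " + gcs);
        }
        System.out.println("no GC observed in this run");
    }
}
```

A test like this has to be re-run, and re-thought, for every heap size and every GC option
combination you care about.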

This is about having the guarantees by design, instead of being hopeful about the configuration.
Epsilon makes an allocation failure a hard error, no excuses, no misconfiguration opportunities.

> The lukewarm reception from me is mostly because I am judging on the
> merits of what's in the JEP, not some future magic fairy dust that
> helps every collector anyway in the future. Can you at least give some
> ideas where you want to go with this, where Serial GC or any other
> existing GC will prevent progress? Base "another gc" (the "exotic
> memory management techniques") on it? That seems to contradict the
> purpose of Epsilon GC.

"Exotic memory management techniques" in my example basically means managing the persistent working
set, and disabling GC completely. Epsilon is not supposed to be extended with any new GC code --
as we saw in the other thread, even a simplistic Full GC is out of the question -- Epsilon does not
do any memory reclamation, period.

> To me personally the best argument that is given in the JEP seems to be
> that it helps validating the GC interface - but all other GCs
> implementing it also do that already to some degree (serial, parallel,
> not parallelold, cms, g1, probably Shenandoah, and Z).

The key thing is "personally to you" -- and that is fine. It does not mean other uses are wrong
just because you don't need them, or because an expert can configure other GCs to do a (barely)
similar, but not exactly the same, thing.

> That just does not seem impressive.

Epsilon is not supposed to be impressive. Most of the useful tools are straightforward and boring.
It would indeed be odd to gauge tools by their impressiveness.

>> and no-op GC is one of many tools in the toolbelt there. What makes
>> Epsilon different from other tools is that it requires VM-side
>> implementation -- and this is why it should be included into JVM.
> The question is: do we need a new tool that only reinvents the old ones
> with minimal (I would dare to say non-real world) advantages.

Yes, we do. A year ago, I thought this was a thought (pun intended) experiment, and I would
probably have had the same position -- just use the myriad of GC options to configure the existing
GC. But since then I have had interesting talks with people who have use cases for the
simple/trivial/dumb no-op GC: most of these are captured in the JEP. The Java ecosystem is vast, and
even 0.1% of use cases adds up to a substantial absolute number. In an interesting twist of fate, we
are even considering backporting Epsilon to JDK 8, because this is where most of the current Java
ecosystem is -- and having a separate implementation does give nice isolation guarantees for
backports.

Coming from a personal perspective, Epsilon is like peated whiskey for me: the first taste feels
very wrong and you question the sanity of those enjoying it, and then, as you become familiar with
it, you realize it is just something else, in its essence, and you begin to see the appeal. It is
not an everyday drink, for sure.

>> [*] In fact, it is also called out in JEP, the other way around: fail
>> predictably when a lot is allocated. Over a few last months, I had a
>> pleasant experience asserting allocation pressure
>> invariants with just running with Epsilon with given heap and
>> checking if it fails. When it does, I have the full heap-dump view of
>> the garbage produced. This turns out to be much more convenient than
>> I previously anticipated.
> java "-XX:+UseSerialGC -Xmn<something> -Xms<something> -Xmx<something>
> -XX:SurvivorRatio=<something> -XX:+DumpHeapAtOome" (or something like
> this) myapplication
> seems to give exactly the same information.

Nope, it does not. Serial would still attempt at least one GC when faced with a potential
OOME, and that will prune out the floating garbage -- and I am interested in *all* allocations. GC
guys might argue that allocations are cheap, and that GC cycles pruning dead objects are also cheap,
but the industrial reality is that people still hunt down and eliminate garbage allocations, with
non-negligible performance improvements. The ability to heap dump with no object left behind is
surprisingly useful. Again, some things are trivial in some GC designs. It is trivial to guarantee
that all allocated objects end up in the heap dump with a no-op GC.
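
As an illustration of that workflow, here is a small sketch. It assumes a JDK build that includes
Epsilon; the flag spelling used below is the one from JDK 11 and later, where Epsilon is an
experimental option. The class AllocationBudget and its churn method are hypothetical stand-ins for
whatever code path you want to put under an allocation budget, and the 64m heap is an arbitrary
example value:

    java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xms64m -Xmx64m \
         -XX:+HeapDumpOnOutOfMemoryError AllocationBudget 10000000

If the run completes, the code path stayed within the budget. If it fails, the heap dump holds
everything that was allocated, garbage included.

```java
/** Hypothetical workload whose total allocation we want to bound by the heap size. */
public class AllocationBudget {

    /** Produces one short-lived String per iteration; returns a checksum so the work is used. */
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            String s = "item-" + i; // garbage as soon as the iteration ends
            checksum += s.length();
        }
        return checksum;
    }

    public static void main(String[] args) {
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        System.out.println(churn(iterations));
    }
}
```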

