RFC: Epsilon GC JEP

Roman Kennke rkennke at redhat.com
Tue Jul 18 13:28:26 UTC 2017

At the very least, Epsilon's a great tool for measuring the cost of GC barriers.

How many times have we heard the question: "but what is the overhead of
the additional barriers of Shenandoah?" And we couldn't really answer
it. Compared to what? G1? Serial? Parallel? CMS? Each of them has its
own peculiarities when it comes to barriers.

With Epsilon it is possible to construct a benchmark that performs
certain heap accesses (primitive/object reads and writes, special
operations like CASes, etc.) and does no further allocation (so
locality spread doesn't really matter), and thus answer those
questions: no-barriers throughput is this; with that GC's barriers, it
is this; and so on.
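As a sketch of what such a benchmark could look like (a hypothetical
reference-store workload, not actual code from the webrev; in practice
one would wrap this in JMH): the hot loop performs only oop stores into
a fixed working set, so any throughput difference between
-XX:+UseEpsilonGC and another collector comes down to the write barrier.

```java
// Hypothetical barrier-cost workload: no allocation in the hot loop,
// only reference stores into a fixed-size table.
public class BarrierStoreWorkload {
    static final int SLOTS = 1 << 16;                  // fixed working set
    static final Object[] table = new Object[SLOTS];
    static final Object payload = new Object();

    // Returns stores per millisecond for `iters` reference stores.
    static double run(long iters) {
        long start = System.nanoTime();
        int idx = 0;
        for (long i = 0; i < iters; i++) {
            table[idx] = payload;          // oop store: hits the GC write barrier
            idx = (idx + 1) & (SLOTS - 1); // wrap around, no allocation
        }
        long elapsed = System.nanoTime() - start;
        return iters / (elapsed / 1_000_000.0);
    }

    public static void main(String[] args) {
        run(10_000_000L);                  // crude warmup
        System.out.printf("stores/ms: %.0f%n", run(100_000_000L));
    }
}
```

Run the same class once with Epsilon and once with the collector under
test; the delta in stores/ms approximates the store-barrier cost.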

I realize that such results are a bit theoretical, but they give a much
better idea than having no way to measure this in isolation at all.
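Such an isolated comparison could be run along these lines (a sketch:
assumes a build that includes the Epsilon patch from the webrev, and
barrier-bench.jar is a placeholder for whatever benchmark is used):

```shell
# Baseline: no barriers, fixed pre-touched heap
java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC \
     -Xms16g -Xmx16g -XX:+AlwaysPreTouch -jar barrier-bench.jar

# Same workload, same heap setup, under G1's barriers
java -XX:+UseG1GC -Xms16g -Xmx16g -XX:+AlwaysPreTouch \
     -jar barrier-bench.jar
```

Pinning -Xms to -Xmx and pre-touching keeps heap sizing and paging out
of the picture, so the comparison isolates the barrier cost.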


On 18.07.2017 at 15:20, Erik Österlund wrote:
> Hi Aleksey,
> If I understand this correctly, the motivation for EpsilonGC is to be
> able to measure the overheads due to GC pauses and GC barriers and
> measure only the application throughput without GC jitter, and then
> use that as a baseline for measuring performance of an actual GC
> implementation compared to EpsilonGC.
> However, automatic memory management is quite complicated when you
> think about it. Will EpsilonGC allocate all memory up-front, or expand
> the heap? In the case where it expanded on-demand until it runs out of
> memory, what consequences does that potential expansion have on
> throughput? In the case it is allocated upfront, will pages be
> pre-touched? If so, what NUMA nodes will the pre-mapped memory map
> into? Will mutators try to allocate NUMA-local memory? What consequences
> will the larger heap footprint have on the throughput because of
> decreased memory locality and as a result increased last level cache
> misses and suddenly having to spread to more NUMA nodes? Does the
> larger footprint change the requirements on compressed oops and what
> encoding/decoding of oop compression is required? In case of an
> expanding heap - can it even use compressed oops? In case of a not
> expanding heap allocated up-front, does a comparison of a GC using
> compressed oops with a baseline that can inherently not use it make
> sense? Will the lack of compaction, and the resulting possibly worse
> object locality of memory accesses, affect performance?
> I am not convinced that we can just remove GC-induced overheads from
> the picture and measure the application throughput without the GC by
> using an EpsilonGC as proposed. At least I do not think I would use it
> to draw conclusions about GC-induced throughput loss. It seems like an
> apples to oranges comparison to me. Or perhaps I have missed something?
> Thanks,
> /Erik
> On 2017-07-18 13:23, Aleksey Shipilev wrote:
>> Hi Erik,
>> Thanks for looking into this!
>> On 07/18/2017 12:09 PM, Erik Helin wrote:
>>> first of all, thanks for trying this out and starting a discussion.
>>> Regarding
>>> the JEP, I have a few questions/comments:
>>> - the JEP specifies "last-drop performance improvements" as a
>>>    motivation. However, I think you also know that taking a pause and
>>>    compacting a heap that is mostly filled with garbage most likely
>>>    results in higher throughput*. So are you thinking in terms of
>>> pauses
>>>    here when you say performance?
>> This cuts both ways: while it is true that moving GC improves
>> locality [1], it
>> is also true that the runtime overhead from barriers can be quite
>> high [2, 3,
>> 4]. So, "performance" in that section is tied to both throughput (no
>> barriers)
>> and pauses (no pauses).
>> [1] https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality
>> [2] https://shipilev.net/jvm-anatomy-park/13-intergenerational-barriers
>> [3] Also, remember the reason for UseCondCardMark
>> [4] Also, remember the whole thing about G1 barriers
>>> - why do you think Epsilon GC is a good baseline? IMHO, no barriers is
>>>    not the perfect baseline, since it is just a theoretical exercise.
>>>    Just cranking up the heap and using Serial is more realistic
>>>    baseline, but even using that as a baseline is questionable.
>> It sometimes is. Non-generational GC is a good baseline for some
>> workloads. Even Serial does not cut it, because even if you crank up
>> old and trim down young, there is no way to disable the reference
>> store barrier that maintains the card table.
>>> - the JEP specifies this as an experimental feature, meaning that you
>>>    intend non-JVM developers to be able to run this. Have you
>>> considered
>>>    the cost of supporting this option? You say "New jtreg tests under
>>>    hotspot/gc/epsilon would be enough to assert correctness". For which
>>>    platforms? How often should these tests be run, every night?
>> I think for all platforms, somewhere in hs-tier3? IMO, current test
>> set in
>> hotspot/gc/epsilon is fairly complete, and it takes less than a
>> minute on my
>> 4-core i7.
>>> Whenever we want to do large changes, like updating logging,
>>> tracing, etc,
>>> will we have to take Epsilon GC into account? Will there be
>>> serviceability
>>> support for Epsilon GC, like jstat, MXBeans, perf counters etc?
>> I tried to address the maintenance costs in the JEP. It is unlikely
>> to cause trouble, since it mostly calls into the shared code. And the
>> GC interface work would hopefully make BarrierSet into a more
>> shareable chunk of the interface, which makes
>> the whole thing even more self-contained. There is some new code in
>> MemoryPools
>> that handles the minimal diagnostics. MXBeans still work, at least
>> ThreadMXBean
>> that reports allocation pressure, although I'd need to add a test to
>> assert that.
>> To me, if the no-op GC requires much maintenance whenever something
>> in the JVM changes, that points to the insanity of the GC interface.
>> The no-op GC is a
>> good canary
>> in the coalmine for this. This is why one of the motivations is
>> seeing what
>> exactly a minimal GC should support to be functional.
>>> - You quote "The experience, however, tells that many players in the
>>>    Java ecosystem already did this exercise with expunging GC from
>>> their
>>>    custom-built JVMs". So it seems that those users that want something
>>>    like Epsilon GC are fine with building OpenJDK themselves? Having
>>>    -XX:+UseEpsilonGC as a developer flag is much different compared to
>>>    exposing it (and supporting, even if in experimental mode) to users.
>> There is a fair share of survivorship bias: we know about people who
>> succeeded, but do we know how many failed or gave up? I think
>> developers who do day-to-day Hotspot development grossly
>> underestimate the effort required to even build a custom JVM. Most
>> power users I know have done this exercise with great pains. I used
>> to sing the same song to them: just build OpenJDK yourself, but then
>> pesky details pour in. Like: oh, Windows, oh, Cygwin, oh, MacOS, oh,
>> XCode, oh, FreeType, oh, new compilers that build OpenJDK with
>> warnings while the build treats warnings as errors, oh, actual API
>> mismatches against msvcrt, glibc, whatever, etc. etc. etc. As much as
>> the OpenJDK build has improved over the years, I am not audacious
>> enough to claim it would ever be a completely smooth experience :)
>> Now I just willingly hand them binary builds.
>> So I think having the experimental feature available in the actual
>> product build extends the feature's exposure. For example, suppose
>> you are an academic writing a paper on GC: would you accept a
>> custom-built JVM into your results, or would you rather pick up the
>> "gold" binary build from a standard distribution and run with it?
>>> I guess most of my question can be summarized as: this seems like it
>>> perhaps
>>> could be useful tool for JVM GC developers, why do you want to
>>> expose the flag
>>> to non-JVM developers (given all the work/support/maintenance that
>>> comes with
>>> that)?
>> My initial thought was that the discussion about the costs should
>> involve
>> discussing the actual code. This is why there is a complete
>> implementation in
>> the Sandbox, and also the webrev posted.
>> In the months following my initial (crazy) experiments, I had
>> multiple people
>> coming to me and asking when Epsilon is going to be in JDK, because
>> they want to
>> use it. And those were the ultra-power-users who actually know what
>> they are
>> doing with their garbage-free applications.
>> So the short answer about why Epsilon is good to have in product is
>> because the
>> cost seems low, the benefits are present, and so cost/benefit is
>> still low.
>>> It is _great_ that you are experimenting and trying out new ideas in
>>> the VM,
>>> please continue doing that! Please don't interpret my
>>> questions/comments as too grumpy; this is just my experience from
>>> maintaining 5-6 different GC algorithms for more than five years
>>> speaking. There is
>>> _always_ a
>>> maintenance cost :)
>> Yeah, I know how that feels. Look at the actual Epsilon changes: do
>> they look
>> scary to you, given your experience maintaining the related code?
>> Thanks,
>> -Aleksey

More information about the hotspot-gc-dev mailing list