RFC: JEP - ZGC: A Scalable Low-Latency Garbage Collector
kirk at kodewerk.com
Mon May 7 07:52:51 UTC 2018
At first read I would have edited out some of the hubris, but on second thought it did make for some entertaining reading. The document is clear and to the point: let's add another collector with much shorter pause times and somewhat less throughput than G1. Humor aside, I look forward to this current assault on tail latencies.
To that point, this passage in the document is one that might need reformulation.
“collection pauses become too long they start to affect application response times negatively. By removing or drastically reducing the length of GC pauses, we’re making Java a more attractive platform for an even wider set of applications.”
This is a commonly held POV that I’ve seen repeated most recently in an interesting paper about reducing the costs of work stealing in concurrent collectors (http://ranger.uta.edu/~jrao/papers/EuroSys18.pdf). As you may or may not already know, when John McCarthy first put GC into LISP he reported an approximate overhead of 40%. On the surface that sounds very bad, until you consider that GC was responsible for both increasing overall application throughput and reducing the cognitive load on developers, with the benefits that come with that. Thus the net effect of GC is improved application throughput and response time, with an often unacceptable effect on tail latencies. The battle you are engaged in here is one that will shorten the long tail in latencies due to GC pauses.
I wonder about the use of regions and copying in GC. In years past, IBM’s heap structures came with a Large Object Area (LOA) and things called cluster buckets, and made use of compaction avoidance. My experience with those collectors is a wee bit dated (1.4.2)… but it worked very well, in fact better than any of the collectors available in Sun’s JVM even up to 1.6. There was only one problem with IBM’s implementation: the compaction avoidance would sometimes get confused and wouldn’t compact when it should have, which would result in long periods of the JVM trying to figure out if it should compact or… not, which led to very, very, very long JVM stalls. At any rate, compaction was an issue. This was also a problem with CMS, leading to the dreadful “concurrent mode failure” (CMF) condition. In fact, tuning ParNew/CMS was all about avoiding CMF (a.k.a. compaction). Thus we now have a ton of field experience in understanding what causes heap fragmentation and how to avoid the need to compact. Thus I now postulate that it should be possible to draw on IBM and Sun/Oracle collector knowledge to devise a collector that may be able to avoid copying altogether.
The main idea behind LOA fragmentation avoidance is keeping the large things away from the small things. Additionally, IBM used cluster buckets to bulk up the small things into a container. In effect, IBM used size (instead of age) to direct where to allocate. Our experience with tuning CMS led to the recipe: tune young gen to keep short-lived objects out of tenured, and tune tenured to give the concurrent cycle enough time to finish, reducing the chance of a CMF. This involved not only proper sizing of tenured but also setting the IOF (initiating occupancy fraction) to maintain a certain level of tidiness in that space (resulting in reduced ParNew pause times). I believe that there is a combination of these ideas from the IBM and Sun collectors that might eliminate the need for compaction, leaving a pause time that is bound by either TTSP or scan-for-roots time, whichever is greater. I’m wondering if anything like this has been discussed.
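To make the "size instead of age" idea a bit more concrete, here is a rough Java sketch of what I mean by routing allocations by size. This is a toy model of my own, not IBM's implementation and not anything in HotSpot: the class name, the 64 KB threshold, the power-of-two slot sizes and the fake addresses are all assumptions made up for illustration. The point is only that when every slot in a bucket has the same size, a freed slot fits the next request of that size class exactly, so the small-object area can be reused without moving anything.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only: "addresses" are plain longs, nothing is really allocated.
class SizeSegregatedAllocator {
    static final int MIN_SLOT = 16;                // assumed smallest slot size in bytes
    static final int LARGE_THRESHOLD = 64 * 1024;  // assumed cut-off for the large object area
    static final long LARGE_BASE = 1L << 40;       // assumed base address of the large object area

    // One free list per slot size; stands in for the "buckets" of small objects.
    private final Map<Integer, Deque<Long>> freeSlots = new HashMap<>();
    private long smallTop = 0;           // bump pointer for the small-object area
    private long largeTop = LARGE_BASE;  // bump pointer for the large object area

    static boolean isLarge(int size) {
        return size >= LARGE_THRESHOLD;
    }

    // Round a request up to the next power-of-two slot size (at least MIN_SLOT).
    static int slotSize(int size) {
        int slot = MIN_SLOT;
        while (slot < size) {
            slot <<= 1;
        }
        return slot;
    }

    long allocate(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (isLarge(size)) {
            // Large things are kept away from the small things.
            long addr = largeTop;
            largeTop += size;
            return addr;
        }
        int slot = slotSize(size);
        Deque<Long> free = freeSlots.get(slot);
        if (free != null && !free.isEmpty()) {
            return free.pop();  // reuse a freed slot of the same class, no copying needed
        }
        long addr = smallTop;
        smallTop += slot;
        return addr;
    }

    void free(long addr, int size) {
        if (isLarge(size)) {
            return;  // reclaiming the large object area is left out of this sketch
        }
        freeSlots.computeIfAbsent(slotSize(size), k -> new ArrayDeque<>()).push(addr);
    }

    public static void main(String[] args) {
        SizeSegregatedAllocator heap = new SizeSegregatedAllocator();
        long a = heap.allocate(24);
        heap.free(a, 24);
        long b = heap.allocate(20);  // same size class as the freed slot
        System.out.println("slot reused: " + (a == b));
        System.out.println("large object kept apart: " + (heap.allocate(128 * 1024) >= LARGE_BASE));
    }
}
```

The obvious cost of this scheme is internal fragmentation from rounding sizes up, and the sketch says nothing about how the large object area itself avoids fragmenting, which is where the real difficulty lies.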
> On May 7, 2018, at 8:23 AM, Per Liden <per.liden at oracle.com> wrote:
> Hearing no objections, I'll submit this to become Candidate.
> On 04/30/2018 07:45 AM, Per Liden wrote:
>> We're looking at submitting the ZGC JEP to become Candidate. But first we'd like to solicit feedback.