Thoughts about SubstrateVM GC

Roman Kennke rkennke at
Fri Mar 1 23:17:58 UTC 2019

I am working on that ;-) I have users with a case, and we can prototype and measure with Serial GC vs EpsilonGC+ Aleksey's sliding GC hack (, that should give us some ideas :-)

Cheers, Roman

Am 1. März 2019 23:13:09 MEZ schrieb "Peter B. Kessler" <Peter.B.Kessler at Oracle.COM>:
>I think we need requirements from users, representative benchmarks or
>real applications, inspired ideas, measurements of alternatives, and
>time to write garbage collectors. :-)
>So far, SubstrateVM has explored one point in the space, and we know
>that one size does not fit all when it comes to garbage collection.
>			... peter
>On 03/ 1/19 12:33 PM, Roman Kennke wrote:
>> Hi Peter,
>> picking up on our comments here:
>> It seems like there is a significant class of applications that would
>> attractive with SubstrateVM that would be happy to do no GC at all.
>> Shortlived microservices come to mind. For those applications, we'd
>> realistically do want a GC, but only as last resort. In other words,
>> single-space heap, that would trigger a collection only when
>> and then do sliding compaction, would do it, and arguably better than
>> the current GC: it would not require barriers overhead (i.e. pay for
>> something that we bet would never or very rarely happen), it would be
>> able to use the full space, rather than dividing up in generations
>> semispaces, and as you say, even eliminate the safepoint checks
>> (does SubstrateVM only do safepoints for GC? that would be cool!).
>> Even if it *does* GC, it might still be better off overall: objects
>> would have more time to die, GC frequency would be rarer. With
>> SubstrateVM I see relatively frequent full-GCs anyway, so rarer GCs
>> more relative garbage, combined with increased throughput: I guess
>> might actually work well for certain classes of applications,
>> those that would want to run on SubstrateVM anyway?
>> You commented about GraalVM runtime compiler allocating a lot and
>> leaving behind lots of garbage: would that be a concern in
>> closed world view?
>> WDYT?
>> Roman
>>> Two comments inline.  And more encouragement to send along your
>>>              ... peter
>>> On 03/ 1/19 02:16 AM, Roman Kennke wrote:
>>>> Hi Christian,
>>>> thanks for your replies. This is very interesting. Some additional
>>>> comments below inline:
>>>>>> The old generation is 2 semispaces (actually, 4 with the 2 pinned
>>>>>> spaces, which I'll ask about later). When collected, live objects
>>>>>> scavenged back-and-forth between the two spaces.
>>>>> yes, in theory. In "reality" there is no contiguous memory for the
>>>>> spaces, so after a GC all the aligned chunks of the from-space are
>>>>> either returned to the OS or cached and immediately reused for
>>>>> generation allocation.
>>>> Aha, ok. This definitely affects future design decisions.
>>>>>> - The policy when to start collecting seems a bit unclear to me.
>In my
>>>>>> understanding, there is (almost) no use (for STW GCs) in starting
>>>>>> collection before allocation space is exhausted. Which means, it
>>>>>> an obvious trigger to start collection on allocation failure.
>Yet, the
>>>>>> policies I'm looking at are time-based or time-and-space-based. I
>>>>>> 'almost' because the single use for time-based collection would
>>>>>> periodic GC that is able to knock out lingering garbage during
>>>>>> no/little-allocation phase of an application, and then only when
>the GC
>>>>>> is also uncommitting the resulting unused pages (which, afaics,
>>>>>> Substrate GC would do: bravo!). But that doesn't seem to be the
>>>>>> point of
>>>>>> the time-based policies: it looks like the goal of those policies
>is to
>>>>>> balance time spent in young-vs-old-gen collections.?!
>>>>> The young generation size is fixed, and a collection is started
>>>>> this space is filled. So from that point of view, the starting
>>>>> of a GC is always space based.
>>>>> The policy whether to do an incremental (young) collection or a
>>>>> collection is time based (but you can easily plug in any policy
>>>>> want). The goal is to balance time between incremental and full
>>>>> collection. We certainly don't want to fill up the old generation
>>>>> because that is the maximum heap size, and it is by default
>>>>> to be very big.
>>>> Ok, I see.
>>>> Also, the way it's currently done, the old-gen needs to be able to
>>>> absorb (worst-case) all of young-gen in next cycle, and therefore
>>>> *plenty* of headroom. I.e. we need to collect old-gen much earlier
>>>> when it's full (like when remaining free space in old-gen is
>>>> than young-gen size). Alternatively, we could exhaust old-gen, and
>>>> change young-gen-collection-policy to skip collection if old-gen
>>>> have enough space left, and dive into full-GC right away. Or, even
>>>> better, add an intermediate tenuring generation. :-)
>>> There is no fixed heap size, or fixed generation sizes.  As long as
>>> collector can allocate memory from the OS it can keep adding chunks
>>> needed to the old generation (or the young generation, for that
>>> E.g., to delay collection until it is "convenient".)  If you run out
>>> address space, or physical memory, then you are in trouble.
>>>>>> With a little bit of distance and squinting of eyes, one can see
>>>>>> Substrate GC's young generation is really what is called 'nursery
>>>>>> space'
>>>>>> elsewhere, which aims to reduce the rate at which objects get
>>>>>> introduced
>>>>>> into young generation. And the old generation is really what is
>>>>>> called young generation elsewhere. What's missing is a true old
>>>>>> generation space.
>>>>> Not really, because the young generation can be collected
>>>>> i.e., there are the generational write barriers, remembered sets,
>>>>> So the young generation is reduced to the nursery space, but I
>argue the
>>>>> old generation is really an old generation.
>>>> Ok.
>>>>>> Considering all this, I would like to propose some improvements:
>>>>>> - Introduce a notion of tenuring objects. I guess we need like 2
>>>>>> bits in the header or elsewhere for this. Do we have that room?
>>>>> You don't need the age bits in the header. You can easily go from
>>>>> object to the aligned chunk that the object is in (we do that all
>>>>> time, for example for the write barrier to do the card marking),
>>>>> store the age in the chunk header. Requiring all objects in one
>chunk to
>>>>> have the same age is not much of a limitation.
>>>> Right.
>>>>> Adding tenuring is definitely necessary to achieve reasonable GC
>>>>> performance.
>>>> +1
>>>>>> - Implement a true old-space (and rename the existing young to
>>>>>> and old to young?). In my experience, sliding/mark-compact
>>>>>> using a mark bitmap works best for this: it tends to create a
>>>>>> 'sediment'
>>>>>> of permanent/very-long-lived objects at the bottom which would
>>>>>> get
>>>>>> copied again. Using a bitmap, walking of live objects (e.g.
>>>>>> copying, updating etc) would be very fast: much faster than
>>>>>> objects by their size.
>>>>> A mark-and-compact old generation algorithm definitely makes
>>>>> Again, the only reason why we don't have it yet is that no one had
>>>>> to implement it.
>>>>> Mark-and-compact is also great to reduce memory footprint. Right
>>>>> during GC the memory footprint can double because of the temporary
>>>>> for copying.
>>>> Yeah. However, as Peter noted, having no contiguous memory block
>>>> complicates this. I'd need to see how to deal with it
>>>> probably, or maybe mark bit in object header, with some clever
>tricks to
>>>> make scanning the heap fast like serial GC does).
>>>>>> - I am not totally sure about the policies. My current thinking
>is that
>>>>>> this needs some cleanup/straightening-out, or maybe I am
>>>>>> misunderstanding something there. I believe (fairly strongly)
>>>>>> allocation failure is the single useful trigger for STW GC, and
>on top
>>>>>> of that an (optional) periodic GC trigger that would kick in
>after X
>>>>>> (milli)seconds no GC.
>>>>> As I mentioned above, the GC trigger is allocation failure for the
>>>>> generation.
>>>> Ok, good.
>>>>>> - Low-hanging-fruit improvement that could be done right now:
>>>>>> large objects(arrays) straight into old-gen instead of copying
>>>>>> around. Those are usually long-lived anyway, and copying them
>>>>>> back-and-forth just costs CPU time for no benefit. This will
>>>>>> even
>>>>>> more pronounced with a true old-gen.
>>>>> Large arrays are allocated separately in unaligned chunks. Such
>>>>> are never copied, but only logically moved from the young
>>>>> into the old generation. An unaligned chunk contains exactly one
>>>>> array.
>>>> Ok, good.
>>>>>> Oh and a question: what's this pinned object/chunks/spaces all
>>>>> There are two mechanisms right now to get objects that are never
>>>>> by the GC:
>>>>> 1) A "normal" object can be temporarily pinned using
>>>>> org.graalvm.nativeimage.PinnedObject. The current implementation
>>>>> keeps the whole aligned chunk that contains the object alive,
>i.e., it
>>>>> is designed for pinnings that are released quickly so that no
>>>>> are actually ever pinned when the GC runs, unless the GC runs in
>>>>> unlucky moments. We use such pinning for example to pass pointers
>>>>> byte[] arrays directly to C functions without copying.
>>>>> 2) A PinnedAllocator can be used to get objects that are
>non-moving for
>>>>> a long period of time. This is currently used for the metadata of
>>>>> runtime compiled code. We are actively working to make
>>>>> unnecessary by putting the metadata into C memory, and then
>hopefully we
>>>>> can remove PinnedAllocator and all code that is necessary for it
>in the
>>>>> GC, i.e., the notion of pinned spaces you mentioned before.
>>>> Ok, I guessed so. I mostly wondered about it because it's got
>>>> and to-space:
>>>>       [pinnedFromSpace:
>>>>         aligned: 0/0 unaligned: 0/0]
>>>>       [pinnedToSpace:
>>>>         aligned: 0/0 unaligned: 0/0]]
>>>> And would it ever copy between them? I guess not.
>>> The collector logically moves a pinned chunk from pinned from-space
>>> pinned to-space by updating bookkeeping information in the chunk. 
>>> contents of the pinned chunk are not moved, and their addresses do
>>> change.  If a pinned chunk is unpinned by the application, it is
>>> to the unpinned from-space and at the next full collection the
>>> objects in it are scavenged to the unpinned to-space, like any other
>>> objects in unpinned from-space.  Between collections the pinned
>>> is empty.  In your example, the pinned from-space is also empty. 
>>> do not represent address ranges, so an empty space is just a few
>>> pointers in the space data structure.  (Spaces not being address
>>> also complicates answer questions like: Is this object in the young
>>> generation.)
>>>              ... peter
>>>>>> What do you think about all this? Somebody else might have
>>>>>> about
>>>>>> all this already, and have some insights that I don't have in my
>>>>>> understanding? Maybe some of it is already worked on or planned?
>>>>>> there are some big obstactles that I don't see yet, that make it
>>>>>> feasible?
>>>>> We certainly have ideas and plans, and they match your
>observations. If
>>>>> you are interested in contributing, we can definitely give you
>>>>> guidance so that you immediately work into the right direction.
>>>> Yes, I am. :-)
>>>> Roman

Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.

More information about the graal-dev mailing list