RFR (M): 8027959: Investigate early reclamation of large objects in G1
thomas.schatzl at oracle.com
Tue Jul 15 09:10:53 UTC 2014
Could I have reviews for the following change, which allows G1 to
eagerly reclaim humongous objects at every GC?
In G1, large objects are always allocated in the old generation, so
reclaiming them currently requires a complete heap liveness analysis
(full GC or marking).
This is far from ideal for many transaction-based enterprise
applications that create large objects that are only live until a
(typically short-lived) transaction has completed (e.g. the
ResultSet of a JDBC query that generates a large result).
Such objects fill up the heap relatively quickly, typically
leading to unnecessary marking cycles just to reclaim them.
The solution implemented here directly targets these types of
objects by using remembered set and reachability information from any GC
to make (conservatively) sure that we can reclaim the space.
We can quickly determine this: if there are no references from the roots
or young gen to that object, and there are no remembered set entries
to that object, the object is dead. This is sufficient because:
- G1 root processing always walks over all roots and the young gen,
which are the only sources of potential references outside the
remembered set.
- the remembered set contains all potential locations that reference
this object; since humongous objects are always allocated into their own
regions, there can be no intra-region references. These locations are
complete at GC pause time, because the pause makes sure that the
remembered set is current.
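The two conditions above can be sketched as a single conservative predicate. This is a minimal, simplified model for illustration only; the struct and field names are invented and do not match the actual HotSpot sources:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of a G1 heap region.
struct HeapRegion {
  bool   is_starts_humongous;   // first region of a humongous object
  size_t rem_set_entries;       // remembered-set entries pointing into it
  bool   referenced_from_roots; // reference seen from roots/young gen
};

// A humongous region can be eagerly reclaimed only if, conservatively,
// nothing can reach it: no remembered-set entries (which cover all
// old-to-old references, since humongous objects own their regions)
// and no reference found from the roots or young gen during this pause.
static bool is_reclaim_candidate(const HeapRegion& r) {
  return r.is_starts_humongous
      && r.rem_set_entries == 0
      && !r.referenced_from_roots;
}
```

Note that either a single remembered set entry or a single root/young-gen reference is enough to disqualify the region, which is what makes the check conservative.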
We can also reclaim a region that the marking considers live because it
was allocated during marking (as long as the other conditions hold). At
reclaim time, if something referenced that object, there must either
have been a remembered set entry or a reference from the roots/young
gen; if neither exists, the object is dead, and nobody can install a
reference to it anymore. (If there had ever been a reference from
another old region, it must have had a remembered set entry.)
When marking continues after the GC, it will simply notice that the
region has been freed and skip over it.
After putting the humongous region into the collection set, liveness
detection occurs by intercepting the slow path for allocating space
for that humongous object. As it is humongous, we always end up there.
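The interception step can be sketched as follows. Again this is an illustrative model under invented names, not the actual HotSpot code: the point is that the first reference found during the pause marks the region live and deregisters it as a candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical sketch of per-pause eager-reclaim bookkeeping.
struct EagerReclaimState {
  std::unordered_set<int> candidates;  // indices of candidate regions
  std::unordered_set<int> live;        // candidates found referenced

  // Called from the allocation slow path when a GC worker discovers a
  // reference pointing into candidate humongous region `idx`.
  void on_reference_found(int idx) {
    if (candidates.erase(idx) > 0) {   // only the first reference matters
      live.insert(idx);                // region survives this GC
    }
  }

  // At the end of the pause, whatever is still a candidate was never
  // referenced during the pause and can be reclaimed.
  size_t reclaimable() const { return candidates.size(); }
};
```

Dropping the region from the candidate set on the first hit is what keeps later references to the same object off the slow path.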
The change includes some minor optimizations:
- while registering the humongous regions for inclusion into the
collection set, we already check whether that humongous object is
actually one we can potentially remove, e.g. one that has no remembered
set entries. This makes it a "humongous candidate" (note there is no
actual state for this, just a name for these regions).
- once we find out that the region is live, we remove that humongous
region from the collection set so that further references to it do not
cause us to go into the slow path. This avoids taking the slow path
too often if that object is referenced a lot. (Most likely, if that
object had many references it would not be a "humongous candidate"
anyway.)
- if there were no candidates at the start of the GC, then do not
bother trying to reclaim later.
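The end-of-pause reclaim step, including the last shortcut (skip everything if no candidates were registered), could look roughly like this. The types and names here are invented for illustration and do not mirror the real implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-region flags as they stand at the end of the pause.
struct Region {
  bool candidate;  // registered as a reclaim candidate at GC start
  bool live;       // a reference was found during the pause
  bool freed;      // returned to the free list by this phase
};

// Frees every candidate region that was never found referenced.
// If no candidates were registered at GC start, the phase is a no-op.
static size_t reclaim_dead_humongous(std::vector<Region>& regions,
                                     size_t num_candidates) {
  if (num_candidates == 0) return 0;  // fast exit: nothing to do
  size_t reclaimed = 0;
  for (Region& r : regions) {
    if (r.candidate && !r.live) {     // never referenced this pause
      r.freed = true;                 // hand region back to the free list
      ++reclaimed;
    }
  }
  return reclaimed;
}
```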
In total I found no particular slowdown when enabling this feature by
default: if there are no humongous candidate objects, there is no
change to the current code path at all, because none will be added to
the collection set.
The feature can be disabled completely via a flag.
There is a new log line, "Humongous Reclaim", measuring reclaim time,
and if G1LogLevel=finest is set it prints some statistics about total,
candidate and reclaimed humongous objects on the heap.
The CR contains a graph showing large improvements in average humongous
object reclamation delay. In total we have seen some benchmarks
reclaiming GBs of heap space over time using this functionality (instead
of waiting for the marking/full GC). This improves throughput
significantly, as more space is available for the young gen on average.
It might also save users from manually increasing the heap region size
just to avoid humongous object troubles.
Testing: jprt, aurora adhoc, various internal benchmarks