RFR (M): JDK-6672778: G1 should trim task queues more aggressively during evacuation pauses
thomas.schatzl at oracle.com
Mon Apr 9 11:20:05 UTC 2018
I am happy to finally bring this, one of the oldest G1 issues we
have, to a happy ending :)
So until now G1 buffered all oop locations it encountered during root
scanning (including from remembered sets and refinement queues) in the
per-thread work queues, and only drained them at the very end of the
evacuation.
I am not completely sure why this has been implemented this way, but it
has serious drawbacks:
- the work queues and overflow stacks may use a lot of memory, and I
mean *a lot*
- since we buffer all oop references, the prefetching G1 does during
root scan goes to waste: G1 always prefetches without following up on
it, wasting memory bandwidth.
Other GCs already employ this technique, so my best guess why G1 did
not so far is that G1 needs sub-timings for the various phases to get
prediction right, and even if doing timing is cheap, doing it too often
just adds up.
Anyway, this problem has been solved by implementing a hysteresis, i.e.
start trimming the work queues at a threshold higher than the one at
which trimming ends, and time the length of the trimming in between.
So the timing measurement overhead gets distributed across many work
queue operations.
Note that I did not do much testing of the optimal hysteresis range;
the suggested guess of 2xGCDrainStackTargetSize seems to be a pretty
good value (i.e. reduces overhead well enough).
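To make the hysteresis idea concrete, here is a minimal standalone sketch.
It is not the code from the webrev: the class TrimmingQueue and all of its
members are hypothetical names, tasks are plain ints, and "processing" a
task only counts it. The only things taken from the text above are the two
thresholds (start trimming above 2x the drain target, stop at the drain
target) and the idea of timing each trim as a whole.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one per-thread task queue; not HotSpot code.
class TrimmingQueue {
public:
  using Clock = std::chrono::steady_clock;

  // drain_target plays the role of GCDrainStackTargetSize (assumption).
  explicit TrimmingQueue(std::size_t drain_target)
      : _stop_threshold(drain_target),
        _start_threshold(2 * drain_target) {
    assert(drain_target > 0 && "thresholds must differ for hysteresis");
  }

  // Called for every reference found during root scanning.
  void push(int task) {
    _tasks.push_back(task);
    if (_tasks.size() > _max_size) _max_size = _tasks.size();
    // Start trimming only once the upper threshold is exceeded.
    if (_tasks.size() > _start_threshold) trim_partially();
  }

  // Final, untimed drain of whatever is left at the end.
  void drain_completely() { drain_to(0); }

  std::size_t size() const { return _tasks.size(); }
  std::size_t max_size() const { return _max_size; }
  std::size_t processed() const { return _processed; }
  std::size_t timed_trims() const { return _timed_trims; }
  Clock::duration trim_time() const { return _trim_time; }

private:
  // One timer start/stop pair covers the whole trim, not each task.
  void trim_partially() {
    const Clock::time_point start = Clock::now();
    drain_to(_stop_threshold);
    _trim_time += Clock::now() - start;
    ++_timed_trims;
  }

  // Pop tasks until at most `limit` remain; processing is just counted.
  void drain_to(std::size_t limit) {
    while (_tasks.size() > limit) {
      _tasks.pop_back();
      ++_processed;
    }
  }

  std::vector<int> _tasks;
  std::size_t _stop_threshold;
  std::size_t _start_threshold;
  std::size_t _max_size = 0;
  std::size_t _processed = 0;
  std::size_t _timed_trims = 0;
  Clock::duration _trim_time{};
};
```

In this sketch, with a drain target of 64 (an illustrative value), pushing
1000 tasks never lets the queue grow beyond 129 entries and needs 14 timed
trims, i.e. 14 timer start/stop pairs instead of one per task.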
Results are pretty good: I have seen reductions of the maximum task
queue size by multiple orders of magnitude (i.e. less memory usage
during GC), and reduction of total pause time by up to 50%,
particularly on larger applications in the few GB heap range where
quite a bit of data is copied around every GC.
But also smaller applications and applications doing less copying
benefit quite a bit, reducing pause times significantly.
Note that there is a known, actually pre-existing bug with buffering up
references (by the now obsolete and removed BufferingOopClosure): the
sum of timings for the sub-phases of ext root scan may be larger than
the printed total. This will be addressed with the follow-up
JDK-8201313 to keep this change small, and it's a pre-existing issue
anyway.
Testing: hs-tier 1-3, perf tests