RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1
stefan.johansson at oracle.com
Mon Sep 4 15:36:58 UTC 2017
Please review the implementation of JEP 307:
As communicated late last year, I've been working on parallelizing
the Full GC for G1. The implementation is now ready for review.
The approach I chose was to redo marking at the start of the Full GC and
not reuse the marking information from the concurrent mark cycle. The
main reason behind this is to maximize the chance of freeing up memory.
I reused the marking bitmap from the concurrent mark code, though, so
instead of marking in the mark word, a bitmap is used. The mark word is
still used for forwarding pointers, so marks still have to be preserved
for some objects.
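To illustrate the idea of side-table marking, here is a minimal sketch (my own illustrative names, not HotSpot's actual classes): liveness is recorded in a bitmap indexed by heap offset rather than in the object's mark word, which leaves the mark word free to hold a forwarding pointer later.

```java
import java.util.BitSet;

// Hypothetical sketch of a side marking bitmap: one bit per heap word.
// Marking in a bitmap instead of the mark word means the mark word can be
// reused for forwarding pointers during compaction.
class MarkBitmap {
    private static final int WORD_SIZE = 8; // one bit per 8-byte heap word
    private final BitSet bits;
    private final long heapBase;

    MarkBitmap(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        this.bits = new BitSet((int) (heapBytes / WORD_SIZE));
    }

    private int indexFor(long addr) {
        return (int) ((addr - heapBase) / WORD_SIZE);
    }

    /** Marks the object at addr; returns true if this call did the marking. */
    boolean mark(long addr) {
        int i = indexFor(addr);
        if (bits.get(i)) return false; // already marked
        bits.set(i);
        return true;
    }

    boolean isMarked(long addr) {
        return bits.get(indexFor(addr));
    }
}
```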
The algorithm is still a four-phase mark-compact, but each phase is
handled by parallel workers. Marking and reference processing are done in
phase 1. In phase 2, all worker threads work through the heap, claiming
regions which they prepare for compaction. This is done by installing
forwarding pointers into the mark word of the live objects that will
move. The regions claimed by a worker in this phase will be the same
regions that the worker compacts in phase 4. This ensures that objects
are not overwritten before they are compacted.
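The phase 2 scheme above could be sketched roughly as follows (all names here are illustrative, not HotSpot's actual classes): workers claim regions via an atomic cursor, and each worker installs forwarding pointers only into the space of regions it has itself claimed, which is what makes the later per-worker compaction safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of phase 2: workers claim regions with an atomic
// cursor and install forwarding pointers for the live objects they find.
// Each worker forwards objects into its own claimed space, so in phase 4
// it can compact those same regions without overwriting data another
// worker still needs.
class PrepareCompaction {
    static class Obj { long addr; long forwardee; int size; boolean live; }
    static class Region {
        final long base;
        final List<Obj> objects = new ArrayList<>();
        Region(long base) { this.base = base; }
    }

    static List<List<Region>> prepare(List<Region> heap, int workers) {
        AtomicInteger cursor = new AtomicInteger();
        List<List<Region>> claimed = new ArrayList<>();
        for (int i = 0; i < workers; i++) claimed.add(new ArrayList<>());
        Thread[] pool = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            pool[w] = new Thread(() -> {
                long compactionPoint = -1; // next free address in this worker's space
                int idx;
                while ((idx = cursor.getAndIncrement()) < heap.size()) {
                    Region r = heap.get(idx);
                    claimed.get(id).add(r);
                    if (compactionPoint < 0) compactionPoint = r.base;
                    for (Obj o : r.objects) {
                        if (!o.live) continue;
                        o.forwardee = compactionPoint; // install forwarding pointer
                        compactionPoint += o.size;
                    }
                }
            });
            pool[w].start();
        }
        for (Thread t : pool) {
            try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        }
        return claimed;
    }
}
```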
In phase 3, all pointers to other objects are updated by looking at the
forwarding pointers. At this point all information needed to create new
remembered sets is available and this rebuilding has been added to phase
3. In the old version, remembered set rebuilding was done separately
after the compaction, but doing it during phase 3 is more efficient.
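The pointer-adjustment step of phase 3 could be sketched like this (illustrative names and data structures, not HotSpot's): every reference held by a live object is rewritten to the referent's forwarding pointer installed in phase 2, and references to objects that do not move are left as they are.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of phase 3: each reference field is rewritten to
// point at the referent's new location, looked up from the forwarding
// information installed in phase 2. Objects that do not move have no
// forwarding entry and keep their address.
class AdjustPointers {
    /** refs maps each field location to the old address it points at;
     *  forwarding maps old addresses of moved objects to new ones. */
    static Map<String, Long> adjust(Map<String, Long> refs,
                                    Map<Long, Long> forwarding) {
        Map<String, Long> adjusted = new HashMap<>();
        for (Map.Entry<String, Long> e : refs.entrySet()) {
            long oldAddr = e.getValue();
            // Follow the forwarding pointer if the referent moves.
            adjusted.put(e.getKey(), forwarding.getOrDefault(oldAddr, oldAddr));
        }
        return adjusted;
    }
}
```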
As mentioned, phase 4 is when the compaction is done. In this first
version, to avoid some complexity, there is no work stealing in this
phase. This will lead to some imbalance between the workers, but that
can be addressed as a separate RFE in the future.
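Phase 4 itself then reduces to each worker sliding live objects down to their forwarding addresses within its own claimed regions. A minimal sketch under those assumptions (illustrative, not HotSpot code):

```java
// Hypothetical sketch of phase 4: a worker walks only the regions it
// claimed in phase 2 and slides live objects down to their forwarding
// addresses. With no work stealing, a worker that claimed dense regions
// simply finishes later than one that claimed sparse ones.
class Compact {
    /** heap is a word-addressed array; each entry of plan is
     *  {oldIndex, newIndex, sizeInWords} for one live object, in
     *  address order. */
    static void compactWorkerRegions(long[] heap, long[][] plan) {
        for (long[] move : plan) {
            int from = (int) move[0], to = (int) move[1], size = (int) move[2];
            if (from != to) {
                // Safe because phase 2 only assigns destinations within the
                // same worker's claimed regions, below the source address.
                System.arraycopy(heap, from, heap, to, size);
            }
        }
    }
}
```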
The part of this work that has generated the most questions during
internal discussions is the serial parts of phases 2 and 4. They are
executed if no regions would be freed up by the parallel workers. It is
a safety mechanism to avoid throwing a premature OutOfMemoryError. If
no regions are freed by the parallel code path, a single-threaded pass
over the last region of each worker is done (at most number-of-workers
regions are handled) to further compact these regions and hopefully free
some of them up.
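The intuition behind that serial fallback can be sketched with a small model (entirely illustrative; the numbers and names are my own, not HotSpot's): if the live data in the workers' tail regions can be packed into fewer regions, the serial pass frees the remainder.

```java
import java.util.List;

// Hypothetical model of the serial fallback: if the parallel pass freed
// no regions, a single thread re-examines the last region of each worker
// (at most number-of-workers regions) and packs their live data into as
// few regions as possible, hopefully emptying at least one and avoiding
// a premature OutOfMemoryError.
class SerialFallback {
    /** liveBytesPerTail: live data in each worker's last region;
     *  regionBytes: region capacity. Returns how many of those tail
     *  regions end up empty after greedy packing. */
    static int regionsFreed(List<Integer> liveBytesPerTail, int regionBytes) {
        long totalLive = 0;
        for (int b : liveBytesPerTail) totalLive += b;
        // ceil(totalLive / regionBytes) regions remain in use.
        int needed = (int) ((totalLive + regionBytes - 1) / regionBytes);
        return liveBytesPerTail.size() - needed;
    }
}
```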
Testing:
* A lot of local sanity testing, both functional and performance.
* Passed tiers 1-5 of internal testing on supported platforms.
* No regressions in performance testing.