RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1
stefan.johansson at oracle.com
Tue Sep 19 15:32:04 UTC 2017
We're moving forward with the review internally and doing some
performance enhancements as well. Here are updated webrevs:
Note that the full webrev is based on the new consolidated repo, but the
incremental was generated with the old structure.
Highlights in this update:
* Cleaned out unused code in PreservedMarks.
* Fixed memory leak in GenericTaskQueueSet.
* HeapRegionClaimerBase has been removed and instead we now have two
functions to iterate through all heap regions.
* General cleanups and renames to make the code easier to understand.
* G1 Hot Card Cache cleanup made parallel and moved into appropriate phase.
* Updated HeapRegion::apply_to_marked_objects to be a template function
to avoid virtual call.
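To illustrate the last item, here is a minimal sketch of why a template
function avoids the virtual call. It is not the HotSpot code: ToyRegion,
LiveBytesClosure and live_bytes_in are made-up names, and the "region"
is just a list of object sizes with a parallel list of mark bits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region: object sizes plus one mark
// bit per object. This is NOT the real HeapRegion layout.
struct ToyRegion {
  std::vector<std::size_t> object_sizes;
  std::vector<bool> marked;

  // The closure type is a template parameter, so closure.apply(...) is
  // bound at compile time and can be inlined. With a virtual base class
  // the same call would go through a vtable once per object.
  template <typename ApplyToMarkedClosure>
  void apply_to_marked_objects(ApplyToMarkedClosure& closure) const {
    assert(object_sizes.size() == marked.size());
    for (std::size_t i = 0; i < object_sizes.size(); ++i) {
      if (marked[i]) {
        closure.apply(object_sizes[i]);
      }
    }
  }
};

// Example closure: sums the sizes of all marked (live) objects.
struct LiveBytesClosure {
  std::size_t live_bytes = 0;
  void apply(std::size_t size) { live_bytes += size; }
};

std::size_t live_bytes_in(const ToyRegion& region) {
  LiveBytesClosure closure;
  region.apply_to_marked_objects(closure);
  return closure.live_bytes;
}
```

Any type with a matching apply() member can be passed; no common base
class is needed.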
Thanks Erik D and Thomas S for all comments so far.
On 2017-09-04 17:36, Stefan Johansson wrote:
> Please review the implementation of JEP-307:
> As communicated late last year, I've been working on parallelizing
> the Full GC for G1. The implementation is now ready for review.
> The approach I chose was to redo marking at the start of the Full GC
> and not reuse the marking information from the concurrent mark cycle.
> The main reason behind this is to maximize the chance of freeing up
> memory. I reused the marking bitmap from the concurrent mark code
> though, so instead of marking in the mark word a bitmap is used. The
> mark word is still used for forwarding pointers, so marks will still
> have to be preserved for some objects.
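The idea of marking in a side bitmap while still preserving some mark
words can be sketched roughly as follows. This is a single-threaded toy
under stated assumptions, not the HotSpot implementation: ToyHeader,
kNeutralMark and ToyBitmapMarker are hypothetical names, objects are
addressed by index, and a real parallel marker would set the bit
atomically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical object header; not HotSpot's mark word layout.
struct ToyHeader {
  std::uintptr_t mark_word;
};

// Assumed "neutral" header value that can simply be recreated after GC.
constexpr std::uintptr_t kNeutralMark = 0x1;

class ToyBitmapMarker {
 public:
  explicit ToyBitmapMarker(std::size_t num_objects)
      : bitmap_(num_objects, false) {}

  // Marks object `index` in the side bitmap instead of in its header.
  // Returns true only for the call that newly marks the object.
  bool mark(const std::vector<ToyHeader>& heap, std::size_t index) {
    assert(index < bitmap_.size());
    if (bitmap_[index]) return false;
    bitmap_[index] = true;
    // The header will later be overwritten by a forwarding pointer, so
    // any header that carries real information has to be saved.
    if (heap[index].mark_word != kNeutralMark) {
      preserved_.emplace_back(index, heap[index].mark_word);
    }
    return true;
  }

  bool is_marked(std::size_t index) const { return bitmap_[index]; }
  std::size_t preserved_count() const { return preserved_.size(); }

  // Writes the saved headers back. The toy restores at the same index;
  // a real collector restores at the object's new location.
  void restore_preserved(std::vector<ToyHeader>& heap) const {
    for (const auto& entry : preserved_) {
      heap[entry.first].mark_word = entry.second;
    }
  }

 private:
  std::vector<bool> bitmap_;
  std::vector<std::pair<std::size_t, std::uintptr_t>> preserved_;
};
```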
> The algorithm is still a four-phase mark-compact, but each phase is
> handled by parallel workers. Marking and reference processing are done
> in phase 1. In phase 2 all worker threads work through the heap
> claiming regions which they prepare for compaction. This is done by
> installing forwarding pointers into the mark word of the live objects
> that will move. The regions claimed by a worker in this phase will be
> the same regions that the worker will compact in phase 4. This ensures
> that objects are not overwritten before they are compacted.
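A rough sketch of the phase 2 scheme described above: workers claim
regions through a shared atomic counter and compute forwarding
locations only into regions they have claimed themselves. All names
(ToyRegionClaimer, ToyWorkerState, prepare_region, kToyRegionCapacity)
are invented for illustration, and the sketch assumes the objects in
each region fit within kToyRegionCapacity. The real code also skips
objects that do not move, which the toy ignores.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kToyRegionCapacity = 64;  // assumed region size

struct ToyObject {
  std::size_t size;
  bool live;
  std::size_t fwd_region;  // forwarding target, filled in for live objects
  std::size_t fwd_offset;
};

struct ToyHeapRegion {
  std::vector<ToyObject> objects;
};

// Hands out each region index exactly once, even with many callers.
class ToyRegionClaimer {
 public:
  explicit ToyRegionClaimer(std::size_t num_regions)
      : num_regions_(num_regions) {}

  bool claim_next(std::size_t* index) {
    std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
    if (i >= num_regions_) return false;
    *index = i;
    return true;
  }

 private:
  std::atomic<std::size_t> next_{0};
  std::size_t num_regions_;
};

struct ToyWorkerState {
  std::vector<std::size_t> claimed;  // prepared now, compacted in phase 4
  std::size_t dest_slot = 0;         // index into `claimed`
  std::size_t dest_offset = 0;       // next free offset in that region
};

// Computes forwarding locations for the live objects of one claimed
// region. Targets are always regions this worker claimed earlier (or the
// region itself), so no other worker's data is ever overwritten.
void prepare_region(std::vector<ToyHeapRegion>& heap, ToyWorkerState& worker,
                    std::size_t region_index) {
  worker.claimed.push_back(region_index);
  for (ToyObject& obj : heap[region_index].objects) {
    if (!obj.live) continue;
    if (worker.dest_offset + obj.size > kToyRegionCapacity) {
      ++worker.dest_slot;  // destination full, move to next claimed region
      worker.dest_offset = 0;
    }
    assert(worker.dest_slot < worker.claimed.size());
    obj.fwd_region = worker.claimed[worker.dest_slot];
    obj.fwd_offset = worker.dest_offset;
    worker.dest_offset += obj.size;
  }
}

// What each worker thread would run: claim and prepare until none remain.
void prepare_for_compaction(std::vector<ToyHeapRegion>& heap,
                            ToyRegionClaimer& claimer,
                            ToyWorkerState& worker) {
  std::size_t index = 0;
  while (claimer.claim_next(&index)) {
    prepare_region(heap, worker, index);
  }
}
```

Because a destination never lies ahead of its source in the worker's
own claim order, sliding objects in that same order in phase 4 cannot
clobber an object that has not been moved yet.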
> In phase 3, all pointers to other objects are updated by looking at
> the forwarding pointers. At this point all information needed to
> create new remembered sets is available and this rebuilding has been
> added to phase 3. In the old version remembered set rebuilding was
> done separately after the compaction, but doing it in phase 3 is more
> efficient.
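As an illustration of why phase 3 already has everything needed for the
remembered sets, consider this sketch. It is a simplification, not the
G1 code: addresses are plain integers, regions are kToyRegionBytes
wide, and a remembered set records source regions instead of cards.
ToyLiveObject, RemSets and adjust_pointers_and_rebuild are hypothetical
names, and every field is assumed to refer to a live object.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <unordered_map>
#include <vector>

constexpr std::size_t kToyRegionBytes = 100;  // assumed region size

struct ToyLiveObject {
  std::size_t old_addr;
  std::size_t new_addr;             // forwarding location from phase 2
  std::vector<std::size_t> fields;  // addresses of referenced objects
};

// Target region -> regions holding references into it (coarse stand-in
// for a card-based remembered set).
using RemSets = std::map<std::size_t, std::set<std::size_t>>;

std::size_t region_of(std::size_t addr) { return addr / kToyRegionBytes; }

RemSets adjust_pointers_and_rebuild(std::vector<ToyLiveObject>& live) {
  // Stand-in for reading the forwarding pointer out of the mark word.
  std::unordered_map<std::size_t, std::size_t> forwardee;
  for (const ToyLiveObject& obj : live) {
    forwardee[obj.old_addr] = obj.new_addr;
  }

  RemSets remsets;
  for (ToyLiveObject& obj : live) {
    const std::size_t holder_region = region_of(obj.new_addr);
    for (std::size_t& field : obj.fields) {
      auto it = forwardee.find(field);
      assert(it != forwardee.end());  // assumption: referent is live
      field = it->second;             // pointer now refers to new location
      // Both post-compaction locations are known here, so cross-region
      // references can be recorded in the same pass.
      const std::size_t target_region = region_of(field);
      if (target_region != holder_region) {
        remsets[target_region].insert(holder_region);
      }
    }
  }
  return remsets;
}
```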
> As mentioned, phase 4 is when the compaction is done. In this first
> version, to avoid some complexity, there is no work stealing in this
> phase. This will lead to some imbalance between the workers, but this
> can be treated as a separate RFE in the future.
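One simple way to reason about the imbalance mentioned above: without
work stealing, the phase finishes when the most heavily loaded worker
does. The helper below is a hypothetical measure (not part of the
patch) that compares the largest per-worker copy volume with the mean.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// live_bytes_per_worker[i] is the number of bytes worker i has to copy
// in phase 4. Returns max / mean; 1.0 means perfectly balanced, and
// larger values mean the other workers idle while the slowest finishes.
double compaction_imbalance(
    const std::vector<std::size_t>& live_bytes_per_worker) {
  assert(!live_bytes_per_worker.empty());
  const std::size_t total =
      std::accumulate(live_bytes_per_worker.begin(),
                      live_bytes_per_worker.end(), std::size_t{0});
  if (total == 0) return 1.0;  // nothing to copy, trivially balanced
  const std::size_t slowest = *std::max_element(
      live_bytes_per_worker.begin(), live_bytes_per_worker.end());
  const double mean =
      static_cast<double>(total) / live_bytes_per_worker.size();
  return static_cast<double>(slowest) / mean;
}
```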
> The part of this work that has generated the most questions during
> internal discussions is the serial parts of phases 2 and 4. They are
> executed if no regions are to be freed up by the parallel workers. It
> is kind of a safety mechanism to avoid throwing a premature OOM. In
> the case of no regions being freed by the parallel code path a single
> threaded pass over the last region of each worker is done (at most
> number-of-workers regions are handled) to further compact these
> regions and hopefully free up some regions.
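The serial pass can be pictured with the following toy model, which is
only a sketch under assumptions and not the actual fallback code. Each
inner vector holds the sizes of the live objects left in one worker's
last region, every such region is assumed to fit within
kTailRegionCapacity, and serial_compact_tails is a made-up name. A
single compaction point slides across all tail regions in order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kTailRegionCapacity = 64;  // assumed region size

// tails[i] lists the live object sizes in worker i's last region.
// Returns how many of these regions end up empty after one
// single-threaded compaction over all of them.
std::size_t serial_compact_tails(
    const std::vector<std::vector<std::size_t>>& tails) {
  std::size_t dest_region = 0;  // index into `tails` of current target
  std::size_t dest_offset = 0;  // next free offset in the target
  bool any_object = false;

  for (std::size_t src = 0; src < tails.size(); ++src) {
    for (std::size_t size : tails[src]) {
      assert(size <= kTailRegionCapacity);
      if (dest_offset + size > kTailRegionCapacity) {
        ++dest_region;  // target full, continue in the next tail region
        dest_offset = 0;
      }
      assert(dest_region <= src);  // objects only ever slide backwards
      dest_offset += size;
      any_object = true;
    }
  }

  const std::size_t used = any_object ? dest_region + 1 : 0;
  return tails.size() - used;
}
```

With three workers whose last regions hold {20}, {30, 10} and {40}
bytes of live objects, the parallel path frees nothing because every
region still contains live data, while the serial pass packs the first
two regions together and frees one region.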
> Testing:
> * A lot of local sanity testing, both functional and performance.
> * Passed tier 1-5 of internal testing on supported platforms.
> * No regressions in performance testing.