RFR (M): 8027295: Free CSet takes ~50% of young pause time
mikael.gerdin at oracle.com
Thu Feb 20 12:12:55 UTC 2014
On Thursday 13 February 2014 18.54.13 Thomas Schatzl wrote:
> Hi all,
> can I have reviews for the following change that improves the (serial)
> performance of freeing the collection set? On applications that have a
> high amount of collection set regions, freeing the CSet takes up a large
> part of the entire collection pause (e.g. 50% on 2GB heaps) and/or takes
> really long in absolute terms (500ms on 460GB heaps).
> This change tries to introduce several small changes across CSet freeing
> that improve the total serial performance by around ~33%.
> It consists of the following changes (please also have a look at the CR
> for some figures):
> - manage code cache roots as set of chunks of nmethods
> - improves performance for code cache roots reclamation
> - also improves removing/adding elements slightly (no need to
> reallocate and copy around the entire GrowableArray)
> - this change is also a prerequisite for better load balancing code
> cache root scanning
> - some chunk cache to avoid malloc()/free() calls that were the
> performance issue using the FreeList class. (It unfortunately adds some
> interface clutter but I _really_ did not want to add the 100th
> implementation of a linked list in the GC code. It seems good enough).
Would it be possible to break out the nmethod chunk change as a separate
Since that change includes a bit of new code it would be easier to review it
and the interface changes needed for that bit separately.
I'm fine with combining the rest of the misc changes in one webrev.
> - fast card cache changes
> - pad FCC rows to cache line size to avoid any false sharing (every
> row represents the card cache for a single worker thread)
> - fixed (the surprising) main performance problem in FCC clearing by
> simply factoring out the call to HeapRegionRemSet::num_par_rem_sets()
> from the clear loop
> - a future change will extract the FCC into a separate class as
> cleanup (JDK-8034868)
> - moved the mutex to protect the OtherRegionsTable up to the
> - fixes a (potential) bug that we do not protect code roots cleanup by
> a lock
> - it seems to be more fitting, as this lock is actually supposed to
> protect the entire RSet, not only the OtherRegionsTable part
> - some interface changes to avoid locking mutexes unnecessarily during
> cleanup (seems to give 3% Free CSet time on TOPLINK)
> - i.e. the "locked" parameter for G1CollectedHeap::free_region().
> - added new statistics output separating young/nonyoung free cset time
> when G1LogLevel is set to finest
> - other changes
> - minor cleanups
> - the remaining changes in this area are
> - clearing and counting the length of the sparse RSet; that would need
> some quite intrusive RSet changes and is TODO.
> - parallelization: moved parallelization efforts into a separate CR,
> - concurrent collection set freeing: to be considered in a follow-up
> CR (JDK-8034873) for when parallelization stops scaling (like in cases
> when cset freeing already takes only a few ms and adding another thread
> just decreases performance) or just to decrease pause time further.
> JPRT with this version, specjbb*, specjvm*, dacapo, PSR tests (Fuse, BPM
> stress, SalesServer, TOPLINK) with a slightly less cleaned up version.
More information about the hotspot-gc-dev