RFR (M): 8027295: Free CSet takes ~50% of young pause time
thomas.schatzl at oracle.com
Thu Feb 13 17:54:13 UTC 2014
Can I have reviews for the following change, which improves the (serial)
performance of freeing the collection set? In applications with a large
number of collection set regions, freeing the CSet takes up a large
part of the entire collection pause (e.g. 50% on 2GB heaps) and/or takes
really long in absolute terms (500ms on 460GB heaps).
This change makes several small improvements across CSet freeing that
together improve the total serial performance by around 33%.
It consists of the following changes (please also have a look at the CR
for some figures):
- manage code cache roots as a set of chunks of nmethods
  - improves performance of code cache roots reclamation
  - also slightly improves removing/adding elements (no need to
    reallocate and copy around the entire GrowableArray)
  - this change is also a prerequisite for better load balancing of
    code cache root scanning
  - adds a chunk cache to avoid the malloc()/free() calls that were the
    performance issue when using the FreeList class. (It unfortunately
    adds some interface clutter but I _really_ did not want to add the
    100th implementation of a linked list in the GC code. It seems good
    enough).
- fast card cache (FCC) changes
  - pad FCC rows to cache line size to avoid any false sharing (every
    row represents the card cache for a single worker thread)
  - fixed the (surprising) main performance problem in FCC clearing by
    simply factoring out the call to
    HeapRegionRemSet::num_par_rem_sets() from the clear loop
  - a future change will extract the FCC into a separate class as
- moved the mutex to protect the OtherRegionsTable up to the
  - fixes a (potential) bug that we do not protect code roots cleanup by
  - it seems to be more fitting, as this lock is actually supposed to
    protect the entire RSet, not only the OtherRegionsTable part
- some interface changes to avoid locking mutexes unnecessarily during
  cleanup (seems to save about 3% of Free CSet time on TOPLINK)
  - i.e. the "locked" parameter for G1CollectedHeap::free_region().
- added new statistics output separating young/non-young free CSet time
  when G1LogLevel is set to finest
- other changes
  - minor cleanups
- remaining work in this area (not part of this change):
  - clearing and counting the length of the sparse RSet; that would
    need some quite intrusive RSet changes and is TODO.
  - parallelization: moved parallelization efforts into a separate CR,
  - concurrent collection set freeing: to be considered in a follow-up
    CR (JDK-8034873) for when parallelization stops scaling (like in
    cases when CSet freeing already takes only a few ms and adding
    another thread just decreases performance) or just to decrease
    pause time further.
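
To illustrate the chunked code root set and its chunk cache, here is a
minimal standalone sketch. It is not the code in the webrev: Chunk,
ChunkCache, ChunkedSet and the chunk capacity of 32 are names and values
made up for this example, and entries are plain void* instead of
nmethod*. The point is only that clearing a set walks one link per chunk
instead of touching every element, and that released chunks are reused
instead of going back through the allocator.

```cpp
#include <cstddef>

// Hypothetical illustration only; not the HotSpot implementation.
struct Chunk {
    static const std::size_t kCapacity = 32;  // assumed chunk size
    void*       entries[kCapacity];
    std::size_t top  = 0;        // number of used slots in this chunk
    Chunk*      next = nullptr;  // next chunk in the owning list
};

// Keeps released chunks on a free list so that later additions can
// reuse them instead of calling the allocator again.
class ChunkCache {
    Chunk*      free_head_   = nullptr;
    std::size_t allocations_ = 0;
public:
    ~ChunkCache() {
        while (free_head_ != nullptr) {
            Chunk* c = free_head_;
            free_head_ = c->next;
            delete c;
        }
    }
    Chunk* acquire() {
        if (free_head_ != nullptr) {       // reuse a cached chunk
            Chunk* c = free_head_;
            free_head_ = c->next;
            c->top = 0;
            c->next = nullptr;
            return c;
        }
        ++allocations_;                    // only now hit the allocator
        return new Chunk();
    }
    // Splices a whole chunk list onto the free list. This walks one
    // link per chunk, not one step per stored element.
    void release_all(Chunk* head) {
        if (head == nullptr) return;
        Chunk* tail = head;
        while (tail->next != nullptr) tail = tail->next;
        tail->next = free_head_;
        free_head_ = head;
    }
    std::size_t allocations() const { return allocations_; }
};

class ChunkedSet {
    ChunkCache& cache_;
    Chunk*      head_   = nullptr;
    std::size_t length_ = 0;
public:
    explicit ChunkedSet(ChunkCache& cache) : cache_(cache) {}
    ~ChunkedSet() { clear(); }
    // Adding never copies existing elements; at worst it links a chunk.
    void add(void* e) {
        if (head_ == nullptr || head_->top == Chunk::kCapacity) {
            Chunk* c = cache_.acquire();
            c->next = head_;
            head_ = c;
        }
        head_->entries[head_->top++] = e;
        ++length_;
    }
    void clear() {
        cache_.release_all(head_);
        head_ = nullptr;
        length_ = 0;
    }
    std::size_t length() const { return length_; }
};
```

In this sketch the cache must outlive every set that uses it, since the
set's destructor hands its chunks back to the cache.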
Testing: JPRT with this version; specjbb*, specjvm*, dacapo, and PSR
tests (Fuse, BPM stress, SalesServer, TOPLINK) with a slightly less
cleaned-up version.
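
For reference, the two fast card cache changes listed above can be
sketched roughly as follows. This is a simplified standalone example and
not the actual patch: the 64-byte cache line, the worker and region
counts, PaddedRow, the clear functions and the call counter are all
assumptions made for illustration, and the free function
num_par_rem_sets() only stands in for
HeapRegionRemSet::num_par_rem_sets().

```cpp
#include <cstddef>

// Hypothetical stand-ins; sizes and names are assumptions.
constexpr std::size_t kCacheLineSize = 64;   // assumed cache line size
constexpr std::size_t kWorkers       = 8;    // assumed worker count
constexpr std::size_t kRegions       = 100;  // assumed region count

// Counts calls so the effect of hoisting is observable.
static std::size_t g_calls = 0;

// Stand-in for a query the compiler may not be able to hoist itself.
std::size_t num_par_rem_sets() {
    ++g_calls;
    return kWorkers;
}

// One row per worker thread. alignas rounds the row size up to a
// multiple of the cache line, so two workers never share a line.
struct alignas(kCacheLineSize) PaddedRow {
    int cards[kRegions];
};
static_assert(sizeof(PaddedRow) % kCacheLineSize == 0,
              "rows must occupy whole cache lines");

// Before: the query is re-evaluated on every loop condition check.
void clear_naive(PaddedRow* rows) {
    for (std::size_t i = 0; i < num_par_rem_sets(); ++i) {
        for (std::size_t j = 0; j < kRegions; ++j) {
            rows[i].cards[j] = -1;
        }
    }
}

// After: the loop-invariant query is evaluated once, up front.
void clear_hoisted(PaddedRow* rows) {
    const std::size_t n = num_par_rem_sets();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < kRegions; ++j) {
            rows[i].cards[j] = -1;
        }
    }
}
```

Both functions produce the same cleared table; the only difference is
how often the row count is queried.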