RFR (M): 8027295: Free CSet takes ~50% of young pause time

Thomas Schatzl thomas.schatzl at oracle.com
Thu Feb 13 17:54:13 UTC 2014

Hi all,

  can I have reviews for the following change that improves the (serial)
performance of freeing the collection set? On applications that have a
large number of collection set regions, freeing the CSet takes up a large
part of the entire collection pause (e.g. 50% on 2GB heaps) and/or takes
really long in absolute terms (500ms on 460GB heaps).

This change introduces several small improvements across CSet freeing
that together improve the total serial performance by around 33%.

It consists of the following changes (please also have a look at the CR
for some figures):

- manage code cache roots as set of chunks of nmethods
  - improves performance for code cache roots reclamation
  - also improves removing/adding elements slightly (no need to
reallocate and copy around the entire GrowableArray)
  - this change is also a prerequisite for better load balancing code
cache root scanning
  - adds a small chunk cache to avoid the malloc()/free() calls that
were the performance issue when using the FreeList class. (It
unfortunately adds some interface clutter, but I _really_ did not want
to add the 100th implementation of a linked list in the GC code. It
seems good enough.)
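
As an illustration of the chunked code root set with a chunk cache
described above, here is a minimal standalone sketch. It is not the
code from the change: NMethod, Chunk, ChunkCache and CodeRootChunkedSet
are hypothetical names, the chunk capacity is deliberately tiny, and
plain new/delete stands in for whatever allocation the real code uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for HotSpot's nmethod; only its address is used.
struct NMethod {};

// Fixed-size chunk of code root pointers; chunks form a singly linked list.
struct Chunk {
  static const std::size_t Capacity = 4;  // tiny, for illustration only
  NMethod* entries[Capacity];
  std::size_t top;   // number of used entries
  Chunk* next;
  Chunk() : top(0), next(nullptr) {}
};

// Keeps released chunks around so that clearing and refilling a set
// does not have to go through the allocator every time.
class ChunkCache {
  Chunk* _free;
  std::size_t _allocated;  // number of chunks obtained from the allocator
public:
  ChunkCache() : _free(nullptr), _allocated(0) {}
  ChunkCache(const ChunkCache&) = delete;
  ~ChunkCache() {
    while (_free != nullptr) { Chunk* c = _free; _free = c->next; delete c; }
  }
  Chunk* get() {
    if (_free != nullptr) {          // reuse a cached chunk
      Chunk* c = _free;
      _free = c->next;
      c->top = 0;
      c->next = nullptr;
      return c;
    }
    ++_allocated;
    return new Chunk();
  }
  void put(Chunk* c) { c->next = _free; _free = c; }
  std::size_t allocated() const { return _allocated; }
};

// Set of code roots stored as a list of chunks. The cache must outlive the set.
class CodeRootChunkedSet {
  ChunkCache& _cache;
  Chunk* _head;
  std::size_t _length;
public:
  explicit CodeRootChunkedSet(ChunkCache& cache)
    : _cache(cache), _head(nullptr), _length(0) {}
  ~CodeRootChunkedSet() { clear(); }

  // Adding never copies existing entries; at most one new chunk is linked in.
  void add(NMethod* nm) {
    if (_head == nullptr || _head->top == Chunk::Capacity) {
      Chunk* c = _cache.get();
      c->next = _head;
      _head = c;
    }
    _head->entries[_head->top++] = nm;
    ++_length;
  }

  bool contains(const NMethod* nm) const {
    for (const Chunk* c = _head; c != nullptr; c = c->next) {
      for (std::size_t i = 0; i < c->top; ++i) {
        if (c->entries[i] == nm) return true;
      }
    }
    return false;
  }

  // Reclamation touches one pointer per chunk instead of one per entry.
  void clear() {
    while (_head != nullptr) { Chunk* c = _head; _head = c->next; _cache.put(c); }
    _length = 0;
  }

  std::size_t length() const { return _length; }
};
```

In this sketch clear() only relinks whole chunks into the cache, which
is the property that makes reclamation cheap compared to freeing or
shrinking a contiguous array.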

- fast card cache changes
  - pad FCC rows to cache line size to avoid any false sharing (every
row represents the card cache for a single worker thread)
  - fixed the (surprising) main performance problem in FCC clearing by
simply hoisting the call to HeapRegionRemSet::num_par_rem_sets() out of
the clear loop
  - a future change will extract the FCC into a separate class as
cleanup (JDK-8034868)
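
The two FCC points above (row padding and hoisting the loop bound) can
be sketched as follows. This is a standalone approximation, not the
HotSpot code: FastCardCache, padded_row_length and
num_par_rem_sets_stub are hypothetical names, the 64 byte cache line
size is an assumption, and std::vector stands in for the real backing
storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t CacheLineSize = 64;  // assumed; platform dependent

// Round a row length (in entries) up so that each row spans whole cache lines.
inline std::size_t padded_row_length(std::size_t entries, std::size_t entry_size) {
  std::size_t per_line = CacheLineSize / entry_size;
  return (entries + per_line - 1) / per_line * per_line;
}

// Hypothetical stand-in for HeapRegionRemSet::num_par_rem_sets();
// counts how often it is called so the effect of hoisting is visible.
static std::size_t g_calls = 0;
static std::size_t g_workers = 4;
std::size_t num_par_rem_sets_stub() { ++g_calls; return g_workers; }

// One row per worker thread, one entry per region.
class FastCardCache {
  std::size_t _workers;
  std::size_t _row_stride;   // padded row length in entries
  std::vector<int> _cache;
public:
  FastCardCache(std::size_t workers, std::size_t regions)
    : _workers(workers),
      _row_stride(padded_row_length(regions, sizeof(int))),
      _cache(workers * _row_stride, -1) {}

  // The row stride is a multiple of the cache line size. Fully avoiding
  // false sharing also needs a cache-line-aligned base address, which
  // std::vector does not guarantee and which is omitted here.
  int& at(std::size_t worker, std::size_t region) {
    assert(worker < _workers && region < _row_stride);
    return _cache[worker * _row_stride + region];
  }

  // Clear the entry for one region in every worker's row. The row count
  // is fetched once before the loop instead of in the loop condition.
  void clear_region(std::size_t region) {
    const std::size_t n = num_par_rem_sets_stub();
    assert(n <= _workers);
    for (std::size_t i = 0; i < n; ++i) {
      at(i, region) = -1;
    }
  }

  std::size_t row_stride() const { return _row_stride; }
};
```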

- moved the mutex to protect the OtherRegionsTable up to the
  - fixes a (potential) bug that we do not protect code roots cleanup by
a lock
  - it seems to be more fitting, as this lock is actually supposed to
protect the entire RSet, not only the OtherRegionsTable part

- some interface changes to avoid locking mutexes unnecessarily during
cleanup (seems to improve Free CSet time by about 3% on TOPLINK)
  - i.e. the "locked" parameter for G1CollectedHeap::free_region().
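
A rough sketch of the idea behind such a "locked" parameter, under the
assumption that it means "the caller guarantees exclusive access, so do
not take the lock again". RemSet, clear_locked() and this free_region()
signature are hypothetical simplifications and do not reproduce the
actual G1CollectedHeap interface.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical, heavily simplified remembered set guarded by one mutex.
class RemSet {
  std::mutex _m;
  std::vector<int> _cards;
  std::size_t _lock_acquisitions;  // for illustration only
public:
  RemSet() : _lock_acquisitions(0) {}

  void add(int card) {
    std::lock_guard<std::mutex> g(_m);
    ++_lock_acquisitions;
    _cards.push_back(card);
  }

  // Caller must guarantee exclusive access (e.g. serial cleanup in a pause).
  void clear_locked() { _cards.clear(); }

  // Safe variant for callers that do not already have exclusive access.
  void clear() {
    std::lock_guard<std::mutex> g(_m);
    ++_lock_acquisitions;
    clear_locked();
  }

  std::size_t size() const { return _cards.size(); }
  std::size_t lock_acquisitions() const { return _lock_acquisitions; }
};

// Simplified free_region(): 'locked' tells it whether taking the mutex
// can be skipped because the caller already guarantees exclusion.
void free_region(RemSet& rs, bool locked) {
  if (locked) {
    rs.clear_locked();
  } else {
    rs.clear();
  }
}
```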

- added new statistics output separating young/nonyoung free cset time
when G1LogLevel is set to finest

- other changes
  - minor cleanups

- remaining work in this area that is not part of this change:
  - clearing and counting the length of the sparse RSet; that would need
some quite intrusive RSet changes and is TODO.
  - parallelization: moved parallelization efforts into a separate CR,
  - concurrent collection set freeing: to be considered in a follow-up
CR (JDK-8034873) for when parallelization stops scaling (like in cases
when cset freeing already takes only a few ms and adding another thread
just decreases performance) or just to decrease pause time further.



JPRT with this version, specjbb*, specjvm*, dacapo, PSR tests (Fuse, BPM
stress, SalesServer, TOPLINK) with a slightly less cleaned up version.

