RFR (L): 8027559: Decrease code size and templatizing in G1ParCopyClosure::do_oop_work
thomas.schatzl at oracle.com
Tue Feb 18 02:20:14 PST 2014
can I have some reviews for the following change that cleans up the G1
closures to allow some optimizations and decrease code size?
Problem is that the G1 closure code is too generic inhibiting some
(obvious) performance fixes. The code also uses oop closures for no
reason in some cases, and some methods seems to be simply misplaced,
requiring lots of calls to elsewhere, or creating unnecessary instances
All in all, this change decreases the object copy time by almost 10% (on
specjbb*), and code size by 20k (well, it is *something* :).
Here is a list of particular changes:
- moved G1ParCopyClosure::copy_to_survivor_space() to
G1ParScanThreadState to avoid multiple instantiations; also the method
heavily references G1ParScanThreadState already.
- introduced obj_needs_chunking() method that contains the conditions
for chunking arrays (used twice)
- moved the G1ParScanClosure as member of G1ParScanThreadState as it is
only used from there now (removed unused references from the other
- removed and inlined both G1ParScanPartialArrayClosure and
G1ParScanHeapEvacClosure into G1ParScanThreadState. Both closures were
never used in the oop closures, their methods only ever called directly.
This removes some repetitive boiler plate initialization code too. (And
for some reason G1ParScanHeapEvacClosure was part of the closures that
were used for specializing oop_oop_iterate() methods - removing
- in G1CollectedHeap::in_cset_fast_test(), the check whether the given
oop is in the heap is obsolete since perm gen removal. (That change
gives around half of the perf improvements).
- moved the partial array mask handling used for chunking into
G1ParThreadScanState as it is only ever used there. Specialized a few of
the methods to the actually used specializations only (potentially
saving a few bytes of code).
- moved the contents of G1ParScanPartialArrayClosure into
- moved G1ParScanHeapEvacClosure to
G1ParScanPartialArrayClosure::do_oop_evac(), specializing it completely,
avoiding the NULL check. Gives a few percent of performance. (As
mentioned in the method, we still need to do a in_cset_fast_test because
RSet scanning might claim the same card multiple times).
- reordered some if-then-else-clauses so that the common case (if it is
well-known) is located in the body of the if, and the uncommon case in
the else part.
- removed obsolete members _g1_rem, _during_initial_mark and
_mark_in_progress of G1ParClosureSuper. They were actually never used
even before this change. Saves a few bytes in all instances of it.
- moved G1ParClosureSuper::_cm to G1ParCopyHelper. It's not used in
G1ParClosureSuper::_cm, saving a few bytes for each G1ParClosureSuper
- moved G1ParCopyClosure::mark_[forwarded_]object() into
G1ParCopyHelper. Both do not depend on any template parameter, so they
just increased code size.
jprt, various benchmarks
More information about the hotspot-gc-dev