RFR (L): 8027559: Decrease code size and templatizing in G1ParCopyClosure::do_oop_work
stefan.karlsson at oracle.com
Tue Feb 18 04:31:03 PST 2014
On 2014-02-18 11:20, Thomas Schatzl wrote:
> Hi all,
> can I have some reviews for the following change that cleans up the G1
> closures to allow some optimizations and decrease code size?
> Problem is that the G1 closure code is too generic inhibiting some
> (obvious) performance fixes. The code also uses oop closures for no
> reason in some cases, and some methods seems to be simply misplaced,
> requiring lots of calls to elsewhere, or creating unnecessary instances
> of methods.
> All in all, this change decreases the object copy time by almost 10% (on
> specjbb*), and code size by 20k (well, it is *something* :).
> Here is a list of particular changes:
> - moved G1ParCopyClosure::copy_to_survivor_space() to
> G1ParScanThreadState to avoid multiple instantiations; also the method
> heavily references G1ParScanThreadState already.
> - introduced obj_needs_chunking() method that contains the conditions
> for chunking arrays (used twice)
> - moved the G1ParScanClosure as member of G1ParScanThreadState as it is
> only used from there now (removed unused references from the other
> - removed and inlined both G1ParScanPartialArrayClosure and
> G1ParScanHeapEvacClosure into G1ParScanThreadState. Both closures were
> never used in the oop closures, their methods only ever called directly.
> This removes some repetitive boiler plate initialization code too. (And
> for some reason G1ParScanHeapEvacClosure was part of the closures that
> were used for specializing oop_oop_iterate() methods - removing
> generated code).
> - in G1CollectedHeap::in_cset_fast_test(), the check whether the given
> oop is in the heap is obsolete since perm gen removal. (That change
> gives around half of the perf improvements).
> - moved the partial array mask handling used for chunking into
> G1ParThreadScanState as it is only ever used there. Specialized a few of
> the methods to the actually used specializations only (potentially
> saving a few bytes of code).
> - moved the contents of G1ParScanPartialArrayClosure into
> - moved G1ParScanHeapEvacClosure to
> G1ParScanPartialArrayClosure::do_oop_evac(), specializing it completely,
> avoiding the NULL check. Gives a few percent of performance. (As
> mentioned in the method, we still need to do a in_cset_fast_test because
> RSet scanning might claim the same card multiple times).
> - reordered some if-then-else-clauses so that the common case (if it is
> well-known) is located in the body of the if, and the uncommon case in
> the else part.
> - removed obsolete members _g1_rem, _during_initial_mark and
> _mark_in_progress of G1ParClosureSuper. They were actually never used
> even before this change. Saves a few bytes in all instances of it.
> - moved G1ParClosureSuper::_cm to G1ParCopyHelper. It's not used in
> G1ParClosureSuper::_cm, saving a few bytes for each G1ParClosureSuper
> non-G1ParCopyHelper instance.
> - moved G1ParCopyClosure::mark_[forwarded_]object() into
> G1ParCopyHelper. Both do not depend on any template parameter, so they
> just increased code size.
This is a lot of different changes merged into one changeset. Could this
be split up into smaller, more focused changes?
> jprt, various benchmarks
More information about the hotspot-gc-dev