RFR (S): 8212911: Unify and micro-optimize handling of non-in-collection set references in oop closures
thomas.schatzl at oracle.com
Wed Oct 24 11:36:48 UTC 2018
can I have reviews for this small change that does some minor cleanup
and micro-optimization of our main g1 stw/concurrent gc closures?
- at the moment G1 randomly either adds a reference directly to the
remembered set or puts it into the DCQS. I propose to, at least for
closures called during stw to always put the reference into the DCQS
and let the concurrent phase (or the next GC phase) deal with that as
the current remembered set implementation is pretty bad.
We may or may not get to fixing this for 12, but for now it seems to be
better to do so.
(No significant benchmark differences except for one 3% improvement of
specjvm2008-xml, but overall it seems to make no difference)
- enqueuing references always checks whether the from-region is in
young to skip them. For that purpose, the closures get passed the from-
region. Now the problem with that is that actually we do not need the
from-region, but the result that the from-region is a young region
(then we do not need to push that reference onto the DCQS), so the code
calculates that result, i.e. from.is_young(), for every such reference.
However at the caller of oop_iterate(), we often already know whether
the from region is young (from other calculations), or specifically
determine the from-region just for the iteration.
The idea here is to pass from->is_young() directly, so we do not need
to recalculate it for every reference.
Now for the G1ScanEvacuatedObjClosure one could imagine to specialize
the closure itself based on the from-region (e.g. having both a
G1ScanEvacuatedObjFromOldClosure), however that results in code
duplication at the caller (at least without splitting the task queue)
the need for exactly that distinction there as well.
Further, measurements did not yield a difference, so I kept the code a
little more compact for now (there is an RFE out for this).
We may re-evalute the latter as soon as Scan RS time variance has been
improved a bit (for which this change helps a bit, and I have some
changes in the pipeline. Actually this change has been broken out from
- unify on using HeapRegion::is_in_same_region() instead of comparing
HeapRegion* to avoid the table lookup to get the to-HeapRegion* as the
memory load is presumably slower than some arithmetic on two local
But the main driver for that change was that it allows just passing the
is_young() result directly.
hs-tier1-3, several hs-tier1-5 runs with other changes, benchmark suite
More information about the hotspot-gc-dev