RFR (L): 8060025: Object copy time regressions after JDK-8031323 and JDK-8057536
mikael.gerdin at oracle.com
Wed Dec 3 15:15:11 UTC 2014
On 2014-12-03 13:41, Thomas Schatzl wrote:
> Hi all,
> I would like to have reviews for the following change that improves
> object copy time after we noticed performance regressions after the
> changes in JDK-8031323 (alignment of survivor objects) and JDK-8057536
> (context specific allocations).
> In conjunction with JDK-8064473 (Improved handling of age during object
> copy) the changes improve object copy time by ~8% on x64/linux and ~7%
> on SPARC/solaris on SPECjbb2005.
> There are no particular improvements on the scores though as there is
> very little GC work done.
> There seems to be some overall performance gain on CRM Fuse.
> The changes include:
> - merging of the FastCSetTable table with the GCAllocPurpose into a
> table of in_cset_state_t. Each element not only contains information
> about whether the region is humongous or not, but also what generation
> it belongs to if it is in the collection set.
> The encoding has been selected to allow good instruction encoding of
> commonly used checks (e.g. in collection set or not, is humongous).
> GCAllocPurpose has been removed.
> - factor out plab allocation as fast-path for allocation from other
> types of allocations. There have been a few renamings of methods to
> (imo) make the various stages more clear. (i.e. The methods are not all
> called "allocate" any more :))
> - use a per-ParThreadScanState tenuring threshold.
> - only calculate object age if required.
> - some additional direct use of markOop contents instead of accessing
> via the oop (like in JDK-8064473).
> - manually extract some common subexpressions from the code that are not
> obvious to the compiler.
> There is no change in functionality, and the survivor alignment check
> still has some minor performance impact. However imo these changes in
> total outweigh its impact, so further attempts to factor this out (e.g.
> templatizing) do not seem to have a good cost/benefit ratio.
> We may still want to create an RFE that deals with that in a separate
> change. There is enough good change in this change already to warrant
> separate CRs if needed.
> This work is largely based on changes from Tony Printezis at Twitter who
> coincidentally has been working on this issue at the same time, and has
> then been tweaked further (Thanks a lot!). Extensive performance testing
> of many variants (of which this seems to be the best) has been performed
> on internal test systems.
> Tony reported even better improvements on some microbenchmarks on the
> original version of the change.
> As mentioned, unless the application is somewhat GC and object copy
> heavy, there will not be much impact.
Overall I think this is a good change. I'm not particularly fond of the
names InCSetState and in_cset_state_t but I don't really have a good
suggestion to replace it. Perhaps the integral type typedef could be
moved inside InCSetState and be named something like InCSetState::value.
Please, let's try to avoid putting more implementation code than
necessary in g1CollectedHeap.hpp, please put par_allocate_during_gc,
alloc_buffer_stats, desired_plab_sz in g1CollectedHeap.inline.hpp
can desired_plab_sz do
alloc_buffer_stats(dest).desired_plab_sz() instead of duplicating
the switch statement?
check_cset_fast_test should be PRODUCT_RETURN_(true) in the hpp.
you can compare the result of m->decode_pointer() with obj_addr to avoid
the cast to oop. (does this micro-optimization truly influence the evac
failure processing performance?)
class InCSetState does not really seem to belong in g1Allocator.hpp,
have you considered adding it to a new, separate header file?
the class G1InCSetStateFastTestBiasedMappedArray could also be moved to
this new file.
> JPRT, specjbb2005/2013, CRM Fuse
More information about the hotspot-gc-dev