RFR (L): 8060025: Object copy time regressions after JDK-8031323 and JDK-8057536
jon.masamitsu at oracle.com
Thu Dec 4 03:40:33 UTC 2014
G1ParScanThreadState::allocate_in_next_plab() basically tries to
allocate from InCSetState::Old and changes some state variables
(_tenuring_threshold and parameter dest). I'm not sure it's worth
having a method to do that. Would it make it simpler to
understand if copy_to_survivor_space() called
directly and the resetting of _tenuring_threshold and parameter dest
were set in handle_evacuation_failure()?
Also maybe allocate_in_next_plab() gets in-lined but if it doesn't
it's a function call that could be saved.
The old code used a HeapWord* obj
117 HeapWord* obj = NULL;
that it returns at t he end of the method. Is the change you made to not
use the "obj" a style change? Or was better performance expected? Style
reason is fine. I just want to know the reason.
Rest looks good.
On 12/3/2014 4:41 AM, Thomas Schatzl wrote:
> Hi all,
> I would like to have reviews for the following change that improves
> object copy time after we noticed performance regressions after the
> changes in JDK-8031323 (alignment of survivor objects) and JDK-8057536
> (context specific allocations).
> In conjunction with JDK-8064473 (Improved handling of age during object
> copy) the changes improve object copy time by ~8% on x64/linux and ~7%
> on SPARC/solaris on SPECjbb2005.
> There are no particular improvements on the scores though as there is
> very little GC work done.
> There seems to be some overall performance gain on CRM Fuse.
> The changes include:
> - merging of the FastCSetTable table with the GCAllocPurpose into a
> table of in_cset_state_t. Each element not only contains information
> about whether the region is humongous or not, but also what generation
> it belongs to if it is in the collection set.
> The encoding has been selected to allow good instruction encoding of
> commonly used checks (e.g. in collection set or not, is humongous).
> GCAllocPurpose has been removed.
> - factor out plab allocation as fast-path for allocation from other
> types of allocations. There have been a few renamings of methods to
> (imo) make the various stages more clear. (i.e. The methods are not all
> called "allocate" any more :))
> - use a per-ParThreadScanState tenuring threshold.
> - only calculate object age if required.
> - some additional direct use of markOop contents instead of accessing
> via the oop (like in JDK-8064473).
> - manually extract some common subexpressions from the code that are not
> obvious to the compiler.
> There is no change in functionality, and the survivor alignment check
> still has some minor performance impact. However imo these changes in
> total outweigh its impact, so further attempts to factor this out (e.g.
> templatizing) do not seem to have a good cost/benefit ratio.
> We may still want to create an RFE that deals with that in a separate
> change. There is enough good change in this change already to warrant
> separate CRs if needed.
> This work is largely based on changes from Tony Printezis at Twitter who
> coincidentally has been working on this issue at the same time, and has
> then been tweaked further (Thanks a lot!). Extensive performance testing
> of many variants (of which this seems to be the best) has been performed
> on internal test systems.
> Tony reported even better improvements on some microbenchmarks on the
> original version of the change.
> As mentioned, unless the application is somewhat GC and object copy
> heavy, there will not be much impact.
> JPRT, specjbb2005/2013, CRM Fuse
More information about the hotspot-gc-dev