RFR (L): 8060025: Object copy time regressions after JDK-8031323 and JDK-8057536

Thomas Schatzl thomas.schatzl at oracle.com
Thu Dec 11 12:59:34 UTC 2014

Hi all,

On Wed, 2014-12-03 at 16:15 +0100, Mikael Gerdin wrote:
> Hi Thomas,
> On 2014-12-03 13:41, Thomas Schatzl wrote:
> > Hi all,
> >
> >    I would like to have reviews for the following change that improves
> > object copy time after we noticed performance regressions after the
> > changes in JDK-8031323 (alignment of survivor objects) and JDK-8057536
> > (context specific allocations).
> >
> > As mentioned, unless the application is somewhat GC and object copy
> > heavy, there will not be much impact.
> >
> > CR:
> > https://bugs.openjdk.java.net/browse/JDK-8060025
> > Webrev:
> > http://cr.openjdk.java.net/~tschatzl/8060025/webrev.0/
> Overall I think this is a good change. I'm not particularly fond of the 
> names InCSetState and in_cset_state_t but I don't really have a good 
> suggestion to replace it. Perhaps the integral type typedef could be 
> moved inside InCSetState and be named something like InCSetState::value.

There has been the suggestion to merge InCSetState and in_cset_state_t
into one type with a _value member, and use that everywhere.

E.g. something like:

struct InCSetState {
  int8_t _value;
  <make all static methods regular non-virtual methods>

That causes a severe performance regression on SPARC (>10%, worse than
the code we started with). The problem is that the SPARC ABI requires
the compiler to emit code where the struct elements are stored left to
right (from MSB to LSB), which leads into many dependent shift
instructions to extract and store this field.

So I prepared two changes, one that only implements all fixes suggested
by various reviewers here, and a second one that is based on the first
one that implements above suggestion with a twist: for SPARC we define
_value as intptr_t instead of int8_t. This avoids the mentioned shifts
at the cost of some memory. However we want to use int8_t on other
architectures because tests indicated that it is (slightly) faster.

Here are the webrevs:

Full, review changes only:
Diff from version 0:

Full changes merging in_cset_state_t and InCSetState into a single
Diff from version 1a:

I personally like the 1b version better since it does not pollute the
global namespace so much, but I am okay with either version.

Performance-wise, both are equal.

> in g1CollectedHeap.hpp
> Please, let's try to avoid putting more implementation code than 
> necessary in g1CollectedHeap.hpp, please put par_allocate_during_gc, 
> alloc_buffer_stats, desired_plab_sz in g1CollectedHeap.inline.hpp


> can desired_plab_sz do
> alloc_buffer_stats(dest).desired_plab_sz() instead of duplicating
> the switch statement?


> in g1CollectedHeap.cpp
> check_cset_fast_test should be PRODUCT_RETURN_(true) in the hpp.


> in g1EvacFailure.hpp
> you can compare the result of m->decode_pointer() with obj_addr to avoid 
> the cast to oop. (does this micro-optimization truly influence the evac 
> failure processing performance?)


> in g1Allocator.hpp
> class InCSetState does not really seem to belong in g1Allocator.hpp, 
> have you considered adding it to a new, separate header file?
> the class G1InCSetStateFastTestBiasedMappedArray could also be moved to 
> this new file.



More information about the hotspot-gc-dev mailing list