RFR: JDK-8061259: ParNew promotion failed is serialized on a lock

Kim Barrett kim.barrett at oracle.com
Thu Nov 6 01:50:30 UTC 2014

On Oct 30, 2014, at 1:24 PM, Jungwoo Ha <jwha at google.com> wrote:
> http://cr.openjdk.java.net/~rasbold/8061259/webrev.02/


271   // Support for CMSFastPromotionFailure
272   reset_promotion_failed();

Why this and not just initialize the data member in the constructor
initializer list?


[All of code discussed here is in par_promote().]

The tests of CMSFastPromotionFailure && has_promotion_failed() that
are outside the lock have a sort of DCLP (double checked locking
pattern) feel to them.  The goal here is to avoid the locked region
and go do something else, instead of waiting for the current lock
holder to complete the work the checker would be doing if only they
could get the lock.

I think I would like it better if the outside the lock tests were
moved closer to the lock entries, to emphasize that relationship, and
to reduce the code affected by the introduction of this set of
changes. That might result in the thread that gives up doing a little
more work before giving up.  I'm particularly thinking of

1358   if (CMSFastPromotionFailure && has_promotion_failed()) {
1359     return NULL;
1360   }

removal of which would let execution proceed to 

1373     if (!expand_and_ensure_spooling_space(promoInfo)) {
1374       return NULL;

and give up because the moved test caused expand_xxx to return false.
I don't know if the placement of the test at line 1358 rather than
inside expand_and_ensure_spooling_space() would be significant.  I
suspect not, but without measurements I don't have a lot of confidence
in that supposition.  But I think that change would improve the
understandability of the code.

In the other case, 

1381      if (CMSFastPromotionFailure && has_promotion_failed()) {
1382        return NULL;
1383      }
1384      obj_ptr = expand_and_par_lab_allocate(ps, alloc_sz);

I would definitely prefer that test be moved into
expand_and_par_lab_allocate().  This is the only call to that
function, and the first thing it does is lock the mutex and re-perform
the test.


I'm suspicious of the during-review change to make
_has_promotion_failed non-atomic.

I see the argument for the interaction between set and test; the set
occurs in lock context so gets released on the way out of the lock,
while the first time each core gets a stale false value it will
acquire the lock and acquire the updated set value, so that future
tests on the same core will avoid the code in question, until the
reset finally happens.

However, I think the present interaction between reset and test is not
safe. A test seeing a stale true value leads to an inappropriate skip
of the protected code, with nothing obvious to prevent that from
occurring "forever".


More information about the hotspot-gc-dev mailing list