RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

David Holmes david.holmes at oracle.com
Fri Oct 7 03:23:03 UTC 2016

On 7/10/2016 12:50 PM, Hiroshi H Horii wrote:
> Dear Kim, David, and all,
> Thank you for your comments.
> I created a new webrev. I added memory_order_release as a new enum of
> cmpxchg_memory_order (atomic.hpp) and use it to update forwardees.
> http://cr.openjdk.java.net/~horii/8154736/webrev.04/

I think you intended to modify cmpxchg_pre_membar not 
cmpxchg_post_membar! Release semantics require the "post" fence. Though 
technically release semantics would put the barrier before the store, 
not after. But with no pre-fence you could in theory have a store before 
the cas move inside the cas implementation (on ppc/arm) and get 
reordered with the store performed by the cas.

src/share/vm/gc/parallel/psPromotionManager.cpp still uses 

That aside this seems too reactive to me. Kim may be right that release 
semantics are sufficient for this code, but that is a claim that needs 
some consideration and validation before we just run with it and make 
the change. The approach to changes like this needs a lot more 
discipline and methodology in my opinion.


> Originally, two sync were called before and after cmpxchg in ppc. With
> this change, one of them is reduced. Though one sync still remains,
> performance will be improved.
> Could you give your comments on this new webrev?
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
> Kim Barrett <kim.barrett at oracle.com> wrote on 10/07/2016 07:16:28:
>> From: Kim Barrett <kim.barrett at oracle.com>
>> To: David Holmes <david.holmes at oracle.com>
>> Cc: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-compiler-dev <hotspot-
>> compiler-dev at openjdk.java.net>, Tim Ellison
>> <Tim_Ellison at uk.ibm.com>, "ppc-aix-port-dev at openjdk.java.net" <ppc-
>> aix-port-dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP,
>> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
>> "hotspot-runtime-dev at openjdk.java.net"
> <hotspot-runtime-dev at openjdk.java.net>
>> Date: 10/07/2016 07:17
>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> copy_to_survivor for ppc64
>> > On Oct 5, 2016, at 9:36 PM, David Holmes <david.holmes at oracle.com>
> wrote:
>> >
>> > On 5/10/2016 10:36 AM, Hiroshi H Horii wrote:
>> >> Dear David,
>> >>
>> >> Thank you for your comments.
>> >>
>> >> I just used to think that it may be better that copy_to_survivor_space
>> >> doesn't return forwardee if CAS was failed in order to prevent from
>> >> reading fields in forwardee. But as you pointed, this extends fix for
>> >> this topic.
>> >>
>> >> I removed two NULL assignments from the previous wevrev.
>> >> http://cr.openjdk.java.net/~horii/8154736/webrev.03/
>> >
>> > Which simply takes us back to where we were. It may not be safe
>> for the caller of those methods to access the fields of the returned
>> "forwardee".
>> >
>> > Sorry but I'm not seeing anything here that justifies removing the
>> barriers from the cas in this code. GC lurkers feel free to jump in
>> here - this is your code afterall! ;-)
>> >
>> > David
>> > -----
>> Using a CAS with memory_order_relaxed in copy_to_survivor_space seems
>> to me to be extremely fragile and hard to reason about.  The places
>> where that copied object might escape to and be examined seem to be
>> myriad.  And not only do we need to worry about them today, but also
>> for future maintenance.  Even if it can modified and shown to be
>> correct today, it would be very easy to intoduce a bug later, as
>> should be obvious from the various issues pointed out so far during
>> this review.
>> The key issue here is that we copy obj into new_obj, and then make
>> new_obj accessible to other threads via the CAS.  Those other threads
>> might attempt to access data in new_obj.  This suggests the CAS ought
>> to have at least a release fence to ensure the copy is complete before
>> the CAS is performed.  No amount of fencing on the read side (such as
>> in the work stealing) can remove that need.
>> And that might be all that is needed.  On the post-CAS side, we load
>> the forwardee and then load values from it.  I thik we can use
>> implicit consume with dependent loads (except on Alpha) plus the
>> suggested release fence to get the desired effect.  (If not, use an
>> acquire form of forwardee()?)
>> I'm not certain that just a release fence is sufficient (I'm less
>> familiar with ParallelGC than I'd like for looking at something like
>> this), but I'm pretty sure I wouldn't want to go any weaker than that.

More information about the hotspot-gc-dev mailing list