RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
martin.doerr at sap.com
Fri Oct 21 12:57:42 UTC 2016
Thank you very much for reviewing. I fully agree with the latest replies.
I think Hiroshi's latest webrev (http://cr.openjdk.java.net/~horii/8154736/webrev.05/) is pretty close to it.
The only remaining point is the acquire barriers, which could be replaced by a comment like "We rely on memory_order_consume here.".
I'd prefer this, too, even though acquire barriers in failure cases would probably not really hurt.
Cmpxchg Release,Relaxed + Load Consume seems to be the pattern which matches the needs exactly.
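To make the pattern concrete, here is a minimal C++11 sketch using std::atomic. It is not the code from the webrev: Obj, forward_to and read_forwarded_payload are hypothetical names chosen for illustration, and the single int payload stands in for the copied object body.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an object with a forwarding slot (not HotSpot's oopDesc).
struct Obj {
    int payload;
    std::atomic<Obj*> forwardee;
    explicit Obj(int p) : payload(p), forwardee(nullptr) {}
};

// Copy old_obj into copy, then try to publish copy as the forwardee.
// Returns whichever copy won the race.
inline Obj* forward_to(Obj* old_obj, Obj* copy) {
    assert(copy != nullptr);
    copy->payload = old_obj->payload;       // plain stores: the copy step
    Obj* expected = nullptr;
    if (old_obj->forwardee.compare_exchange_strong(
            expected, copy,
            std::memory_order_release,      // success: copy is complete before it is published
            std::memory_order_relaxed)) {   // failure: no barrier
        return copy;
    }
    // We rely on memory_order_consume here: expected now holds the winner's
    // pointer, and loads through it are address-dependent on the failed CAS.
    // The C++ memory model does not guarantee this ordering for a relaxed load.
    return expected;
}

// Reader side: load the forwardee with consume, then do a dependent load through it.
inline int read_forwarded_payload(const Obj* old_obj) {
    Obj* fwd = old_obj->forwardee.load(std::memory_order_consume);
    return fwd != nullptr ? fwd->payload : old_obj->payload;
}
```

A thread that loses the CAS gets the winner's pointer back in expected, so both threads end up using the same copy.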
The webrev also contains a logging change in psPromotionManager.inline.hpp; I'm not sure whether it's still wanted.
I'm also not sure whether aarch64 should be addressed in a separate change.
Besides that, it looks good to me.
From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley
Sent: Tuesday, October 11, 2016 11:26
To: Kim Barrett; David Holmes
Cc: hotspot-compiler-dev; Hiroshi H Horii; Tim Ellison; ppc-aix-port-dev at openjdk.java.net; Michihiro Horie; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
On 06/10/16 23:16, Kim Barrett wrote:
> The key issue here is that we copy obj into new_obj, and then make
> new_obj accessible to other threads via the CAS. Those other
> threads might attempt to access data in new_obj. This suggests the
> CAS ought to have at least a release fence to ensure the copy is
> complete before the CAS is performed. No amount of fencing on the
> read side (such as in the work stealing) can remove that need.
> And that might be all that is needed. On the post-CAS side, we load
> the forwardee and then load values from it. I think we can use
> implicit consume with dependent loads (except on Alpha) plus the
> suggested release fence to get the desired effect.
That's probably true, except that there's not really any such thing as
"implicit consume" in C++. While all of the hardware we use respects
address dependencies, it's not something that the compiler knows
about, and it's explicitly undefined behaviour in the C++ memory
model. If we're depending on memory_order_consume, perhaps we ought
to think about adding it to Atomic, even though it's just a volatile
load in older compilers.
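A rough sketch of what such a helper might look like, assuming the GCC/Clang __atomic builtins where they are available and a plain volatile load otherwise. The name load_ptr_consume is hypothetical and is not an existing member of HotSpot's Atomic class.

```cpp
#include <cassert>

// Hypothetical helper: load a pointer with consume ordering.
template <typename T>
inline T* load_ptr_consume(T* volatile* src) {
    assert(src != 0);
#if defined(__ATOMIC_CONSUME)
    // Compilers that provide the __atomic builtins also predefine __ATOMIC_CONSUME.
    // In practice they currently tend to strengthen consume to acquire.
    return __atomic_load_n(src, __ATOMIC_CONSUME);
#else
    // Older compilers: just a volatile load, relying on the hardware to
    // respect address dependencies (not true on Alpha).
    return *src;
#endif
}
```

With a helper like this, the places that depend on address-dependency ordering would at least be named in the source instead of being marked only by a comment.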