Replace MemBarRelease for final field write with MemBarStoreStore

Hui Shi hui.shi at
Thu Sep 3 15:21:12 UTC 2015

Thanks Andrew, Vitaly, Aleksey and Doug!

I understand we need to be extremely careful when weakening a memory barrier,
and I have read Aleksey's and Hans's blog posts while searching for why a
LoadStore memory barrier is needed. Below is my understanding of the three
examples; the last one is the real corner case that needs the LoadStore
barrier, as far as I can tell.

The first example is in,
and doesn't apply to a final field store:
1. The variable is not final and is written 3 times.
2. The StoreStore fence blocks only the first write but not the second a++
write? Is there real hardware that supports this semantic?
3. In the statement x.a++, the load of x.a is completed after the store to x.a.

The second example, in, is
more reasonable and shows that a StoreStore fence is not enough, but this case
is also not a final field store. A final field has no store race (except a
reflective write); it is updated only once, and we only need to ensure all
threads see the same value when loading the final field.
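As a minimal Java sketch of that write-once property (class and field names
are mine, not from the thread):

```java
// Minimal sketch of final-field publication; names are illustrative.
class Holder {
    final int x;              // written exactly once, in the constructor
    Holder(int v) { x = v; }
}

class Publication {
    static Holder shared;     // plain, non-volatile reference

    public static void main(String[] args) {
        // The JMM final-field rules guarantee that any thread observing a
        // non-null `shared` also observes x == 7; the barrier emitted at the
        // end of the constructor is exactly what this thread is discussing.
        shared = new Holder(7);
        System.out.println(shared.x); // prints 7
    }
}
```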

Thread 1:
x = ... ? 1 : 2;
assert (x < 3);
x_init = true;

Thread 2:
if (x_init) {
    x = 17;
}

Last is Aleksey's example, where ta.x is written only once. The LoadStore
fence is needed because of the possibility of "committing the store to
ta.x before the load of G is fully satisfied". After checking, the above
re-ordering can happen with a compiler optimization that breaks the
dependency between the load of G and the store to ta.x. My modified Thread 1
code below shows the corner case that needs the LoadStore memory barrier.

static int G = 0;
A a;
class A {
  int x;
}

Thread 1:
 A ta = <new>
 ta.x = G;
 assert (ta.x == 0); // may fail, captured a racy update
 a = ta; // publish

Thread 2:
 A ta = a;
 if (ta != null) {
    G = 42; // NOT a dependent access, bummer
    assert(ta.x == 0); // may fail, value got contaminated
 }

 Modified Thread 1:
 A ta = <new>
 ta.x = 0;   // if the profile shows G is always zero, breaking the dependency
 if (G != 0) // load after Thread2 update G
    ta.x = G;
 assert (ta.x == 0); // may fail, captured a racy update
 a = ta; // publish
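For reference, the same shape with an actual final field might look like the
sketch below (a hedged illustration; the race itself is not exercised here,
and all names are assumed, not taken from HotSpot):

```java
// Sketch of the corner case with a real final field; names are illustrative.
// The final store `x = Global.g` depends on a load of a racy global. If the
// JIT breaks that dependency (e.g. after profiling g as always zero), a
// StoreStore barrier alone no longer orders the load of g before
// publication; the LoadStore barrier is what forbids that re-ordering.
class Global { static int g = 0; }

class A {
    final int x;
    A() { x = Global.g; }   // final store that depends on a racy load
}

class Demo {
    static A a;             // plain reference used for publication

    public static void main(String[] args) {
        A ta = new A();     // single-threaded here, so x is simply 0
        a = ta;             // publish
        System.out.println(a.x); // prints 0 while Global.g stays 0
    }
}
```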

Breaking the dependency between a final store and the loads it depends on is
what introduces the need for a LoadStore barrier. To handle such corner
cases, the JIT needs to ensure that no load a final store depends on is
re-ordered with that store during JIT optimization. Then the JIT can weaken
the memory barrier here. This has some chance of missing an optimization for
the final store, but it is worthwhile on an RMO platform like

Shi Hui

On 2 September 2015 at 23:32, Vitaly Davidovich <vitalyd at> wrote:

> I can certainly understand the decision to play conservative here in light
> of this being a dark corner.  But on the other hand, one can argue that
> being conservative here "encourages" (or at least doesn't slap on the
> wrist) people doing this.  On top of that, it seems Hui identified this as
> a performance problem on some impl of aarch64 (personally, I don't have a
> horse in this race since the archs I use nop the LoadStore|StoreStore
> combo); a 40% perf boost is nothing to sneeze at, IMHO.
> Having said that, could you please modify the JMM cookbook to stipulate
> this for final fields in the meantime?
> On Wed, Sep 2, 2015 at 11:21 AM, Doug Lea <dl at> wrote:
>> On 09/02/2015 11:08 AM, Vitaly Davidovich wrote:
>>> I'm not sure "accommodating" capturing racy memory is required
>>> here.
>>> Hans' actual example isn't applicable to final/stable fields because
>>> they cannot
>>> be mutated after assignment.  You introduced a racy read in your
>>> example, and I
>>> don't know if that's valid.
>> The existing JSR133 JMM specs and final field rules do not provide an
>> answer.
>> And the JMM will not be updated for jdk9. For VarHandle usages, we
>> can specify what happens when you use them (thus resolving the issue
>> in those cases). But for other usages, the answer is murky, and
>> most people think that reducing surprise is the best decision.
>> -Doug

More information about the hotspot-compiler-dev mailing list