RFR(M): 8080289: Intermediate writes in a loop not eliminated by optimizer

Vitaly Davidovich vitalyd at gmail.com
Thu Jun 18 00:28:25 UTC 2015

So I'm not sure how many cases will arise where scheduling stores is
beneficial (on modern cpus) apart from removing redundant ones.  The
compiler would need some seriously detailed machine model, I think, to
reason about this intelligently.  Removing redundant ones (or moving loop
invariant ones out of loops, like Roland is trying here) seems more
tractable and beneficial? Are there cases beyond this where it would be
profitable? Perhaps scheduling writes to addresses likely to be on the same
cacheline ...
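
To make the "loop invariant store" case concrete, here is a minimal Java sketch. The class and method names are hypothetical and are not taken from the bug report; the second method only illustrates the shape an optimizer could legally produce for single-threaded code when no fences intervene. It is not what C2 literally emits.

```java
// Hypothetical example; names are illustrative only.
final class StoreSinkDemo {
    int sum; // plain (non-volatile) field

    // As written: one store to this.sum on every iteration.
    void accumulate(int[] data) {
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
    }

    // Hand-written equivalent of the optimized shape: the intermediate
    // stores are dropped and a single store is sunk below the loop.
    void accumulateSunk(int[] data) {
        int local = sum;
        for (int i = 0; i < data.length; i++) {
            local += data[i];
        }
        sum = local;
    }
}
```

Both methods leave the same value in sum when run by a single thread; another thread racing on sum could see fewer intermediate values from the second form, which is allowed for plain fields.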

As for removing StoreStore barriers, it seems like that's practically
feasible with Java's semantics only when escape analysis (EA) kicks in; I'm
having a hard time imagining how the JIT can trace unsafe/racy publication
reliably and with minimal overhead.  Perhaps I'm not thinking hard enough
though ...

It's almost unfortunate that final fields were granted this right to be
published unsafely :) - it would perhaps have been better if explicit fencing
were required for such a specialized case.
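
As a rough sketch of what "explicit fencing" on the writer side could look like, the example below uses VarHandle.storeStoreFence(), an API added in Java 9 and therefore later than this message. The class and field names are hypothetical. This only illustrates ordering the initializing store before the publishing store; it is not a claim that the result is equivalent to final-field semantics under the JMM, since the reader side is not addressed.

```java
import java.lang.invoke.VarHandle;

// Hypothetical example; names are illustrative only.
final class ExplicitFenceDemo {
    static final class Box {
        int value; // deliberately NOT final
        Box(int v) { value = v; }
    }

    static Box shared; // plain static field: a racy publication point

    static void publish(int v) {
        Box b = new Box(v);          // normal initializing store
        VarHandle.storeStoreFence(); // earlier stores ordered before later stores
        shared = b;                  // publishing store
    }
}
```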

On Jun 17, 2015 5:27 PM, "John Rose" <john.r.rose at oracle.com> wrote:

On Jun 17, 2015, at 1:23 PM, Vitaly Davidovich <vitalyd at gmail.com> wrote:

> Nope, that's an oversimplified understanding.  One place where the JMM will
> bite you is with publication of object state via final fields. Normal
> stores used to initialize a structure which is published via final-field
> semantics must be ordered to take place before the object is published.  We
> don't (and perhaps can't) track object publication events, nor their
> relation to stores into newly-reachable subgraphs.  Instead, we have fences
> that gently but firmly ensure that data (from normal stores, even to
> non-final fields and array elements!) is posted to memory before any store
> which could be a publishing store for that data.

Not sure what's oversimplified —

I probably misread you, then.

you're describing a JMM semantic for final fields, which I'd expect to be
modeled as barriers in the IR, just like volatile writes would be modeled
as barriers, preventing removal or reordering of them.  I appreciate that
it can be troublesome to track this information, but that only means the
compiler will have to be more conservative and some optimization
opportunities may be lost.  I'd think the pattern would look like:

obj = allocZerodMemory(); // obj has final fields
obj.ctor(); // arbitrarily long/complex CFG
_someRef = obj;

I'd expect redundant stores to be removed as part of ctor() CFG without
violating the storestore barrier.  But, I do understand the
complexity/trickiness of getting this right.

You are correct.  The StoreStore approximates the point at which the object
is first published to other threads.  All normal stores above the
StoreStore can be issued in any order (as far as this fence is concerned)
but must settle before the object is published.  Presumably it is published
shortly after the StoreStore, and the StoreStore could be sunk until that
point, if we wanted to do this, or even eliminated if the object never gets
published.  Also, stores provably unrelated to (unreachable from) the
published object could drop below the StoreStore.  We don't attempt to make
this distinction.  None of these trains of thought affects the basic
assertion that (if fences are absent) normal stores can be reordered.
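
The allocate / construct / publish pattern discussed above can be sketched in Java roughly as follows. All names here are hypothetical, and the comment marking the barrier is only a reading of this thread's description of where the StoreStore conceptually sits, not a statement about the exact IR.

```java
// Hypothetical names throughout; a sketch of the pattern, not HotSpot code.
final class PublishDemo {
    static final class Node {
        final int[] data; // final field: end of constructor is a freeze point
        int count;        // plain field, initialized by normal stores

        Node(int n) {
            data = new int[n];
            for (int i = 0; i < n; i++) {
                data[i] = i;
                count = i + 1; // intermediate stores; only the last value must survive
            }
            // Per the discussion, the StoreStore conceptually sits here:
            // the stores above may be merged or reordered among themselves,
            // but all must settle before a later publishing store.
        }
    }

    static Node someRef; // plain static field: a possibly racy publication point

    static void publish(int n) {
        someRef = new Node(n); // publishing store, below the barrier
    }
}
```

Collapsing the repeated stores to count into one final store stays entirely above the barrier, so it does not interfere with the ordering the fence is there to provide.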

If we wish to remove that StoreStore (for some reason) we would either need
a more precise set of fences (or HB edges), or else we would have to hold
back on aggressive store reordering.  This is what makes me think we may
discover a missing fence, once we start letting those little stores swarm
around each other.

What makes me more nervous about this is the clear fact that non-TSO
platforms (PPC, Itanium) have to tweak their fences in various ad hoc ways
to avoid breaking user code.  See, for example, Parse::do_exits.  If we
make our thread-local orderings more non-TSO-ish, we might run into the
same subtle issues that the PPC port wrestles with.  By "subtle" I partly
mean "relating to unstated user expectations even if not supported by the
JMM", and I also mean "hard to detect, characterize, and fix".

— John

More information about the hotspot-compiler-dev mailing list