[concurrency-interest] RFR: 8065804: JEP 171: Clarifications/corrections for fence intrinsics

Hans Boehm boehm at acm.org
Mon Dec 1 21:51:15 UTC 2014

Needless to say, I would clearly also like to see a simple correspondence.

But this does raise the interesting question of whether put/get and
store(..., memory_order_relaxed)/load(memory_order_relaxed) are intended to
have similar semantics.  I would guess not, in that the former don't
satisfy coherence; accesses to the same variable can be reordered as for
normal variable accesses, while the C++11/C11 variants do provide those
guarantees.  On most, but not all, architectures that's entirely a compiler
issue; the hardware claims to provide that guarantee.

This affects, for example, whether a variable that is only ever incremented
by one thread can appear to another thread to decrease in value, or whether
a reference set to a non-null value exactly once can appear to change back
to null after appearing non-null.  In my opinion, it makes sense to always
provide coherence for atomics, since the overhead is small, and so are the
odds of getting code relying on non-coherent racing accesses correct.  But
for ordinary variables whose accesses are not intended to race, the
trade-offs are very different.
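
To make the coherence point concrete, here is a minimal C++11 sketch of the
single-writer counter case.  The function name and the iteration count are
hypothetical, chosen only for illustration.  With memory_order_relaxed the
standard's coherence rules forbid the reader from seeing the value decrease;
a passing run is consistent with that but does not prove it, and the same
loop written with plain (non-atomic) accesses would be a data race with no
such guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread stores increasing values with relaxed
// ordering while another polls with relaxed loads. Returns true if the
// reader never observed the counter going backwards.
bool reader_sees_monotonic_counter(int increments) {
    assert(increments >= 0);
    std::atomic<int> counter{0};
    bool monotonic = true;  // written only by the reader, read after join()

    std::thread writer([&] {
        for (int i = 0; i < increments; ++i) {
            // Single writer, so a plain relaxed store of the next value suffices.
            counter.store(i + 1, std::memory_order_relaxed);
        }
    });

    std::thread reader([&] {
        int last = 0;
        while (last < increments) {
            int now = counter.load(std::memory_order_relaxed);
            if (now < last) {
                monotonic = false;  // would violate read-read coherence
            }
            last = now;
        }
    });

    writer.join();
    reader.join();
    return monotonic;
}
```

Relaxed ordering here gives no ordering with respect to other variables; it
only keeps accesses to this one variable consistent with a single
modification order, which is the property in question.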


On Mon, Dec 1, 2014 at 12:40 PM, Martin Buchholz <martinrb at google.com> wrote:

> Hans,
> (Thanks for your excellent work on C/C++ 11 and your eternal patience)
> On Tue, Nov 25, 2014 at 11:15 AM, Hans Boehm <boehm at acm.org> wrote:
> > It seems to me that a (dubiously named) loadFence is intended to have
> > essentially the same semantics as the (perhaps slightly less dubiously
> > named) C++ atomic_thread_fence(memory_order_acquire), and a storeFence
> > matches atomic_thread_fence(memory_order_release).  The C++ standard and,
> > even more so, Mark Batty's work have a precise definition of what those
> > mean in terms of implied "synchronizes with" relationships.
> >
> > It looks to me like this whole implementation model for volatiles in
> > terms of fences is fundamentally doomed, and it probably makes more sense
> > to get rid of it rather than spending time on renaming it (though we just
> > did the latter in Android to avoid similar confusion about semantics).  It's
> I would also like to see alignment to leverage the technical and
> cultural work done on C11.  I would like to see Unsafe get
> load-acquire and store-release methods and these should be used in
> preference to fences where possible.  I'd like to see the C11 wording
> reused as much as possible.  The meanings of the words "acquire" and
> "release" are now "owned" by the C11 community and we should tag
> along.
> A better API for Unsafe would be
> putOrdered -> storeRelease
> put -> storeRelaxed
> (ordinary volatile write) -> store (default is sequentially consistent)
> etc ...
> but the high cost of renaming methods in Unsafe probably makes this a
> no-go, even though Unsafe is not a public API in theory.
> At least the documentation of all the methods should indicate what the
> memory effects and the corresponding C++11 memory model interpretation
> are.
> E.g. Unsafe.compareAndSwap should document the memory effects, i.e.
> sequential consistency.
> Unsafe doesn't currently have a readAcquire method (mirror of
> putOrdered) probably because volatile read is _almost_ the same (but
> not on ppc!).
> > fundamentally incompatible with the way volatiles/atomics are intended to
> > be implemented on ARMv8 (and Itanium).  Which I think fundamentally get
> > this much closer to right than traditional fence-based ISAs.
> >
> > I'm no hardware architect, but fundamentally it seems to me that
> >
> > load x
> > acquire_fence
> >
> > imposes a much more stringent constraint than
> >
> > load_acquire x
> >
> > Consider the case in which the load from x is an L1 hit, but a preceding
> > load (from say y) is a long-latency miss.  If we enforce ordering by just
> > waiting for completion of prior operation, the former has to wait for the
> > load from y to complete; while the latter doesn't.  I find it hard to
> > believe that this doesn't leave an appreciable amount of performance on
> > the table, at least for some interesting microarchitectures.
> I agree.  Fences should be used rarely.
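
As a rough illustration of the "load x; acquire_fence" versus "load_acquire x"
comparison quoted above, here is a C++11 sketch of the same message-passing
pattern written both ways.  The Channel type and all function names are
hypothetical.  The fence-based pair corresponds to what the quoted text
guesses loadFence/storeFence are intended to mean; whether the
operation-based pair is actually cheaper depends on the compiler and the
hardware, and this sketch shows only that both forms publish the payload
correctly.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical one-shot channel: an ordinary payload guarded by an atomic flag.
struct Channel {
    int payload = 0;                 // non-atomic data being published
    std::atomic<bool> ready{false};  // set once the payload is written
};

// Fence-based form: relaxed flag accesses plus standalone fences.
void publish_with_fence(Channel& c, int value) {
    c.payload = value;
    std::atomic_thread_fence(std::memory_order_release);
    c.ready.store(true, std::memory_order_relaxed);
}

int consume_with_fence(Channel& c) {
    while (!c.ready.load(std::memory_order_relaxed)) { /* spin */ }
    // Orders ALL earlier loads in this thread before what follows.
    std::atomic_thread_fence(std::memory_order_acquire);
    return c.payload;
}

// Operation-based form: the ordering is attached to the flag access itself.
void publish_with_release(Channel& c, int value) {
    c.payload = value;
    c.ready.store(true, std::memory_order_release);
}

int consume_with_acquire(Channel& c) {
    // Only this load carries acquire semantics.
    while (!c.ready.load(std::memory_order_acquire)) { /* spin */ }
    return c.payload;
}

// Hypothetical driver: sends one value through a fresh channel.
int round_trip(bool use_fences, int value) {
    assert(value != -1);  // -1 is reserved as the "nothing received" marker
    Channel c;
    int seen = -1;
    std::thread consumer([&] {
        seen = use_fences ? consume_with_fence(c) : consume_with_acquire(c);
    });
    std::thread producer([&] {
        if (use_fences) {
            publish_with_fence(c, value);
        } else {
            publish_with_release(c, value);
        }
    });
    producer.join();
    consumer.join();
    return seen;
}
```

The acquire fence constrains every load that precedes it in program order,
while the acquire load constrains only later accesses relative to that one
load, which is the asymmetry the quoted L1-hit/long-latency-miss example
relies on.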
