OrderAccess Refactoring

Paul Hohensee paul.hohensee at gmail.com
Fri Jan 9 22:39:06 UTC 2015

Hi Erik,

Being the person who put OrderAccess into its present form :), here's some

First, I'm happy that someone's looking at this.  I wanted to make
available both barriers and acquire/release functionality, and also provide
common composition methods.  In retrospect, I didn't separate the two
concepts well enough in the interface (the work was done as part of an ia64
port, so I was kinda fixated on ia64 concepts at the time), so that's
definitely something that should be fixed.

We understood that volatile semantics wouldn't be enough by themselves to
implement the semantics we wanted ordering on most platforms, but they
*are* necessary to prevent compilers (not the hardware though) from
reordering volatile accesses with respect to each other.  Whether or not
the volatile keyword prevents non-volatile memory accesses from being
reordered wrt volatile ones seemed a bit fuzzy in the C++ standard at the
time (and still does, at least to me), but we figured we could handle that
in the os_cpu files by checking which compiler was being used.  The code
that implemented the *hardware* semantics also went into the os_cpu
file(s).  That's why there's some seemingly silly code there, including
assembly that executes what amount to nops.

I initially wrote the code in factored form, but changed to the current
scheme because I found it difficult to see all the code for a single
platform in one place (I wasn't using an IDE :) ).  In retrospect, an
unwise decision due to the usual code duplication problems.  Mea culpa. :)

Looking forward to your webrev.




Hi Karen,

Thanks for the reply!

On 09 Jan 2015, at 15:18, Karen Kinnear
<karen.kinnear at oracle.com<
mailto:karen.kinnear at oracle.com>> wrote:

> Erik,
> I am delighted to have you looking at this code in detail to make sure it
is accurate and maintainable.


> In theory I like refactoring. In practice I hope there is a very clear
way to specify the per-platform assumptions and to make it very obvious
what the end results are
> for a given platform and API.

For each platform you only have to specify two things:

1. What is needed for acquire, release and fence for memory accesses,
for that platform (5 LoC per platform). The pattern of when to use them
(semantics) is generalized in shared code.

2. To provide any specializations for optimization purposes when
applicable (certain TSO platforms that can join the access and semantics
to a single inline asm instruction). Note that this is optional, and
only done to get faster accesses. Many platforms do not do it at all.

I think it?s very intuitive! Will provide webrev if this seems worthwhile.

> Having code reviewed (and obviously missed things) earlier rounds of
OrderAccess - I just wanted to note that we never thought that C++ volatile
would provide
> acquire/release semantics. volatile has always been just a compiler
directive, and yes volatile-volatile only. Acquire/release are instructions
to the hardware - not
> what I would call compiler barriers.

Acquire/release are semantics, not necessarily bound to neither
compiler/hardware. It gives a certain contract to the user that expects
these acquire/release semantics. Compiler barriers and fences is merely
a means of providing these semantics, but the user should not have to
care about such details.

If what you are saying is that it was intended only as acquire/release
for hardware but not constrain compiler reorderings to comply with the
semantics, then the contract is broken (does in fact not promise acquire
release, only on hardware level but compilers could do whatever with
non-volatiles which is equally as bad).

A store release should synchronize with an acquire load to the same
location, like in this example:

T1: store x1
T1: release store x2

T2: load acquire x2
T2: load x1

The semantics should guarantee that if load acquire x2 happens after
store release x2 in total order, then it is guaranteed that store x1
will be observed by the load x1, and not some old value that was there
before store x1. This contract does not currently hold since the normal
non-volatile memory accesses may be reordered with volatile memory
accesses; the stores and loads may be reordered in this example by the
compiler at will.

Therefore acquire release semantics are not guaranteed in general, but
only w.r.t. volatiles. This is neither clearly documented nor used in
such a way.

OrderedAccess is used in a way that assumes acquire/release semantics.
But for certain TSO platforms, the contract only provides volatile
semantics, not acquire release semantics. On other platforms, it
properly provides acquire release semantics, which is expected.

It only makes sense to by contract promise acquire/release semantics
(both for compiler and hardware reorderings) for uses of these methods,
and consistently do so on all platforms.

> Personally I would do the correctness fixes first - and make sure they
are really well tested and carefully code reviewed studying the cookbook
and memory model.
> I have not been following the more recent java memory model evolution - I
believe Aleksey Snipilev has - if you have references to any updated advice
> from Doug Lea - that would be very helpful for your reviewers to study. I
know the last time I reviewed this I had to make my own small charts and
walk it
> through based on per-platform guarantees.

There is nothing wrong with the Java memory model. The only problem is
that OrderAccess promises semantics that are only guaranteed on some

> I would find it helpful if you are aware of specific bugs - if you could
list the actual problems and call those out? It is very hard to get this
right and
> hard to review - so if you were to have targeted reviews for specific
fixes we could give the changes the depth of study that they need.

In for instance https://bugs.openjdk.java.net/browse/JDK-8061964 this
was observed and caused trouble for G1 card refinement that went boom.
In the discussion I pointed out the issue of relying on volatiles as a
cause for the crash, and it was fixed, but not consistently for all
platforms, but only for gcc on linux where the reordering was observed
in the generated asm output. But it is equally as dangerous on other
(TSO) platforms (except windows where the compiler actually explicitly
promises not to reorder - their volatiles follow acquire/release in
compiler reorderings). I think hoping for for compilers to not reorder
without any guarantees, waiting for things to go boom where it has
already been observed on other platforms seems dangerous. Better safe
than sorry. :)

One different solution is to redocument the contract clearly as being
acquire/release w.r.t. volatile accesses, which is currently the actual
contract provided by the implementation.
But since bugs were already discovered (which is very understandable
because no other atomic library has such semantics that I know of) and
its code is still unchanged, it seems dangerous to do so without
re-evaluating the use of OrderedAccess in hotspot, which seems a bit
painful. Also such a revised contract would be a disaster when you need
to copy loads of data and then store release for instance. This would
force the copying (of potentially lots of data) to either be done using
volatile accesses rather than in an accelerated fashion (e.g. SSE
extensions, write combining), or add compiler barriers to such copying
to give the release store actual release store semantics. This would
also require fixing G1 where a bug was observed to not rely on the
intuitive and expected contract, but on an unintuitive and provably
error prone contract. Ugh!!

The only solution that seems to make sense to me is to fix the semantics
of OrderedAccess (that were already changed on Linux) to be what is
expected, and that is acquire/release w.r.t. other accesses (volatile or

> Doing the refactoring afterwards allows us the easier exercise of
ensuring that the refactoring produces the same code that we already have
> in depth and believe to be correct.
> So as you refactor please do find a place to clarify the per-platform
assumptions so if we realize our assumptions are wrong we will re-review
the code.

I was hoping to do both at the same time, since there are many platforms
to be changed, and making them over and over for every type of access on
every platform to be changed is a bit tedious (and error prone) compared
to doing it once per platform, reusing the shared pattern (to be
reviewed once instead of once per platform) and review the choice of
barriers once for each platform instead of for every memory access on
every platform. Of course I don?t mind if we choose to do things in the
other order though if that is preferred. :)

If you compare the change in semantics I propose, without thinking about
the refactoring, just compare x86 on bsd and linux, you see that on
linux a compiler barrier is correctly used, but not on bsd. This is the
change that needs to be added to all TSO platforms except linux and
windows to ensure correctness.

thank you for offering and for asking for opinions :-)

Thank you for the good feedback! It looks like this change seems wanted,
so I?ll go ahead and prepare a webrev with the proposed changes. :)

More information about the hotspot-dev mailing list