RFR: 7143664: Clean up OrderAccess implementations and usage

David Holmes david.holmes at oracle.com
Thu Feb 12 01:38:30 UTC 2015

Hi Erik,

On 12/02/2015 12:32 AM, Erik Österlund wrote:
> Hi David,
> Thanks a lot for reviewing my changes!
> Sorry for the delay, had a few other things I had to take care of.

No problem, we're still waiting for others to chime in here.

A few comments inlined below ...

> On 09/02/15 07:14, David Holmes wrote:
>> On 29/01/2015 8:48 AM, David Holmes wrote:
>>> Updated webrev:
>>> http://cr.openjdk.java.net/~dholmes/7143664/webrev.v2/
>> First let me say that Erik and I had a lot of offline discussions about
>> this since he first proposed his reworking. My initial concern was
>> ensuring the model changes were appropriate as this is a complex area
>> and there can be different views of how things can/should be expressed.
>> As per the bug report I was aware of a number of issues with the
>> existing expression of semantics, but Erik showed me things were even
>> worse than I had thought :)
> :)
>> src/share/vm/runtime/orderAccess.hpp
>> Minor nit: I would prefer the term "bound" rather than "joined" when
>> referring to load_acquire and store_release.
> Fixed.
>> In the implementation table I think "sparc" should be "sparc TSO".
> Fixed.
>> I still think we need to say something about C++ volatile and about
>> compiler barriers in general.
> How about something like:
> C++ volatile semantics prevent compiler re-ordering between volatile
> memory accesses. However, reordering between non-volatile and volatile
> memory accesses is in general undefined. For compiler reordering
> constraints taking non-volatile memory accesses into consideration, a
> compiler barrier has to be used instead. Note also that both volatile
> semantics and compiler barrier do not prevent hardware reordering.

Sounds great! Though perhaps add "Some compiler implementations may 
choose to enforce additional constraints beyond those required by the 
language." ( a reference to MVSC)

>> src/os_cpu/bsd_x86/vm/orderAccess_bsd_x86.inline.hpp
>> src/os_cpu/linux_x86/vm/orderAccess_linux_x86.inline.hpp
>> In OrderAccess::fence doesn't the existing asm already act as a compiler
>> barrier; and on uniprocessor we need neither compiler nor hardware
>> barriers; so isn't the compiler_barrier() redundant?
> I saw this was already clarified. But yeah, even uniprocessor systems
> have concurrency. Therefore I added compiler barriers to properly
> protect even uniprocessor systems, that could previously suffer from
> data races involving unlikely yet possible compiler reorderings and
> context switches.

Yeah that was brain fart on my part ;-)

>> ---
>> src/os_cpu/bsd_zero/vm/orderAccess_bsd_zero.inline.hpp
>> src/os_cpu/linux_zero/vm/orderAccess_linux_zero.inline.hpp
>> No comments - The Zero folk will need to approve this change in terminology.
> I didn't like talking about "read" and "write" barriers as it says
> little about what it actually means. And I also didn't like defining
> things like storestore/storeload etc. in terms of acquire/release and
> read/write barriers because I think it's weird. ;) That's why I changed
> this. But if the previous terminology is preferred, I'm fine with
> changing that.

I prefer what you have too :) But this isn't my code so the Zero folk 
(who?) need to give this their blessing (or else we leave it out and let 
them incorporate this later).

Ditto we need the ppc64 folk (Volker, Goetz) to also give their blessing.

>> src/os_cpu/solaris_sparc/vm/orderAccess_solaris_sparc.inline.hpp
>> src/os_cpu/solaris_sparc/vm/solaris_sparc.il
>> You've removed the GNU_SOURCE ifdefs which means the new code has to be
>> tested using gcc on Solaris. We don't officially support that platform
>> so I have no way to verify these particular changes.
> Previously solaris studio did not support inline asm syntax. So the
> solaris studio variant used .il assembly instead while the GNU_SOURCE
> variant used inline asm.
> But since this is nowadays supported by solaris studio too, I joined
> them into one. Not only because it's nicer to unify, but because I need
> the compiler barrier constraints that inline asm can give us. I don't
> know the compiler reordering constraints are guaranteed when calling
> external assembly. So better make it explicit and safe.

Totally fine with making use of Solaris Studio's modern capabilities, 
only minor concern is no way to test gcc on Solaris. But unless someone 
else screams about this I'm happy to assume that gcc will still be okay.

>> src/os_cpu/solaris_x86/vm/orderAccess_solaris_x86.inline.hpp
>> src/os_cpu/solaris_x86/vm/solaris_x86_32.il
>> src/os_cpu/solaris_x86/vm/solaris_x86_64.il
>> I think it makes sense to use the same code here as the linux x86
>> version. But as above can't verify this for gcc on Solaris; also query
>> about need for compiler_barrier() in fence().
> Same goes here as above. Solaris studio used to not support inline asm.
> So I joined the variants relevant for correctness fixes. But of course,
> it would be possible to add the optimizations from x86 bsd/linux as
> well. But I wanted to keep optimizations out of this changeset as I
> think it is already a lot to review without the addition of new
> optimizations. So if you don't mind, I'd like to defer that until the
> correctness fixes are in place.


>> Do you have a reference to the MSVC volatile semantics?
> https://msdn.microsoft.com/en-us/library/12a04hfd(v=vs.100).aspx
> Release:
> "A write to a volatile object (volatile write) has Release semantics; a
> reference to a global or static object that occurs before a write to a
> volatile object in the instruction sequence will occur before that
> volatile write in the compiled binary."
> Acquire:
> "A read of a volatile object (volatile read) has Acquire semantics; a
> reference to a global or static object that occurs after a read of
> volatile memory in the instruction sequence will occur after that
> volatile read in the compiled binary."
> Should fit our model right for compiler reordering.

Yes. Though I was amused to read the comments on that page. :)

I mainly wanted to check this was guaranteed by the versions of MSVC we 
are currently using - which it is.

>> Not clear about the 32-bit versus 64-bit changes - mainly because I
>> don't know why they were different in the first place.
> Seems like x86_64 didn't have good support for inline asm syntax at some
> point. I don't know if this is still true today. It would be nice to
> know because I have seen fishy x86_64 implementations of a lot of atomic
> stuff as well, assuming lack of compiler support that might actually be
> available today.

Yeah something to follow up with making the x86 implementations uniform 
across all OS/compilers.

>> Why are windows release_store_fence specializations different to linux
>> x86 versions?
> How do you mean they are different? To me they look quite equivalent.
> xchg is always implicitly locked on x86, and they both use xchg to get
> the wanted fence semantics.


   89 template<>
   90 inline void OrderAccess::specialized_release_store_fence<jint> 
(volatile jint*   p, jint   v) {
   91   __asm {
   92     mov edx, p;
   93     mov eax, v;
   94     xchg eax, dword ptr [edx];
   95   }


! template<>
! inline void OrderAccess::specialized_release_store_fence<jint> 
(volatile jint*   p, jint   v) {
     __asm__ volatile (  "xchgl (%2),%0"
                       : "=r" (v)
                       : "0" (v), "r" (p)
                       : "memory");

but I'm guessing the MVSC inline assembler doesn't have the same 
capabilities as gcc hence we have to write the code to load the 
registers needed by the xchg.

>> Not clear about your change to the float version of release_store_fence:
>> a) do we need to change it?
>> b) given what we changed it to isn't that the default ?
> I didn't want to have default specializations of jfloat/jdouble
> equalling jint/jlong. I thought maybe some architecture might want to
> put it in different registers or something, making a default conversion
> to int/long maybe undesirable. So the default for jfloat/jdouble is to
> use acquire()/release() paired with an atomic memory access. So you have
> to explicitly specialize them to be the same as jint/jlong if wanted,
> which I believe is what is done in the code you refer to.

Okay. It was the difference between the jfloat and jdouble versions in 
the original code that threw me. But the jdouble needs to delegate so we 
end up doing an Atomic::load.

> If you think I'm paranoid about floats, I can of course make it a
> default behaviour to use jint/jlong even if it isn't explicit, in the
> same way I handled unsigned variants.

No I think that is okay.

> Rule of thumb: The variants that need specialization are the ones that
> have overrides for OrderAccess::store/OrderAccess::load, the rest is
> handled automatically. Currently jdouble and jfloat are part of that and
> hence need explicit specialization, but if we want to generalize floats
> too, that's fine by me. So is it preferred to generalize jfloat/jdouble
> to equal jint/jlong?

I'm okay as-is.

>> General comment: copyright dates need updating in most, if not all, files.
> Fixed.
> Do we want a new webrev? (changed copyright headers and few words in
> comments)

If you can send me an incremental patch I will update the webrev.

> Thanks a lot for taking time to review these changes!

Thanks for contributing them! :)


> /Erik

More information about the hotspot-dev mailing list