From dl at cs.oswego.edu Mon Aug 10 18:22:05 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 10 Aug 2015 14:22:05 -0400
Subject: [jmm-dev] jdk9 APIs
Message-ID: <55C8EBCD.7010006@cs.oswego.edu>
It's been a while...
As everyone has surely noticed, progress on revising the core Java
Memory Model stalled after discovering that some new ideas are needed
to deal with out-of-thin-air and related issues in both Java and
C/C++. Some people are pursuing promising approaches, but there is
little chance that a full reformulation will be ready before jdk9 is
released (about a year from now).
However, despite this, jdk9 will include access and fence APIs that
need specs. This is non-negotiable, because the current weird way of
accessing the (mostly existing) underlying JVM functionality via
sun.misc.Unsafe is going away. (Actually, planned to only partly go
away in jdk9, but also introducing classes/methods allowing future
decommission.)
For accesses (mostly) corresponding to C11 atomics with memory_order,
the alternatives reside in "VarHandles". These are more-or-less
similar to AtomicXFieldUpdaters, but are designed to support
generation of much better code (without dynamic/reflective baggage),
generally as good as handcrafting, at least after warmup. The setup
for them is unusual -- relying on just-in-time specialization of
generic wrappers. This is a scale-down in ambition compared to the
original "enhanced volatiles" proposal that would have required
language syntax changes that met opposition.
Plans are to also include a stand-alone Fences class with all-static
methods. (Deja vu for many of us.) Plus to separately add the
reachabilityFence/keepAlive method discussed last year as a static
method most likely in class java.lang.ref.Reference.
There seems to be only a little room for discussion on the exact sets
of methods in the VarHandle* and Fences classes:
Current versions of VarHandle do NOT include an analog of C11
"consume" mode. As discussed last year, considering its uncertain fate
in C/C++, a better tactic might be to support a method of the form
"getDependently(ref)" (sorta like in the Linux kernel) that covers its
{only? main?} use case. But for now omitted.
For Fences, adding plain StoreStore and LoadLoad fences to the set
acquireFence, releaseFence, and fullFence seems wise. The main
arguments against LoadLoad and StoreStore in C11 were usability
concerns. But they have been found to be useful enough internally to
JVMs (mainly on ARM) to be semi-supported within HotSpot. And we'd
like to not keep making the same mistake of unnecessarily making
things ARM/POWER hostile.
But the main issue at hand is how we can provide library specs for the
APIs without revising the underlying Java Memory Model.
My proposal is that we muddle through. Doing so seems surprisingly
workable: Moded accesses and fences allow programmers to rule out some
behaviors. Specifying them need not spell out underlying rules that
hold when these methods are not used. And on the other side, they need
not promise any overall property (as in: it might be the case that
using a fullFence between every access gives you SC, but we don't need
to guarantee it.) It is in a sense cheating to convey this in method
specs by using terms that are not fully backed by a formal underlying
model. But they still can be made good enough for readers to
understand intent, and made more rigorous someday. We can and should
also comfort programmers that, when applicable, the primary effects
of these methods are compatible with C/C++. Given all this, the
specs can be pretty simple. They are done out for Fences below, and
if OK should not be hard to apply to VarHandle methods.
Other ideas are of course welcome.
Pasted below are preliminary versions. If they don't format
nicely in your mail reader, get them at
http://gee.cs.oswego.edu/dl/wwwtmp/Fodder.java
/**
* A set of methods providing fine-grained control of memory ordering.
*
*
* The Java Language Specification permits operations to be
* executed in orders different than are apparent in program source
* code, subject to constraints mainly stemming from the use of locks
* and volatile fields. The methods of this class can also be used to
* impose constraints. Their specifications are phrased in terms of
* the lack of "reorderings" -- observable ordering effects that might
* otherwise occur if the fence were not present.
*
* @apiNote More precise phrasing of these specifications may
* accompany future updates of the Java Language Specification.
*/
public class Fences {
/**
* Ensures that loads and stores before the fence will not be
* reordered with loads and stores after the fence.
*
* @apiNote Ignoring the many semantic differences from C and
* C++, this method has memory ordering effects compatible with
* atomic_thread_fence(memory_order_seq_cst)
*/
public static void fullFence() {}
/**
* Ensures that loads before the fence will not be reordered with
* loads and stores after the fence.
*
* @apiNote Ignoring the many semantic differences from C and
* C++, this method has memory ordering effects compatible with
* atomic_thread_fence(memory_order_acquire)
*/
public static void acquireFence() {}
/**
* Ensures that loads and stores before the fence will not be
* reordered with stores after the fence.
*
* @apiNote Ignoring the many semantic differences from C and
* C++, this method has memory ordering effects compatible with
* atomic_thread_fence(memory_order_release)
*/
public static void releaseFence() {}
/**
* Ensures that loads before the fence will not be reordered with
* loads after the fence.
*/
public static void loadLoadFence() {}
/**
* Ensures that stores before the fence will not be reordered with
* stores after the fence.
*/
public static void storeStoreFence() {}
}
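A minimal sketch of how the release/acquire pair described above might be used for a one-shot handoff. It assumes the static fence methods that ended up on java.lang.invoke.VarHandle in released JDKs (releaseFence, acquireFence) rather than the proposed Fences class; the class name FenceHandoff, its fields, and the demo method are hypothetical. The flag is accessed in opaque mode so that the spin loop really re-reads it.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class FenceHandoff {
    int payload;      // plain field; ordered only by the fences below
    boolean ready;    // flag; accessed in opaque mode so the spin loop re-reads it

    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(FenceHandoff.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(int value) {
        payload = value;
        // Loads and stores before the fence are not reordered with the store after it.
        VarHandle.releaseFence();
        READY.setOpaque(this, true);
    }

    int awaitPayload() {
        while (!((boolean) READY.getOpaque(this))) {
            Thread.onSpinWait();
        }
        // The flag load before the fence is not reordered with the payload load after it.
        VarHandle.acquireFence();
        return payload;
    }

    // Hypothetical driver: one writer thread, the caller acts as reader.
    static int demo() {
        FenceHandoff h = new FenceHandoff();
        Thread writer = new Thread(() -> h.publish(42));
        writer.start();
        int seen = h.awaitPayload();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In most code a volatile field or a setRelease/getAcquire pair would be the simpler choice; the explicit fences are shown only to illustrate the specs above.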
class SampleVHUsages {
int aField;
static VarHandle AFIELD = ...;
void usage1() {
int aFieldValue = AFIELD.getVolatile(this);
}
}
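The usage sample above leaves the VarHandle setup elided. One way it can be written against the released java.lang.invoke API is sketched here, assuming a lookup via MethodHandles.lookup().findVarHandle; the class name VHUsageSketch and its helper methods are hypothetical and not part of the proposal.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class VHUsageSketch {
    int aField;

    static final VarHandle AFIELD;
    static {
        try {
            // Look up a handle for the int field "aField" declared in this class.
            AFIELD = MethodHandles.lookup()
                    .findVarHandle(VHUsageSketch.class, "aField", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int loadVolatile() {
        // The cast tells the compiler which return type to use for this call.
        return (int) AFIELD.getVolatile(this);
    }

    int loadAcquire() {
        return (int) AFIELD.getAcquire(this);
    }

    void storeRelease(int v) {
        AFIELD.setRelease(this, v);
    }

    // Hypothetical single-threaded driver, only to show the calls compile and run.
    static int demo() {
        VHUsageSketch s = new VHUsageSketch();
        s.storeRelease(7);
        return s.loadVolatile() + s.loadAcquire();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```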
/**
* Stand-in for spec purposes of jdk9 java.lang.invoke.VarHandle
*/
abstract class NotReallyVarHandle<T> {
// Load
T getRelaxed(Object owner);
T getAcquire(Object owner);
T getVolatile(Object owner);
// tbd: Consume analog
// Store
void setRelaxed(Object owner, T val);
void setRelease(Object owner, T val);
void setVolatile(Object owner, T val);
// CAS
boolean compareAndSet(Object owner, T cmp, T val);
boolean compareAndSetAcquire(Object owner, T cmp, T val);
boolean compareAndSetRelease(Object owner, T cmp, T val);
boolean weakCompareAndSet(Object owner, T cmp, T val);
boolean weakCompareAndSetAcquire(Object owner, T cmp, T val);
boolean weakCompareAndSetRelease(Object owner, T cmp, T val);
// special RMW
T getAndSet(Object owner, T val);
T getAndAdd(Object owner, T delta);
T addAndGet(Object owner, T delta);
}
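A short sketch of the CAS and special RMW groups listed above, written against the released java.lang.invoke.VarHandle. The class name CasCounter and its methods are hypothetical. The released class offers getAndAdd but, as far as I know, no addAndGet, so the sketch derives that result by adding the delta to the returned previous value.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class CasCounter {
    int count;

    static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(CasCounter.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Classic retry loop around a strong CAS.
    int incrementWithCas() {
        int cur;
        do {
            cur = (int) COUNT.getVolatile(this);
        } while (!COUNT.compareAndSet(this, cur, cur + 1));
        return cur + 1;
    }

    // addAndGet expressed through getAndAdd.
    int addAndGet(int delta) {
        int previous = (int) COUNT.getAndAdd(this, delta);
        return previous + delta;
    }

    // Hypothetical driver: several threads increment the same counter.
    static int demo(int threads, int perThread) {
        CasCounter c = new CasCounter();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    c.incrementWithCas();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (int) COUNT.getVolatile(c);
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));
    }
}
```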
class java.lang.ref.Reference {
// add:
/**
* Ensures that the object referenced by the given reference
* remains strongly reachable (as defined in the {@link
* java.lang.ref} package documentation), regardless of any prior
* actions of the program that might otherwise cause the object to
* become unreachable; thus, the referenced object is not
* reclaimable by garbage collection at least until after the
* invocation of this method. Invocation of this method does not
* itself initiate garbage collection or finalization.
*
* @param ref the reference. If null, this method has no effect.
*/
public static void reachabilityFence(Object ref) {}
}
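A minimal sketch of the intended calling pattern for reachabilityFence, using the static method on java.lang.ref.Reference as it exists in released JDKs. The class ResourceUser and its static table are hypothetical stand-ins for an object that owns some external resource; no finalizer or cleaner is shown, so the fence here only marks where one would matter.

```java
import java.lang.ref.Reference;

class ResourceUser {
    // Hypothetical stand-in for an external resource indexed by slot.
    private static final int[] TABLE = new int[4];

    private final int slot;

    ResourceUser(int slot, int value) {
        this.slot = slot;
        TABLE[slot] = value;
    }

    int read() {
        try {
            int s = slot;       // last ordinary use of 'this'
            return TABLE[s];    // from here on only the slot index is needed
        } finally {
            // Keeps 'this' strongly reachable until the table access is done,
            // so cleanup tied to this object could not release the slot early.
            Reference.reachabilityFence(this);
        }
    }

    public static void main(String[] args) {
        System.out.println(new ResourceUser(1, 7).read());
    }
}
```

Placing the call in a finally block follows the spec text above: the object stays reachable at least until after the invocation, whatever path the method body takes.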
From jeremymanson at google.com Tue Aug 11 18:01:58 2015
From: jeremymanson at google.com (Jeremy Manson)
Date: Tue, 11 Aug 2015 11:01:58 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55C8EBCD.7010006@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID:
Just curious: which end users are jonesing for a Fences API? I talk with a
LOT of Java users, and it's *never* come up outside of the kinds of people
on this list.
What's the urgency now?
Jeremy
From dl at cs.oswego.edu Tue Aug 11 19:11:10 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Tue, 11 Aug 2015 15:11:10 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To:
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID: <55CA48CE.6050405@cs.oswego.edu>
On 08/11/2015 02:01 PM, Jeremy Manson wrote:
> Just curious: which end users are jonesing for a Fences API? I talk with a LOT
> of Java users, and it's *never* come up outside of the kinds of people on this list.
>
Good timing. I was about to post this set of misc follow-up notes
that mentions this.
* I don't want to get overly consumed about analogs of consume mode,
but it would be nice to deal with it in a way that is good enough not
to demand follow-up efforts. A different approach than getDependently
is to add an overloaded loadLoadFence method that restricts its scope
to loads from the given reference: void loadLoadFence(Object ref).
This is similar to the approach used in a previous (scrapped) version
of Fence APIs. JVMs could just equate it (as well as the no-arg
version) to acquireFence, but with some effort it could be implemented
to get better performance on some platforms. At the moment, this seems
like a better approach to me than VarHandle getDependently methods.
* VarHandles serve two roles: (1) providing access to CAS instructions
(plus the special RMW variants getAndAdd etc) that have no bytecodes
(2) supporting moded load/store access that in principle is the same
as people could arrange with the idiomatic application of Fences, but
in practice possibly different because JVMs might have an easier time
selecting instructions that combine fencing and access.
* The Fence API is geekier than VarHandles but no less necessary as
Unsafe goes away. One audience is a small number of expert concurrent
programmers writing performance-sensitive code. As a general
observation, the more that such people understand their concurrent
algorithms, the more flexibility they need to implement them. There
are a few constructions inside java.util.concurrent relying on them
(for example in StampedLock) that cannot otherwise be obtained in any
reasonable way, which would be a problem without a Fence API once
Unsafe goes away.
Another (mostly future) audience is tool developers. Efforts to obtain
desired consistency properties via tool-based fence placements are
starting to approach plausibility. (Including a nice upcoming OOPSLA
paper "Declarative fence insertion" by Bender, Lesani, and Palsberg
that I don't see anywhere online yet.) We'd rather not see Java ruled
out of such efforts just because of the lack of access to
functionality that must exist in any Java implementation.
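For the StampedLock case mentioned above, the user-facing pattern looks roughly like the sketch below: plain reads of the fields, followed by a validate call that must not be reordered ahead of those reads. This is where, per the message, a fence is needed inside the library. The class OptimisticPoint is hypothetical and modeled on the usual optimistic-read idiom; it shows only the public API, not the internal fence.

```java
import java.util.concurrent.locks.StampedLock;

class OptimisticPoint {
    private final StampedLock lock = new StampedLock();
    private double x;
    private double y;

    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();
        double cx = x;   // plain reads, possibly racing with a writer
        double cy = y;
        if (!lock.validate(stamp)) {
            // A writer intervened: fall back to a real read lock and re-read.
            stamp = lock.readLock();
            try {
                cx = x;
                cy = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(cx, cy);
    }

    public static void main(String[] args) {
        OptimisticPoint p = new OptimisticPoint();
        p.move(3, 4);
        System.out.println(p.distanceFromOrigin());
    }
}
```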
(Gratuitous aside: Fences are just flexible Semicolons. Plain
semicolons outlived their usefulness, as bemoaned in the fun read:
"The Silently Shifting Semicolon"
http://snapl.org/2015/abstracts/full/Marino.html
http://www.cs.ucla.edu/~todd/research/snapl15.pdf)
-Doug
From aph at redhat.com Wed Aug 12 08:26:33 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 12 Aug 2015 09:26:33 +0100
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CA48CE.6050405@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
<55CA48CE.6050405@cs.oswego.edu>
Message-ID: <55CB0339.3020901@redhat.com>
On 11/08/15 20:11, Doug Lea wrote:
>
> * I don't want to get overly consumed about analogs of consume mode,
> but it would be nice to deal with it in a way that is good enough not
> to demand follow-up efforts. A different approach than getDependently
> is to add an overloaded loadLoadFence method that restricts its scope
> to loads from the given reference: void loadLoadFence(Object ref).
> This is similar to the approach used in a previous (scrapped) version
> of Fence APIs. JVMs could just equate it (as well as the no-arg
> version) to acquireFence, but with some effort it could be implemented
> to get better performance on some platforms. At the moment, this seems
> like a better approach to me than VarHandle getDependently methods.
I think that's overly optimistic. Given the difficulty of defining
consume(ish) mode in a way that can be understood and the time to get
HotSpot etc. to implement it correctly and efficiently is there a
pressing need to do this now?
Andrew.
From aleksey.shipilev at oracle.com Wed Aug 12 08:31:18 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Aug 2015 11:31:18 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55C8EBCD.7010006@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID: <55CB0456.6060405@oracle.com>
On 08/10/2015 09:22 PM, Doug Lea wrote:
> /**
> * Stand-in for spec purposes of jdk9 java.lang.invoke.VarHandle
> */
> abstract class NotReallyVarHandle {
> boolean weakCompareAndSet(Object owner, T cmp, T val);
> boolean weakCompareAndSetAcquire(Object owner, T cmp, T val);
> boolean weakCompareAndSetRelease(Object owner, T cmp, T val);
I have a question about these. I understand the acq/rel relaxations for
strong CASes, but what is the relaxation for already weak CAS? In other
words, weakCompareAndSet is spec'ed in j.u.c.atomic as:
"weakCompareAndSet atomically reads and conditionally writes a variable
but does not create any happens-before orderings, so provides no
guarantees with respect to previous or subsequent reads and writes of
any variables other than the target of the weakCompareAndSet."
...and
"Additionally weakCompareAndSet does not provide ordering guarantees
that are usually needed for synchronization control. However, the method
may be useful for updating counters and statistics when such updates are
unrelated to the other happens-before orderings of a program. When a
thread sees an update to an atomic variable caused by a
weakCompareAndSet, it does not necessarily see updates to any other
variables that occurred before the weakCompareAndSet."
Thanks,
-Aleksey
From aleksey.shipilev at oracle.com Wed Aug 12 08:33:23 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Aug 2015 11:33:23 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB0456.6060405@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0456.6060405@oracle.com>
Message-ID: <55CB04D3.9000706@oracle.com>
On 08/12/2015 11:31 AM, Aleksey Shipilev wrote:
> On 08/10/2015 09:22 PM, Doug Lea wrote:
>> /**
>> * Stand-in for spec purposes of jdk9 java.lang.invoke.VarHandle
>> */
>> abstract class NotReallyVarHandle {
>> boolean weakCompareAndSet(Object owner, T cmp, T val);
>> boolean weakCompareAndSetAcquire(Object owner, T cmp, T val);
>> boolean weakCompareAndSetRelease(Object owner, T cmp, T val);
>
> I have a question about these. I understand the acq/rel relaxations for
> strong CASes, but what is the relaxation for already weak CAS? In other
> words, weakCompareAndSet is spec'ed in j.u.c.atomic as:
>
> "weakCompareAndSet atomically reads and conditionally writes a variable
> but does not create any happens-before orderings, so provides no
> guarantees with respect to previous or subsequent reads and writes of
> any variables other than the target of the weakCompareAndSet."
>
> ...and
>
> "Additionally weakCompareAndSet does not provide ordering guarantees
> that are usually needed for synchronization control. However, the method
> may be useful for updating counters and statistics when such updates are
> unrelated to the other happens-before orderings of a program. When a
> thread sees an update to an atomic variable caused by a
> weakCompareAndSet, it does not necessarily see updates to any other
> variables that occurred before the weakCompareAndSet."
Or is it the other way around: weakCAS is relaxed by default, and
acq/rel weakCAS versions are actually *stronger* than default weakCAS?
-Aleksey
From aph at redhat.com Wed Aug 12 08:35:46 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 12 Aug 2015 09:35:46 +0100
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB04D3.9000706@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0456.6060405@oracle.com>
<55CB04D3.9000706@oracle.com>
Message-ID: <55CB0562.5010209@redhat.com>
On 12/08/15 09:33, Aleksey Shipilev wrote:
> Or is it the other way around: weakCAS is relaxed by default, and
> acq/rel weakCAS version are actually *stronger* than default weakCAS?
That's how I read it.
weakCompareAndSet is ldxr; cmp; stxr
weakCompareAndSetAcquire is ldaxr; cmp; stxr
weakCompareAndSetRelease is ldxr; cmp; stlxr
Andrew.
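The three flavors above can be sketched in Java as follows. This assumes the method names of the released java.lang.invoke.VarHandle, where the relaxed form is called weakCompareAndSetPlain (the plain weakCompareAndSet there has volatile semantics), so the naming differs from the draft in this thread. The class WeakCasModes is hypothetical. Every weak CAS sits in a retry loop because it may fail spuriously.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class WeakCasModes {
    int value;

    static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(WeakCasModes.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Relaxed weak CAS: no ordering beyond the target variable itself.
    int incrementPlain() {
        int cur;
        do {
            cur = (int) VALUE.getOpaque(this);
        } while (!VALUE.weakCompareAndSetPlain(this, cur, cur + 1));
        return cur + 1;
    }

    // Weak CAS whose load side has acquire semantics.
    int incrementAcquire() {
        int cur;
        do {
            cur = (int) VALUE.getOpaque(this);
        } while (!VALUE.weakCompareAndSetAcquire(this, cur, cur + 1));
        return cur + 1;
    }

    // Weak CAS whose store side has release semantics.
    int incrementRelease() {
        int cur;
        do {
            cur = (int) VALUE.getOpaque(this);
        } while (!VALUE.weakCompareAndSetRelease(this, cur, cur + 1));
        return cur + 1;
    }

    public static void main(String[] args) {
        WeakCasModes w = new WeakCasModes();
        w.incrementPlain();
        w.incrementAcquire();
        System.out.println(w.incrementRelease());
    }
}
```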
From aleksey.shipilev at oracle.com Wed Aug 12 08:41:54 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Aug 2015 11:41:54 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB0562.5010209@redhat.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0456.6060405@oracle.com>
<55CB04D3.9000706@oracle.com> <55CB0562.5010209@redhat.com>
Message-ID: <55CB06D2.80206@oracle.com>
On 08/12/2015 11:35 AM, Andrew Haley wrote:
> On 12/08/15 09:33, Aleksey Shipilev wrote:
>> Or is it the other way around: weakCAS is relaxed by default, and
>> acq/rel weakCAS version are actually *stronger* than default weakCAS?
>
> That's how I read it.
>
> weakCompareAndSet is ldxr; cmp; stxr
> weakCompareAndSetAcquire is ldaxr; cmp; stxr
> weakCompareAndSetRelease is ldxr; cmp; stlxr
Okay, that makes sense.
Thanks,
-Aleksey
From aleksey.shipilev at oracle.com Wed Aug 12 09:02:05 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Aug 2015 12:02:05 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55C8EBCD.7010006@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID: <55CB0B8D.1020509@oracle.com>
On 08/10/2015 09:22 PM, Doug Lea wrote:
> /**
> * Stand-in for spec purposes of jdk9 java.lang.invoke.VarHandle
> */
> abstract class NotReallyVarHandle {
> // CAS
>
> boolean compareAndSet(Object owner, T cmp, T val);
> boolean compareAndSetAcquire(Object owner, T cmp, T val);
> boolean compareAndSetRelease(Object owner, T cmp, T val);
Another, perhaps, "can of worms" question about CAS-es that was lurking
in my mind: which T-s are we nominally handling? It was probably
discussed and answered in C/C++11 standardization efforts?
Current HotSpot handles {ref, int, long} CASes. Atomics use either of
these, with notable implementation detail for AtomicBoolean that handles
*int*, not *boolean* field.
Do we seek to extend CAS to all primitive-typed fields (with
complications for value types when they come)? Do we seek C/C++11
std::atomic<T> behavior that seems to guarantee strong CAS for every
primitive specialization? I do not know, though, how that is implemented
in current C/C++ libraries and compilers.
Since VarHandles are the handles over fields, every object field may
expect to be pointed at by VarHandle. Therefore, it would be too late to
modify the object representation, without penalizing the memory
footprint for all objects. In other words, we cannot make boolean field
in all objects to take 4 bytes instead of 1 byte for a singular exotic
VH.cas() usage.
Therefore, the question seems to be the hardware ability to make subword
CASes. At least x86 seems to be able to make the aligned sub-word CASes,
although I have not checked the performance for them -- but my gut
feeling is that it does not differ much, since CASes are cacheline-based
on most modern x86-s.
I mused a bit about how to go forward in the absence of subword CASes. E.g.
make an aligned full-word CAS that contains the field, but that seems to
break the progress guarantees: two "strong" CASes on adjacent boolean
fields are not "strong", and one may fail spuriously. Or, e.g., lock
the object (words) for atomic operations, similarly to the (late) fallback
strategy in AtomicLong for 32-bit systems, which is also not fun
implementation-wise.
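The full-word emulation described above can be sketched as follows, with four bytes packed into one AtomicInteger. The class PackedBytes and its methods are hypothetical. The loop only reports failure when the target byte really differs from the expected value, so the operation stays "strong", but a thread can be forced to retry by updates to neighboring bytes, which is the progress concern raised in the message.

```java
import java.util.concurrent.atomic.AtomicInteger;

class PackedBytes {
    // Four logical byte "fields" sharing one 32-bit word.
    private final AtomicInteger word = new AtomicInteger();

    byte get(int index) {
        return (byte) (word.get() >>> (index * 8));
    }

    boolean compareAndSetByte(int index, byte expected, byte update) {
        int shift = index * 8;
        int mask = 0xFF << shift;
        for (;;) {
            int cur = word.get();
            if ((byte) (cur >>> shift) != expected) {
                return false;   // genuine mismatch on the target byte
            }
            int next = (cur & ~mask) | ((update & 0xFF) << shift);
            if (word.compareAndSet(cur, next)) {
                return true;
            }
            // The full-word CAS failed: either the target byte or a neighbor
            // changed. Re-read and decide again instead of failing spuriously.
        }
    }

    public static void main(String[] args) {
        PackedBytes p = new PackedBytes();
        System.out.println(p.compareAndSetByte(0, (byte) 0, (byte) 5));
        System.out.println(p.get(0));
    }
}
```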
Thanks,
-Aleksey
From dl at cs.oswego.edu Wed Aug 12 10:43:07 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 12 Aug 2015 06:43:07 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB0562.5010209@redhat.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0456.6060405@oracle.com>
<55CB04D3.9000706@oracle.com> <55CB0562.5010209@redhat.com>
Message-ID: <55CB233B.40707@cs.oswego.edu>
On 08/12/2015 04:35 AM, Andrew Haley wrote:
> On 12/08/15 09:33, Aleksey Shipilev wrote:
>> Or is it the other way around: weakCAS is relaxed by default, and
>> acq/rel weakCAS version are actually *stronger* than default weakCAS?
>
> That's how I read it.
>
> weakCompareAndSet is ldxr; cmp; stxr
> weakCompareAndSetAcquire is ldaxr; cmp; stxr
> weakCompareAndSetRelease is ldxr; cmp; stlxr
>
Yes. Exactly so. I'll make a pass on the specs to clarify.
We defined weakCAS back in the original JDK5 atomics, but
just linked the Java code to plain Unsafe CAS, with the
expectation that someday someone would add intrinsics
allowing these kinds of implementations. More than a
decade later, the intrinsics still don't exist. So it is
nice to see you and Aleksey contemplating this (also for
the acq/rel versions).
-Doug
From dl at cs.oswego.edu Wed Aug 12 12:55:42 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 12 Aug 2015 08:55:42 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB0339.3020901@redhat.com>
References: <55C8EBCD.7010006@cs.oswego.edu>
<55CA48CE.6050405@cs.oswego.edu> <55CB0339.3020901@redhat.com>
Message-ID: <55CB424E.6000405@cs.oswego.edu>
On 08/12/2015 04:26 AM, Andrew Haley wrote:
> On 11/08/15 20:11, Doug Lea wrote:
>>
>> * I don't want to get overly consumed about analogs of consume mode,
>> but it would be nice to deal with it in a way that is good enough not
>> to demand follow-up efforts.
>
> I think that's overly optimistic. Given the difficulty of defining
> consume(ish) mode in a way that can be understood and the time to get
> HotSpot etc. to implement it correctly and efficiently is there a
> pressing need to do this now?
>
I'm not sure. Do we add loadLoadFence(Object ref) and get it over
with, or postpone? In light of the tendency for unresolved issues
to languish for many years, I'd like to at least carefully consider
the options.
Background: The idea is basically the same as already exists
for final fields: Even on processors otherwise requiring
load fences, they are not in general used when reading
a final field given a reference to an object containing one.
The underlying notion is that the processor/thread cannot
possibly speculate/reuse the field value given that it has
never seen that reference before, as assured by allocation and
GC protocols, along with store/release fences in constructors.
(Aside: this reasoning is not airtight until OOTA-related
issues are solved.) And further, once read, the value can be
used forever without re-reading. Well, except that actual JVMs
are more conservative about this, trying (imperfectly) to
cope with some cases of modifying final fields.
In RCU (the main use case driving consume mode), pretty much
the same holds except that the compiler is not aware of the
constraints, so doesn't know that it can elide the load fence,
and also doesn't know that it must discard any previous
values. So you have to tell it using a special load or fence
operation. Most of the time, just using relaxed-loads would
work fine. But "most of the time" is not good enough.
I'm sure that there are plenty of cases out there where
Java programs also do this and get away with it because
the constraints happen to hold (even though not officially DRF).
And also buggy cases where they only sometimes hold, but would
be "fixed" via the use of a dependent-load fence/method.
Using an unscoped loadLoadFence or acquireFence would also fix,
but at greater cost on some processors. And 99+% of programmers
would accept the (possibly) further cost of using volatile here
to make their programs race-free by the usual criteria.
Given this background, I'm still not sure of the best
near-term choice:
Only a few extremely performance sensitive users on only some
platforms would be impacted if loadLoadFence(Object ref) were
defined but by default bound to acquireFence (Unsafe loadFence).
And conversely, only a few users would be impacted if
loadLoadFence(Object ref) did not even exist until
some time after RCU-like constructions became more
widespread.
I'm still thinking that the get-it-over-with option is
slightly better.
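For comparison with the options weighed above, here is a sketch of the conservative route available without any consume-like operation: publish an immutable snapshot with setRelease and read it with getAcquire, in the RCU-like style the message describes. The class SnapshotHolder, the nested Config type, and the demo driver are hypothetical; a scoped dependent-load fence, had one existed, would aim to make the reader side cheaper on some processors than the acquire load used here.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class SnapshotHolder {
    // Immutable snapshot; readers only ever dereference a published instance.
    static final class Config {
        final int version;
        final String name;

        Config(int version, String name) {
            this.version = version;
            this.name = name;
        }
    }

    Config current;   // written with setRelease, read with getAcquire

    static final VarHandle CURRENT;
    static {
        try {
            CURRENT = MethodHandles.lookup()
                    .findVarHandle(SnapshotHolder.class, "current", Config.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(Config next) {
        CURRENT.setRelease(this, next);
    }

    Config read() {
        return (Config) CURRENT.getAcquire(this);
    }

    // Hypothetical driver: a writer publishes once, the caller waits for it.
    static int demo() {
        SnapshotHolder holder = new SnapshotHolder();
        Thread writer = new Thread(() -> holder.publish(new Config(3, "v3")));
        writer.start();
        Config seen;
        while ((seen = holder.read()) == null) {
            Thread.onSpinWait();
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.version;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```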
-Doug
From dl at cs.oswego.edu Wed Aug 12 14:38:10 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 12 Aug 2015 10:38:10 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB0B8D.1020509@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0B8D.1020509@oracle.com>
Message-ID: <55CB5A52.7030100@cs.oswego.edu>
On 08/12/2015 05:02 AM, Aleksey Shipilev wrote:
> On 08/10/2015 09:22 PM, Doug Lea wrote:
>> boolean compareAndSet(Object owner, T cmp, T val);
>> boolean compareAndSetAcquire(Object owner, T cmp, T val);
>> boolean compareAndSetRelease(Object owner, T cmp, T val);
>
> Another, perhaps, "can of worms" question about CAS-es that was lurking
> in my mind: which <T>-s are we nominally handling?
The answers need to be a little different for Java and C++.
In C++, you can only apply to "atomic" types.
Under the current VarHandle scheme, you can attach to
anything at compile time, but methods may bail out
throwing an UnsupportedOperationException at run-time.
This will always occur for example when applying getAndAdd
to a VarHandle of any Object type.
This was apparently the tradeoff most acceptable to all concerned.
It means that in addition to memory-model-related specs,
the methods will need to carry warnings that they might
just throw UOE; hopefully enumerating common cases
like CAS for sub-word types and unaligned memory.
>
> Therefore, the question seems to be the hardware ability to make subword
> CASes.
My understanding (and preference) is that trying to CAS with
VarHandles to bytes and other small types just throws
UOE and should be spec'ed to do so.
-Doug
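For illustration, the run-time bail-out described above can be observed with the java.lang.invoke.VarHandle API as it eventually shipped in JDK 9 (which postdates this message and differs in detail from the signatures quoted here). The class UoeSketch and its fields are hypothetical; the sketch only shows that getAndAdd on a reference-typed VarHandle throws UnsupportedOperationException while the same call on an int field works.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class UoeSketch {
    int counter;
    String name = "a";

    static final VarHandle COUNTER;
    static final VarHandle NAME;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            COUNTER = l.findVarHandle(UoeSketch.class, "counter", int.class);
            NAME = l.findVarHandle(UoeSketch.class, "name", String.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Numeric update on an int field: supported.
    static int addToInt() {
        UoeSketch s = new UoeSketch();
        COUNTER.getAndAdd(s, 5);
        return (int) COUNTER.get(s);
    }

    // Numeric update on a reference field: attaches fine, fails when invoked.
    static boolean getAndAddThrowsOnReference() {
        UoeSketch s = new UoeSketch();
        try {
            String old = (String) NAME.getAndAdd(s, "b");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addToInt());
        System.out.println(getAndAddThrowsOnReference());
    }
}
```

Callers who cannot know the variable type statically can check VarHandle.isAccessModeSupported(VarHandle.AccessMode.GET_AND_ADD) before invoking.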
From aleksey.shipilev at oracle.com Wed Aug 12 16:20:44 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Aug 2015 19:20:44 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB233B.40707@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
<55CB0456.6060405@oracle.com> <55CB04D3.9000706@oracle.com>
<55CB0562.5010209@redhat.com> <55CB233B.40707@cs.oswego.edu>
Message-ID: <55CB725C.3010003@oracle.com>
On 08/12/2015 01:43 PM, Doug Lea wrote:
> On 08/12/2015 04:35 AM, Andrew Haley wrote:
>> On 12/08/15 09:33, Aleksey Shipilev wrote:
>>> Or is it the other way around: weakCAS is relaxed by default, and
>>> acq/rel weakCAS version are actually *stronger* than default weakCAS?
>>
>> That's how I read it.
>>
>> weakCompareAndSet is ldxr; cmp; stxr
>> weakCompareAndSetAcquire is ldaxr; cmp; stxr
>> weakCompareAndSetRelease is ldxr; cmp; stlxr
>>
>
> Yes. Exactly so. I'll make a pass on the specs to clarify.
>
> We defined weakCAS back in the original JDK5 atomics, but
> just linked the Java code to plain Unsafe CAS, with the
> expectation that someday someone would add intrinsics
> allowing these kinds of implementations. More than a
> decade later, the intrinsics still don't exist. So it is
> nice to see you and Aleksey contemplating this (also for
> the acq/rel versions).
Yes, we are drilling more holes in Unsafe/VarHandles to support these
usages:
https://bugs.openjdk.java.net/browse/JDK-8133451
Andrew, I sent a separate note with a patch to review.
Thanks,
-Aleksey
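A minimal sketch of how the weak variants are typically used, assuming the VarHandle API as later released in JDK 9. Note that the released names differ from the ones discussed above: there the fully relaxed form is weakCompareAndSetPlain, and weakCompareAndSet itself has volatile semantics. The class WeakCasSketch is hypothetical. Because a weak CAS may fail spuriously, it always sits inside a retry loop.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class WeakCasSketch {
    private int value;

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(WeakCasSketch.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Increment using a weak CAS with release semantics on the store.
    // Spurious failure is harmless: the loop simply re-reads and retries.
    int incrementAndGet() {
        for (;;) {
            int cur = (int) VALUE.getVolatile(this);
            if (VALUE.weakCompareAndSetRelease(this, cur, cur + 1))
                return cur + 1;
        }
    }

    public static void main(String[] args) {
        WeakCasSketch c = new WeakCasSketch();
        int last = 0;
        for (int i = 0; i < 1000; i++) last = c.incrementAndGet();
        System.out.println(last);
    }
}
```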
From boehm at acm.org Wed Aug 12 22:33:13 2015
From: boehm at acm.org (Hans Boehm)
Date: Wed, 12 Aug 2015 15:33:13 -0700
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
Message-ID:
On Mon, Aug 10, 2015 at 11:22 AM, Doug Lea wrote:
>
> ...
>
> For Fences, adding plain StoreStore and LoadLoad fences to the set
> acquireFence, releaseFence, and fullFence seems wise. The main
> arguments against LoadLoad and StoreStore in C11 were usability
> concerns. But they have been found to be useful enough internally to
> JVMs (mainly on ARM) to be semi-supported within hotspot. And we'd
> like to not keep making the same mistake of unnecessarily making
> things ARM/POWER hostile.
Let me argue once more against LoadLoad, and at least dampen the
enthusiasm for StoreStore.
I know of no hardware instructions, except on SPARC, that correspond
to a LoadLoad fence. And my impression is that it's not very useful on
SPARC. The ARM DMB xLD fence instruction, if I understand correctly,
is essentially a C++ acquire fence. POWER lwsync is even stronger
than that. For TSO architectures, loadLoadFence and acquireFence are both
no-ops.
I believe that in all mainstream environments, loadLoadFence will
generate the same code as acquireFence, but is much harder to specify
and much trickier to use correctly. Why would you bother?
The StoreStore situation is slightly different for two reasons:
1) It does seem to correspond to the ARM DMB xST instruction.
2) It does seem to be useful in VM implementations, in that it seems to
suffice as the end-of-constructor fence.
However, I think it difficult to specify correctly outside of that specific
essentially final-field-initialization scenario. If I write
x++; // Increment zero initialized field
storeStoreFence();
x_init = true;
can the load for the x++ see a value that was assigned to x in
another thread after seeing x_init set to true? The load can be reordered
with the fence. The store can't, but the store "depends on" the load.
But we don't enforce dependency-based ordering, mostly because we
don't know how to define it.
Even if we could define it reasonably, the semantics strike me as thoroughly
weird, even by my warped standards:
Consider:
Thread 1:
x = ... ? 1 : 2;
assert (x < 3);
storeStoreFence();
x_init = true;
Thread 2:
if (x_init) {
acquireFence();
x = 17;
}
allows the assertion to fail.
In a more complex example, any method calls used to initialize x in thread 1
need to be careful not to read the value they just wrote. You guys did
realize that you just volunteered to make a pass through the library
specification to specify which calls are safe in such cases, right?
It may be more promising to somehow specify it only for use with write-once
variables. But I don't really know how to make that work either.
And then there are important transitivity/cumulativity questions ...
I think LoadLoad actually has some analogous issues as well. But unlike
StoreStore, there seems to be no hardware/performance argument to even
consider it.
Hans
>
>
> /**
> * A set of methods providing fine-grained control of memory ordering.
> *
> * The Java Language Specification permits operations to be
> * executed in orders different than are apparent in program source
> * code, subject to constraints mainly stemming from the use of locks
> * and volatile fields. The methods of this class can also be used to
> * impose constraints. Their specifications are phrased in terms of
> * the lack of "reorderings" -- observable ordering effects that might
> * otherwise occur if the fence were not present.
> *
> * @apiNote More precise phrasing of these specifications may
> * accompany future updates of the Java Language Specification.
> */
> public class Fences {
>
> /**
> * Ensures that loads and stores before the fence will not be
> * reordered with loads and stores after the fence.
> *
> * @apiNote Ignoring the many semantics differences from C and
> * C++, this method has memory ordering effects compatible with
> * atomic_thread_fence(memory_order_seq_cst)
> */
> public static void fullFence() {}
>
> /**
> * Ensures that loads before the fence will not be reordered with
> * loads and stores after the fence.
> *
> * @apiNote Ignoring the many semantics differences from C and
> * C++, this method has memory ordering effects compatible with
> * atomic_thread_fence(memory_order_acquire)
> */
> public static void acquireFence() {}
>
> /**
> * Ensures that loads and stores before the fence will not be
> * reordered with stores after the fence.
> *
> * @apiNote Ignoring the many semantics differences from C and
> * C++, this method has memory ordering effects compatible with
> * atomic_thread_fence(memory_order_release)
> */
> public static void releaseFence() {}
>
> /**
> * Ensures that loads before the fence will not be reordered with
> * loads after the fence.
> */
> public static void loadLoadFence() {}
>
> /**
> * Ensures that stores before the fence will not be reordered with
> * stores after the fence.
> */
> public static void storeStoreFence() {}
>
>
> }
>
From dl at cs.oswego.edu Thu Aug 13 12:19:17 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 13 Aug 2015 08:19:17 -0400
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References:
Message-ID: <55CC8B45.50909@cs.oswego.edu>
On 08/12/2015 06:33 PM, Hans Boehm wrote:
>
> Let me argue once more against LoadLoad, and at least dampen the
> enthusiasm for StoreStore.
Thanks for the critiques! (Even though I remain unconvinced.)
I should have noted that ARM mappings are only part of the motivation
for loadLoadFence and storeStoreFence. Another is protection against
loop "optimizations" that are highly non-optimal. This is not
strictly a compiler issue, but easier to illustrate as one. Suppose
for example you have a method that writes several variables, along
with reader methods that can handle all ordering races among the
writes. But you still want to ensure that the variables are actually
written if the method is called in a loop. A trailing
storeStoreFence() seems to be the cheapest and conceptually most
appropriate way to reduce communication latency. (In other words, it
is "correct" but undesirable for method c() here to only use the
final (x, y) values.) Symmetrical arguments apply to using
leading loadLoadFences on the complementary reader methods
(that is otherwise similar to RCU-like constructions).
class C {
  int x = 0, y = 0; // relaxed
  void p() { // called in producer thread
    for (int i = 0; i < 1000000; ++i)
      writes(heavyPureComputation(i));
  }
  void c() { // called in consumer thread
    for (;;) {
      if (occasionally)
        reads();
      // ...
    }
  }
  void writes(int k) {
    x = k;
    y = k + 17;
    storeStoreFence(); // please actually store x and y if in a loop
  }
  void reads() {
    loadLoadFence(); // please actually load x and y if in a loop
    if (y == x + 17)
      something();
  }
}
This is not a hypothetical example. It's abstracted from cases I've
encountered. Like the RCU-like examples mentioned yesterday, these effects
arise only when you are writing racy performance-critical code. But
that's what low-level concurrent algorithm and data structure
designers do!
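A compilable variant of the writes()/reads() pair above, as a sketch only. It assumes the static fence methods that later shipped on java.lang.invoke.VarHandle in JDK 9 rather than the proposed Fences class; the class name FencedPair and the boolean return from reads() are illustrative additions. Whether the fences actually stop a compiler from deferring the stores is exactly what the rest of this thread disputes, so the sketch shows the shape of the code and makes no claim about latency.

```java
import java.lang.invoke.VarHandle;

final class FencedPair {
    int x = 0, y = 0; // plain fields, accessed racily by design

    // Producer side: write both variables, then a trailing store-store fence.
    void writes(int k) {
        x = k;
        y = k + 17;
        VarHandle.storeStoreFence();
    }

    // Consumer side: leading load-load fence, then read both variables.
    // Returns whether the pair looked consistent at this moment.
    boolean reads() {
        VarHandle.loadLoadFence();
        return y == x + 17;
    }

    public static void main(String[] args) {
        FencedPair p = new FencedPair();
        p.writes(3);
        System.out.println(p.reads());
    }
}
```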
Back to ..
>
> I know of no hardware instructions, except on SPARC, that correspond
> to a LoadLoad fence. And my impression is that it's not very useful on
> SPARC. The ARM DMB xLD fence instruction, if I understand correctly,
> is essentially a C++ acquire fence.
But I think that pseudo-fences (load; compare to self; ...) need not be?
>
> However, I think it difficult to specify correctly outside of that specific
> essentially final-field-initialization scenario.
It doesn't seem hard at all to specify in isolation.
The interactions with base ordering rules can be non-obvious though.
(Especially since, in the absence of a revised base model,
those rules might as well say that anything goes.)
So, like any fence method, it should be used when nothing
simpler applies. And surely not in:
>
> x++; // Increment zero initialized field
> storeStoreFence();
> x_init = true;
-Doug
From boehm at acm.org Thu Aug 13 21:04:48 2015
From: boehm at acm.org (Hans Boehm)
Date: Thu, 13 Aug 2015 14:04:48 -0700
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To: <55CC8B45.50909@cs.oswego.edu>
References:
<55CC8B45.50909@cs.oswego.edu>
Message-ID:
On Thu, Aug 13, 2015 at 5:19 AM, Doug Lea wrote:
>
> On 08/12/2015 06:33 PM, Hans Boehm wrote:
>>
>>
>> Let me argue once more against LoadLoad, and at least dampen the
>> enthusiasm for StoreStore.
>
>
> Thanks for the critiques! (Even though I remain unconvinced.)
>
> I should have noted that ARM mappings are only part of the motivation
> for loadLoadFence and storeStoreFence. Another is protection against
> loop "optimizations" that are highly non-optimal. This is not
> strictly a compiler issue, but easier to illustrate as one. Suppose
> for example you have a method that writes several variables, along
> with reader methods that can handle all ordering races among the
> writes. But you still want to ensure that the variables are actually
> written if the method is called in a loop. A trailing
> storeStoreFence() seems to be the cheapest and conceptually most
> appropriate way to reduce communication latency. (In other words, it
> is "correct" but undesirable for method c() here to only use the
> final (x, y) values.) Symmetrical arguments apply to using
> leading loadLoadFences on the complementary reader methods
> (that is otherwise similar to RCU-like constructions).
>
> class C {
> int x = 0, y = 0; // relaxed
>
> void p() { // called in producer thread
> for (int i = 0; i < 1000000; ++i)
> writes(heavyPureComputation(i));
> }
>
> void c() { // called in consumer thread
> for (;;) {
> if (occasionally)
> reads();
> // ...
> }
> }
>
> void writes(int k) {
> x = k;
> y = k + 17;
> storeStoreFence(); // please actually store x and y if in a loop
> }
>
> void reads() {
> loadLoadFence(); // please actually load x and y if in a loop
> if (y == x + 17)
> something();
> }
> }
>
> This is not a hypothetical example. It's abstracted from cases I've
> encountered. Like the RCU-like examples mentioned yesterday, these effects
> arise only when you are writing racy performance-critical code. But
> that's what low-level concurrent algorithm and data structure
> designers do!
No disagreement about the existence of this problem. There was a recent
long discussion of this on a C++ mailing list. There is not yet agreement
there about the correct solution. But we didn't have any advocates for this
approach.
I think this is fundamentally a completely different problem that has
nothing to do with restricting order to either only loads or only stores.
You are trying to instead dissuade the compiler from drastic code movement
in certain cases.
I don't think a fence-based approach works. Deferring all the stores to
the end of the loop fundamentally remains correct, even with the StoreStore
fence, since it's consistent with the producer just running very fast for a
while. The constraint you're trying to enforce has nothing to do with
ordering.
Aside from not working correctly, you end up slowing down ARM code in ways
that are entirely unnecessary, by inserting "dmb ishld" or "dmb ishst"
fences everywhere. (How expensive they are varies. On a number of
implementations they basically seem to be full fences.)
My personal favorite solution to this problem is to add an annotation for
fields that are used as relaxed atomics, and to agree that high quality
compilers should basically leave those alone. Optimizing those using
conventional rules for sequential performance may lead to disastrous
performance for the whole multithreaded system. If you don't understand it,
leave it alone.
Peter Dimov pointed out that there are cases, e.g. consecutive C++
reference count updates, where you probably do want the compiler to
aggressively optimize in spite of concurrent access. You may need a second
annotation for those.
>
> Back to ..
>
>>
>> I know of no hardware instructions, except on SPARC, that correspond
>> to a LoadLoad fence. And my impression is that it's not very useful on
>> SPARC. The ARM DMB xLD fence instruction, if I understand correctly,
>> is essentially a C++ acquire fence.
>
>
> But I think that pseudo-fences (load; compare to self; ...) need not be?
Those are fundamentally LoadStore fences. On Power you can also turn
them into a LoadLoad fence by adding an isync. I think the ARM situation
is essentially identical.
>
>>
>> However, I think it difficult to specify correctly outside of that
>> specific essentially final-field-initialization scenario.
>
>
> It doesn't seem hard at all to specify in isolation.
> The interactions with base ordering rules can be non-obvious though.
> (Especially since, in the absence of a revised base model,
> those rules might as well say that anything goes.)
> So, like any fence method, it should be used when nothing
> simpler applies. And surely not in:
>
>
>>
>> x++; // Increment zero initialized field
>> storeStoreFence();
>> x_init = true;
My problem is that this looks a lot like a constructor fence, or maybe the
writer side of a seqlock, which are the only use cases I know of for
StoreStore fences.
And the harder I think about constructor fences, the more nervous I get
about using StoreStore fences there without fully understanding the
transitivity issues.
Hans
>
>
> -Doug
>
From dl at cs.oswego.edu Thu Aug 13 23:19:37 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 13 Aug 2015 19:19:37 -0400
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References: <55CC8B45.50909@cs.oswego.edu>
Message-ID: <55CD2609.3070601@cs.oswego.edu>
On 08/13/2015 05:04 PM, Hans Boehm wrote:
> I don't think a fence-based approach works. Deferring all the stores to the
> end of the loop fundamentally remains correct, even with the StoreStore
> fence, since it's consistent with the producer just running very fast for a
> while. The constraint you're trying to enforce has nothing to do with
> ordering.
I must be missing something fundamental about C++ specs. Are C++
compilers allowed to ignore release fences in between writes
to the same variables? In unrolled form, that's what this would
amount to here.
>
> Aside from not working correctly, you end up slowing down ARM code in ways
> that are entirely unnecessary, by inserting "dmb ishld" or "dmb ishst"
> fences everywhere. (How expensive they are varies. On a number of
> implementations they basically seem to be full fences.)
Right. It does put the programmer in control though; for example
if ((i % 100) == 99) storeStoreFence()
Considering that the goal is communication latency reduction at
the expense of throughput, only the programmer would be able
to make these tradeoffs.
-Doug
From boehm at acm.org Thu Aug 13 23:56:16 2015
From: boehm at acm.org (Hans Boehm)
Date: Thu, 13 Aug 2015 16:56:16 -0700
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To: <55CD2609.3070601@cs.oswego.edu>
References:
<55CC8B45.50909@cs.oswego.edu>
<55CD2609.3070601@cs.oswego.edu>
Message-ID:
On Thu, Aug 13, 2015 at 4:19 PM, Doug Lea wrote:
>
> On 08/13/2015 05:04 PM, Hans Boehm wrote:
>
>> I don't think a fence-based approach works. Deferring all the stores to
>> the end of the loop fundamentally remains correct, even with the StoreStore
>> fence, since it's consistent with the producer just running very fast
>> for a while. The constraint you're trying to enforce has nothing to do with
>> ordering.
>
>
> I must be missing something fundamental about C++ specs. Are C++
> compilers allowed to ignore release fences in between writes
> to the same variables? In unrolled form, that's what this would
> amount to here.
I think that's unavoidable. If I write
for (...) {
x = something_expensive();
fence;
}
it's very hard to prevent the implementation from implementing that as
And that looks exactly like merging all the stores into one.
I think you can't really disallow the latter without disallowing a
time-slicing uniprocessor scheduler. They're behaviorally identical.
That doesn't prevent us from providing the compiler with advice to
discourage that. But I think this doesn't have anything to do with fences.
>
>>
>> Aside from not working correctly, you end up slowing down ARM code in
>> ways that are entirely unnecessary, by inserting "dmb ishld" or "dmb ishst"
>> fences everywhere. (How expensive they are varies. On a number of
>> implementations they basically seem to be full fences.)
>
>
> Right. It does put the programmer in control though; for example
> if ((i % 100) == 99) storeStoreFence()
Agreed. But it seems to me that this is control over an unnecessary
trade-off. Clearly the ideal code involves no fences. And in most cases,
just having the programmer specifying where the stores to shared variables
should go, and having the compiler leave that alone, seems like a better
and simpler way to control this.
Hans
>
> Considering that the goal is communication latency reduction at
> the expense of throughput, only the programmer would be able
> to make these tradeoffs.
>
> -Doug
>
>
From dl at cs.oswego.edu Fri Aug 14 11:50:53 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 14 Aug 2015 07:50:53 -0400
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu>
Message-ID: <55CDD61D.80602@cs.oswego.edu>
On 08/13/2015 07:56 PM, Hans Boehm wrote:
> On Thu, Aug 13, 2015 at 4:19 PM, Doug Lea wrote:
> > I must be missing something fundamental about C++ specs. Are C++
> > compilers allowed to ignore release fences in between writes
> > to the same variables? In unrolled form, that's what this would
> > amount to here.
>
> I think that's unavoidable. If I write
>
> for (...) {
> x = something_expensive();
> fence;
> }
>
> it's very hard to prevent the implementation from implementing that as
>
>
>
>
> And that looks exactly like merging all the stores into one.
In other words, it is legal (at least for some loops in
which you can prove termination etc) to postpone ALL the stores
to end of loop (assuming the CPU has enough registers to
hold them all or is willing to use unbounded local stack space),
in which case even if they are performed in order, you may lose
responsiveness.
So there are limitations in the ability of ordering control to
improve responsiveness. Which is unsurprising given all the
other limitations under weak scheduling guarantees. But
that's not much of an argument for not even allowing it.
> But I think this doesn't have anything to do with fences.
Ordering constraints seem intrinsic to the problem at hand.
It's the complement of the main issue in RCU/consume:
"really read this" vs "really write this".
And like RCU, the construction can be seen as one in which
a special-case optimization removing a processor-level fence
may apply in conditions that are hard to express/control.
You can in the mean time cheat in both cases in C/C++ by
casting to (C) volatile. But nothing like this applies in Java.
-Doug
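For comparison, the VarHandle API that eventually shipped in JDK 9 includes "opaque" accesses (getOpaque/setOpaque), which appear aimed at this "really read this" / "really write this" need without adding inter-thread ordering. The sketch below assumes that released API; the class OpaqueSketch and its methods are hypothetical, and it makes no claim about how any particular JIT treats the loop.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class OpaqueSketch {
    private int progress;

    private static final VarHandle PROGRESS;
    static {
        try {
            PROGRESS = MethodHandles.lookup()
                    .findVarHandle(OpaqueSketch.class, "progress", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Producer: report each step with an opaque store instead of a plain
    // store followed by a fence.
    void produce(int steps) {
        for (int i = 1; i <= steps; i++) {
            PROGRESS.setOpaque(this, i);
        }
    }

    // Consumer: an opaque load, intended to be performed on each call.
    int poll() {
        return (int) PROGRESS.getOpaque(this);
    }

    public static void main(String[] args) {
        OpaqueSketch s = new OpaqueSketch();
        s.produce(10);
        System.out.println(s.poll());
    }
}
```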
From paulmck at linux.vnet.ibm.com Fri Aug 14 12:39:04 2015
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Fri, 14 Aug 2015 05:39:04 -0700
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References:
<55CC8B45.50909@cs.oswego.edu>
<55CD2609.3070601@cs.oswego.edu>
Message-ID: <20150814123903.GG3895@linux.vnet.ibm.com>
On Thu, Aug 13, 2015 at 04:56:16PM -0700, Hans Boehm wrote:
> On Thu, Aug 13, 2015 at 4:19 PM, Doug Lea wrote:
> >
> > On 08/13/2015 05:04 PM, Hans Boehm wrote:
> >
> >> I don't think a fence-based approach works. Deferring all the stores to
> >> the end of the loop fundamentally remains correct, even with the StoreStore
> >> fence, since it's consistent with the producer just running very fast
> >> for a while. The constraint you're trying to enforce has nothing to do with
> >> ordering.
> >
> >
> > I must be missing something fundamental about C++ specs. Are C++
> > compilers allowed to ignore release fences in between writes
> > to the same variables? In unrolled form, that's what this would
> > amount to here.
>
> I think that's unavoidable. If I write
>
> for (...) {
> x = something_expensive();
> fence;
> }
>
> it's very hard to prevent the implementation from implementing that as
>
>
>
As in, place an unrolled version of the loop into a hardware sort-of
transaction, like the ones available on recent x86, Power, and mainframe
systems? If that is what you are getting at, this scenario sounds like
an excellent argument for having some way to tell the compiler not to
make that transformation. Perhaps the fence above should be interpreted
as saying to bound any such transactions.
> And that looks exactly like merging all the stores into one.
It would, but don't expect any real-time programmers to thank you
for doing that to their code. ;-)
> I think you can't really disallow the latter without disallowing a
> time-slicing uniprocessor scheduler. They're behaviorally identical.
If you are excluding performance and response-time characteristics
from "behaviorally", I can't argue. Again, don't expect any thanks
from real-time programmers.
> That doesn't prevent us from providing the compiler with advice to
> discourage that. But I think this doesn't have anything to do with fences.
Volatile fences, then? ;-)
> >> Aside from not working correctly, you end up slowing down ARM code in
> >> ways that are entirely unnecessary, by inserting "dmb ishld" or "dmb ishst"
> >> fences everywhere. (How expensive they are varies. On a number of
> >> implementations they basically seem to be full fences.)
> >
> >
> > Right. It does put the programmer in control though; for example
> > if ((i % 100) == 99) storeStoreFence()
>
> Agreed. But it seems to me that this is control over an unnecessary
> trade-off. Clearly the ideal code involves no fences. And in most cases,
> just having the programmer specifying where the stores to shared variables
> should go, and having the compiler leave that alone, seems like a better
> and simpler way to control this.
In all cases?
Thanx, Paul
> Hans
>
> >
> > Considering that the goal is communication latency reduction at
> > the expense of throughput, only the programmer would be able
> > to make these tradeoffs.
> >
> > -Doug
> >
> >
>
From dl at cs.oswego.edu Fri Aug 14 13:35:40 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 14 Aug 2015 09:35:40 -0400
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References: <55CC8B45.50909@cs.oswego.edu>
Message-ID: <55CDEEAC.2000300@cs.oswego.edu>
On 08/13/2015 05:04 PM, Hans Boehm wrote:
> >> I know of no hardware instructions, except on SPARC, that correspond
> >> to a LoadLoad fence. And my impression is that it's not very useful on
> >> SPARC. The ARM DMB xLD fence instruction, if I understand correctly,
> >> is essentially a C++ acquire fence.
> >
> >
> > But I think that pseudo-fences (load; compare to self; ...) need not be?
> Those are fundamentally LoadStore fences. On Power you can also turn
> them into a LoadLoad fence by adding an isync. I think the ARM situation
> is essentially identical.
On a little more checking, I think so too.
(Formal validation would be nice...)
But even if so, it seems better to have a uniform API:
writes: full > release > storeStore
reads: full > acquire > loadLoad
even if loadLoad is internally mapped to acquire.
And omitting it feels even more wrong if we support
RCU-like usages with scoped loadLoadFence(Object ref).
-Doug
From paulmck at linux.vnet.ibm.com Fri Aug 14 15:39:03 2015
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Fri, 14 Aug 2015 08:39:03 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB5A52.7030100@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0B8D.1020509@oracle.com>
<55CB5A52.7030100@cs.oswego.edu>
Message-ID: <20150814153903.GJ3895@linux.vnet.ibm.com>
On Wed, Aug 12, 2015 at 10:38:10AM -0400, Doug Lea wrote:
> On 08/12/2015 05:02 AM, Aleksey Shipilev wrote:
> >On 08/10/2015 09:22 PM, Doug Lea wrote:
>
> >> boolean compareAndSet(Object owner, T cmp, T val);
> >> boolean compareAndSetAcquire(Object owner, T cmp, T val);
> >> boolean compareAndSetRelease(Object owner, T cmp, T val);
> >
> >Another, perhaps, "can of worms" question about CAS-es that was lurking
> >in my mind: which <T>-s are we nominally handling?
>
> The answers need to be a little different for Java and C++.
> In C++, you can only apply to "atomic" types.
>
> Under the current VarHandle scheme, you can attach to
> anything at compile time, but methods may bail out
> throwing an UnsupportedOperationException at run-time.
>
> This will always occur for example when applying getAndAdd
> to a VarHandle of any Object type.
>
> This was apparently the tradeoff most acceptable to all concerned.
> It means that in addition to memory-model-related specs,
> the methods will need to carry warnings that they might
> just throw UOE; hopefully enumerating common cases
> like CAS for sub-word types and unaligned memory.
>
> >Therefore, the question seems to be the hardware ability to make subword
> >CASes.
>
> My understanding (and preference) is that trying to CAS with
> VarHandles to bytes and other small types just throws
> UOE and should be spec'ed to do so.
OK, I will bite... Why can't you just apply a CAS instruction to the
larger word containing the subword?
Thanx, Paul
From aleksey.shipilev at oracle.com Fri Aug 14 15:44:27 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 14 Aug 2015 18:44:27 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <20150814153903.GJ3895@linux.vnet.ibm.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0B8D.1020509@oracle.com>
<55CB5A52.7030100@cs.oswego.edu>
<20150814153903.GJ3895@linux.vnet.ibm.com>
Message-ID: <55CE0CDB.1070005@oracle.com>
On 14.08.2015 18:39, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 10:38:10AM -0400, Doug Lea wrote:
>> On 08/12/2015 05:02 AM, Aleksey Shipilev wrote:
>>> On 08/10/2015 09:22 PM, Doug Lea wrote:
>>
>>>> boolean compareAndSet(Object owner, T cmp, T val);
>>>> boolean compareAndSetAcquire(Object owner, T cmp, T val);
>>>> boolean compareAndSetRelease(Object owner, T cmp, T val);
>>>
>>> Another, perhaps, "can of worms" question about CAS-es that was lurking
>>> in my mind: which <T>-s are we nominally handling?
>>
>> The answers need to be a little different for Java and C++.
>> In C++, you can only apply to "atomic" types.
>>
>> Under the current VarHandle scheme, you can attach to
>> anything at compile time, but methods may bail out
>> throwing an UnsupportedOperationException at run-time.
>>
>> This will always occur for example when applying getAndAdd
>> to a VarHandle of any Object type.
>>
>> This was apparently the tradeoff most acceptable to all concerned.
>> It means that in addition to memory-model-related specs,
>> the methods will need to carry warnings that they might
>> just throw UOE; hopefully enumerating common cases
>> like CAS for sub-word types and unaligned memory.
>>
>>> Therefore, the question seems to be the hardware ability to make subword
>>> CASes.
>>
>> My understanding (and preference) is that trying to CAS with
>> VarHandles to bytes and other small types just throws
>> UOE and should be spec'ed to do so.
>
> OK, I will bite... Why can't you just apply a CAS instruction to the
> larger word containing the subword?
As I tried to explain before, this seems to break the guarantees for
strong CAS, that at least one of the CASes against the same location
should succeed. The trick here is that on Java level, "location" is
defined as "field" or something like that.
Case in point: imagine you have two subword-sized fields in the same
word, and you do CAS on each of them. From Java level, these CASes are
uncontended, and both should succeed. In our hypothetical
implementation, one of the CASes would fail. This is a good thing for
weakCAS that is spec-ed to allow spurious failures, but not for a strong
one.
Thanks,
-Aleksey
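The point above can be made concrete with a sketch that emulates a byte-sized CAS on top of a word-sized one. It uses java.util.concurrent.atomic.AtomicInteger as the containing word; the class PackedBytes and its methods are hypothetical. A single word-level CAS can fail merely because a neighboring byte changed, so a strong byte-level CAS has to retry until the failure is attributable to the target byte itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PackedBytes {
    // Four logical byte "fields" packed into one 32-bit word.
    private final AtomicInteger word = new AtomicInteger();

    byte get(int index) {
        return (byte) (word.get() >>> (index * 8));
    }

    // Strong CAS on one byte, built from CAS on the containing word.
    boolean compareAndSetByte(int index, byte expected, byte update) {
        int shift = index * 8;
        int mask = 0xFF << shift;
        for (;;) {
            int current = word.get();
            if ((byte) (current >>> shift) != expected)
                return false;   // genuine failure: our byte holds another value
            int next = (current & ~mask) | ((update & 0xFF) << shift);
            if (word.compareAndSet(current, next))
                return true;
            // The word CAS failed. Either our byte changed (caught by the
            // check on the next iteration) or only a neighbor changed, in
            // which case failing here would be spurious, so retry.
        }
    }

    public static void main(String[] args) {
        PackedBytes p = new PackedBytes();
        System.out.println(p.compareAndSetByte(0, (byte) 0, (byte) 5));
        System.out.println(p.compareAndSetByte(1, (byte) 0, (byte) 7));
        System.out.println(p.get(0) + " " + p.get(1));
    }
}
```

Dropping the loop and returning the result of the single word-level compareAndSet would give weak-CAS behavior: correct only where spurious failure is permitted.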
From paulmck at linux.vnet.ibm.com Fri Aug 14 15:36:02 2015
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Fri, 14 Aug 2015 08:36:02 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CA48CE.6050405@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
<55CA48CE.6050405@cs.oswego.edu>
Message-ID: <20150814153602.GI3895@linux.vnet.ibm.com>
On Tue, Aug 11, 2015 at 03:11:10PM -0400, Doug Lea wrote:
> On 08/11/2015 02:01 PM, Jeremy Manson wrote:
> >Just curious: which end users are jonesing for a Fences API? I talk with a LOT
> >of Java users, and it's *never* come up outside of the kinds of people on this list.
> >
>
> Good timing. I was about to post this set of misc follow-up notes
> that mentions this.
>
> * I don't want to get overly consumed about analogs of consume mode,
> but it would be nice to deal with it in a way that is good enough not
> to demand follow-up efforts. A different approach than getDependently
> is to add an overloaded loadLoadFence method that restricts its scope
> to loads from the given reference: void loadLoadFence(Object ref).
> This is similar to the approach used in a previous (scrapped) version
> of Fence APIs. JVMs could just equate it (as well as the no-arg
> version) to acquireFence, but with some effort it could be implemented
> to get better performance on some platforms. At the moment, this seems
> like a better approach to me than VarHandle getDependently methods.
Well, if you change your mind about being consumed by analogs of consume,
please see the attached revision of C++ working draft N4321. ;-)
The most recent addition is what can be thought of as a de facto
characterization of current compiler behavior in conjunction with actual
Linux-kernel usage. This addition is in Section 7.9 starting on page 28.
Prior to Section 7.9, there are a number of other proposals involving
adding additional marking to the dependency chains. Section 7.10
starting on page 34 gives a draft evaluation of the various proposals,
see especially Table 1 on page 35.
Within the Linux kernel, dependency-chain heads are marked with one of
the variants of rcu_dereference(), which has acquire-like semantics,
so that rcu_dereference()'s load of the pointer is ordered against both
loads and stores that are part of the dependency chain headed by that
rcu_dereference(). For a fanciful example:
    p = rcu_dereference(gp);
    if (!p->visited) /* Don't unnecessarily dirty the cache. */
        p->visited = 1;
Here the assignment to p->visited is ordered after the load from gp.
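For readers coming from the Java side, here is a rough analog, offered as a sketch under assumptions and not as anything proposed in this thread. It uses java.lang.invoke.VarHandle as it later shipped in JDK 9, which has no consume or getDependently mode, so the head of the chain is read with getAcquire. That is stronger than dependency ordering. The class, field, and method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AcquireHeadSketch {
    public static final class Node {
        public int visited;          // plain field, written by readers
        public final int payload;
        public Node(int payload) { this.payload = payload; }
    }

    public static Node gp;           // shared "global pointer", accessed through GP
    private static final VarHandle GP;
    static {
        try {
            GP = MethodHandles.lookup()
                    .findStaticVarHandle(AcquireHeadSketch.class, "gp", Node.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Publisher side: release store of the fully built node. */
    public static void publish(Node n) {
        GP.setRelease(n);
    }

    /** Reader side: acquire load heads the chain; later accesses are ordered after it. */
    public static int readAndMark() {
        Node p = (Node) GP.getAcquire();
        if (p == null) {
            return -1;               // nothing published yet (check added for the sketch)
        }
        if (p.visited == 0) {        // don't unnecessarily dirty the cache
            p.visited = 1;
        }
        return p.payload;
    }
}
```

As in the C fragment, the store to visited cannot be ordered before the load of the pointer; here that follows from acquire semantics, not from the dependency itself.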
> * VarHandles serve two roles: (1) providing access to CAS instructions
> (plus the special RMW variants getAndAdd etc) that have no bytecodes
> (2) supporting moded load/store access that in principle is the same
> as people could arrange with the idiomatic application of Fences, but
> in practice possibly different because JVMs might have an easier time
> selecting instructions that combine fencing and access.
>
> * The Fence API is geekier than VarHandles but no less necessary as
> Unsafe goes away. One audience is a small number of expert concurrent
> programmers writing performance-sensitive code. As a general
> observation, the more that such people understand their concurrent
> algorithms, the more flexibility they need to implement them. There
> are a few constructions inside java.util.concurrent relying on them
> (for example in StampedLock) that cannot otherwise be obtained in any
> reasonable way. Which would be a problem without Fence API when
> Unsafe goes away.
>
> Another (mostly future) audience is Tool developers. Efforts to obtain
> desired consistency properties via tool-based fence placements are
> starting to approach plausibility. (Including a nice upcoming OOPSLA
> paper "Declarative fence insertion" by Bender, Lesani, and Palsberg
> that I don't see anywhere online yet.) We'd rather not see Java ruled
> out of such efforts just because of the lack of access to
> functionality that must exist in any Java implementation.
>
> (Gratuitous aside: Fences are just flexible Semicolons. Plain
> semicolons outlived their usefulness, as bemoaned in the fun read:
> "The Silently Shifting Semicolon"
> http://snapl.org/2015/abstracts/full/Marino.html
> http://www.cs.ucla.edu/~todd/research/snapl15.pdf)
I do like this view of the matter! Which probably means that everyone
else hates it. ;-)
Thanx, Paul
From paulmck at linux.vnet.ibm.com Fri Aug 14 16:03:50 2015
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Fri, 14 Aug 2015 09:03:50 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CE0CDB.1070005@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0B8D.1020509@oracle.com>
<55CB5A52.7030100@cs.oswego.edu>
<20150814153903.GJ3895@linux.vnet.ibm.com>
<55CE0CDB.1070005@oracle.com>
Message-ID: <20150814160350.GM3895@linux.vnet.ibm.com>
On Fri, Aug 14, 2015 at 06:44:27PM +0300, Aleksey Shipilev wrote:
> On 14.08.2015 18:39, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 10:38:10AM -0400, Doug Lea wrote:
> >> On 08/12/2015 05:02 AM, Aleksey Shipilev wrote:
> >>> On 08/10/2015 09:22 PM, Doug Lea wrote:
> >>
> >>>> boolean compareAndSet(Object owner, T cmp, T val);
> >>>> boolean compareAndSetAcquire(Object owner, T cmp, T val);
> >>>> boolean compareAndSetRelease(Object owner, T cmp, T val);
> >>>
> >>> Another, perhaps, "can of worms" question about CAS-es that was lurking
> >>> in my mind: which <T>-s are we nominally handling?
> >>
> >> The answers need to be a little different for Java and C++.
> >> In C++, you can only apply to "atomic" types.
> >>
> >> Under the current VarHandle scheme, you can attach to
> >> anything at compile time, but methods may bail out
> >> throwing an UnsupportedOperationException at run-time.
> >>
> >> This will always occur for example when applying getAndAdd
> >> to a VarHandle of any Object type.
> >>
> >> This was apparently the tradeoff most acceptable to all concerned.
> >> It means that in addition to memory-model-related specs,
> >> the methods will need to carry warnings that they might
> >> just throw UOE; hopefully enumerating common cases
> >> like CAS for sub-word types and unaligned memory.
> >>
> >>> Therefore, the question seems to be the hardware ability to make subword
> >>> CASes.
> >>
> >> My understanding (and preference) is that trying to CAS with
> >> VarHandles to bytes and other small types just throws
> >> UOE and should be spec'ed to do so.
> >
> > OK, I will bite... Why can't you just apply a CAS instruction to the
> > larger word containing the subword?
>
> As I tried to explain before, this seems to break the guarantees for
> strong CAS, that at least one of the CASes against the same location
> should succeed. The trick here is that on Java level, "location" is
> defined as "field" or something like that.
>
> Case in point: imagine you have two subword-sized fields in the same
> word, and you do CAS on each of them. From Java level, these CASes are
> uncontended, and both should succeed. In our hypothetical
> implementation, one of the CASes would fail. This is a good thing for
> weakCAS that is spec-ed to allow spurious failures, but not for a strong
> one.
Thank you for the tutorial, makes perfect sense.
Thanx, Paul
From dl at cs.oswego.edu Fri Aug 14 17:57:41 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 14 Aug 2015 13:57:41 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <20150814153602.GI3895@linux.vnet.ibm.com>
References: <55C8EBCD.7010006@cs.oswego.edu>
<55CA48CE.6050405@cs.oswego.edu>
<20150814153602.GI3895@linux.vnet.ibm.com>
Message-ID: <55CE2C15.4060706@cs.oswego.edu>
On 08/14/2015 11:36 AM, Paul E. McKenney wrote:
> Well, if you change your mind about being consumed by analogs of consume,
> please see the attached revision of C++ working draft N4321. ;-)
Thanks.
My proposal to just introduce loadLoadFence(ref) (which further
simplifies VarHandle.getDependently(ref)) was based in part
on Section 3 (of the last version of N4321 I'd seen), that
mentions and dismisses the idea of forcing layer-by-layer
use of something similar in extended dependency chains.
Without an OS kernel full of prior users/code to deal
with, this seems to minimally suffice. Especially since in
Java, programmers are more willing to use tools that might help
automate tiered fence placement. Yes?
-Doug
From paulmck at linux.vnet.ibm.com Fri Aug 14 18:17:20 2015
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Fri, 14 Aug 2015 11:17:20 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CE2C15.4060706@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
<55CA48CE.6050405@cs.oswego.edu>
<20150814153602.GI3895@linux.vnet.ibm.com>
<55CE2C15.4060706@cs.oswego.edu>
Message-ID: <20150814181720.GP3895@linux.vnet.ibm.com>
On Fri, Aug 14, 2015 at 01:57:41PM -0400, Doug Lea wrote:
> On 08/14/2015 11:36 AM, Paul E. McKenney wrote:
>
> >Well, if you change your mind about being consumed by analogs of consume,
> >please see the attached revision of C++ working draft N4321. ;-)
>
> Thanks.
>
> My proposal to just introduce loadLoadFence(ref) (which further
> simplifies VarHandle.getDependently(ref)) was based in part
> on Section 3 (of the last version of N4321 I'd seen), that
> mentions and dismisses the idea of forcing layer-by-layer
> use of something similar in extended dependency chains.
> Without an OS kernel full of prior users/code to deal
> with, this seems to minimally suffice. Especially since in
> Java, programmers are more willing to use tools that might help
> automate tiered fence placement. Yes?
My guess is that this is Section 3.5 ("Linux-Kernel Dependency Chain
Length"), the final paragraph of which reads as follows:
    Again, although a great many dependency chains in the Linux
    kernel are quite short, there are quite a few that spread both
    widely and deeply. We therefore cannot expect Linux kernel
    hackers to look fondly on any mechanism that requires them to
    decorate each and every operator in each and every dependency
    chain as was shown in Figure 8. In fact, even kill_dependency()
    will likely be an extremely difficult sell.
Given that you don't have existing code and assuming use of
fence-placement tools, layer-by-layer decoration might be OK.
The tools propagate the layer-by-layer decoration from the head
of the dependency chain or some such?
Thanx, Paul
From boehm at acm.org Fri Aug 14 18:36:55 2015
From: boehm at acm.org (Hans Boehm)
Date: Fri, 14 Aug 2015 11:36:55 -0700
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To: <55CDD61D.80602@cs.oswego.edu>
References:
<55CC8B45.50909@cs.oswego.edu>
<55CD2609.3070601@cs.oswego.edu>
<55CDD61D.80602@cs.oswego.edu>
Message-ID:
On Fri, Aug 14, 2015 at 4:50 AM, Doug Lea wrote:
> So there are limitations in the ability of ordering control to
> improve responsiveness. Which is unsurprising given all the
> other limitations under weak scheduling guarantees. But
> that's not much of an argument for not even allowing it.
I think it's not a limitation; it's just the wrong mechanism.
The mechanism you want doesn't care about reordering of
memory visibility to other threads. That's what fences are
about. It does care about merging of memory operations and
reordering with local compute operations. Fences are not about that.
As you point out, that's kind of the domain of C-style volatiles,
though they really only address the "combining" part.
>
[Hans:]
>> But I think this doesn't have anything to do with fences.
>
>
> Ordering constraints seem intrinsic to the problem at hand.
> It's the complement of the main issue in RCU/consume:
> "really read this" vs "really write this".
I think the RCU constraint is about ordering. It's not "really read this".
CSE on two consume loads is OK for correctness, though
probably otherwise evil. And my understanding is that
you really want to order loads with respect to dependent stores
as well.
...
>
> But even if so, it seems better to have a uniform API:
> writes: full > release > storeStore
> reads: full > acquire > loadLoad
> even if loadLoad is internally mapped to acquire.
> And omitting it feels even more wrong if we support
> RCU-like usages with scoped loadLoadFence(Object ref).
C++ has a great solution to that: include neither storeStore
nor loadLoad.
As I argued in my earlier message, storeStore
really only makes sense for ordering prior writes to fields
that are subsequently treated as read-only. That's a really
narrow application domain but probably a disproportional security
bug magnet (for those people who care about Java security).
If we were to generalize the final field ordering guarantees
to other fields, I would argue that storeStore is no longer a safe
implementation of the constructor barrier. People would
"naively" expect that if they just wrote a field in a constructor,
and the object has not yet been published, it should read
back as the value they just wrote. This becomes an unsafe
assumption if storeStore is used.
My impression is that the "st" variant of "dmb" is only a win
on a smallish number of microarchitectures. Someone from
ARM should confirm or deny.
The more I think about it, the less confident I am that the use
of "dmb ishst" is actually correct as a constructor barrier to
start with. Consider:
Thread 1:
  t1.f = 1; // Final field write in constructor
  // Implicit constructor StoreStore fence ("dmb ishst") here.
  a = t1; // Correctly publish outside constructor

Thread 2:
  t2.f2 = a; // Final field write in constructor
  // Implicit constructor StoreStore fence
  b = t2; // Correctly publish outside constructor

Thread 3:
  if (b != null && b.f2 != null) {
    t2 = b.f2.f; // Guaranteed to see 1?
  }
Is thread 3 guaranteed to see an initialized f? Based on the
ARM spec, I'm not sure either way. In any case, this seems
really hard to specify. Yet it's likely that most real users will
(or at least should) care. Unlike the OOTA issues where we
kind of all know what we mean, I don't think hand-waving works
here.
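Spelled out as compilable Java, the example might look like the sketch below. The class and method names (FinalChainSketch, T1, T2, thread1..thread3) are assumptions for illustration; the three methods are meant to run on three different threads, with a and b being plain racy fields. The sketch only restates the shape of the question. It does not settle what a given ARM implementation of the constructor fence guarantees.

```java
public class FinalChainSketch {
    public static final class T1 {
        public final int f;
        public T1() { f = 1; }               // final field write in constructor
        // implicit constructor fence here
    }

    public static final class T2 {
        public final T1 f2;
        public T2(T1 seen) { f2 = seen; }    // final field write in constructor
        // implicit constructor fence here
    }

    // Plain (racy) fields used for publication, as in the example.
    public static T1 a;
    public static T2 b;

    public static void thread1() { a = new T1(); }      // publish outside constructor

    public static void thread2() { b = new T2(a); }     // may capture null if run early

    /** Returns the observed value of f, or -1 if the chain was not visible yet. */
    public static int thread3() {
        T2 t = b;
        if (t != null && t.f2 != null) {
            return t.f2.f;                   // guaranteed to see 1?
        }
        return -1;
    }
}
```

Run sequentially on one thread this trivially yields 1; the open question is whether thread3 can ever return 0 when the three methods race on hardware that is not multi-copy atomic.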
AFAICT, storeStore is a feature we don't know how to specify,
we're not sure is really useful in correct code, but can potentially
be used to improve performance marginally on a few
microarchitectures for one ISA. And it's likely to be a bug magnet.
That just doesn't seem like a very compelling case.
Hans
> -Doug
>
From dl at cs.oswego.edu Fri Aug 14 18:43:34 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 14 Aug 2015 14:43:34 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <20150814181720.GP3895@linux.vnet.ibm.com>
References: <55C8EBCD.7010006@cs.oswego.edu>
<55CA48CE.6050405@cs.oswego.edu>
<20150814153602.GI3895@linux.vnet.ibm.com>
<55CE2C15.4060706@cs.oswego.edu>
<20150814181720.GP3895@linux.vnet.ibm.com>
Message-ID: <55CE36D6.4090701@cs.oswego.edu>
On 08/14/2015 02:17 PM, Paul E. McKenney wrote:
> On Fri, Aug 14, 2015 at 01:57:41PM -0400, Doug Lea wrote:
>> My proposal to just introduce loadLoadFence(ref) (which further
>> simplifies VarHandle.getDependently(ref)) was based in part
>> on Section 3 (of the last version of N4321 I'd seen), that
>> mentions and dismisses the idea of forcing layer-by-layer
>> use of something similar in extended dependency chains.
>> Without an OS kernel full of prior users/code to deal
>> with, this seems to minimally suffice. Especially since in
>> Java, programmers are more willing to use tools that might help
>> automate tiered fence placement. Yes?
>
> Given that you don't have existing code and assuming use of
> fence-placement tools, layer-by-layer decoration might be OK.
>
> The tools propagate the layer-by-layer decoration from the head
> of the dependency chain or some such?
>
Well, no such tools exist yet. But they seem to be possible.
And there are enough performance-obsessed low-level java
programmers that I expect interest in RCU-like techniques
to grow sometime. As mentioned in my post Wednesday, these
usages are basically the same as Java final rules except that
the fences must be explicit. So in principle a tool can insert
using a variant of final-field-check rules -- not a known
variant, but I don't see any serious obstacles.
-Doug
From paulmck at linux.vnet.ibm.com Fri Aug 14 18:54:27 2015
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Fri, 14 Aug 2015 11:54:27 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CE36D6.4090701@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
<55CA48CE.6050405@cs.oswego.edu>
<20150814153602.GI3895@linux.vnet.ibm.com>
<55CE2C15.4060706@cs.oswego.edu>
<20150814181720.GP3895@linux.vnet.ibm.com>
<55CE36D6.4090701@cs.oswego.edu>
Message-ID: <20150814185427.GQ3895@linux.vnet.ibm.com>
On Fri, Aug 14, 2015 at 02:43:34PM -0400, Doug Lea wrote:
> On 08/14/2015 02:17 PM, Paul E. McKenney wrote:
> >On Fri, Aug 14, 2015 at 01:57:41PM -0400, Doug Lea wrote:
> >>My proposal to just introduce loadLoadFence(ref) (which further
> >>simplifies VarHandle.getDependently(ref)) was based in part
> >>on Section 3 (of the last version of N4321 I'd seen), that
> >>mentions and dismisses the idea of forcing layer-by-layer
> >>use of something similar in extended dependency chains.
> >>Without an OS kernel full of prior users/code to deal
> >>with, this seems to minimally suffice. Especially since in
> >>Java, programmers are more willing to use tools that might help
> >>automate tiered fence placement. Yes?
> >
> >Given that you don't have existing code and assuming use of
> >fence-placement tools, layer-by-layer decoration might be OK.
> >
> >The tools propagate the layer-by-layer decoration from the head
> >of the dependency chain or some such?
>
> Well, no such tools exist yet. But they seem to be possible.
> And there are enough performance-obsessed low-level java
> programmers that I expect interest in RCU-like techniques
> to grow sometime. As mentioned in my post Wednesday, these
> usages are basically the same as Java final rules except that
> the fences must be explicit. So in principle a tool can insert
> using a variant of final-field-check rules -- not a known
> variant, but I don't see any serious obstacles.
And in any case, you do have the option of providing this mechanism,
and expanding upon it later when/if needed.
Thanx, Paul
From paulmck at linux.vnet.ibm.com Fri Aug 14 19:09:07 2015
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Fri, 14 Aug 2015 12:09:07 -0700
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References:
<55CC8B45.50909@cs.oswego.edu>
<55CD2609.3070601@cs.oswego.edu>
<55CDD61D.80602@cs.oswego.edu>
Message-ID: <20150814190907.GS3895@linux.vnet.ibm.com>
On Fri, Aug 14, 2015 at 11:36:55AM -0700, Hans Boehm wrote:
> On Fri, Aug 14, 2015 at 4:50 AM, Doug Lea wrote:
>
> > So there are limitations in the ability of ordering control to
> > improve responsiveness. Which is unsurprising given all the
> > other limitations under weak scheduling guarantees. But
> > that's not much of an argument for not even allowing it.
> I think it's not a limitation; it's just the wrong mechanism.
> The mechanism you want doesn't care about reordering of
> memory visibility to other threads. That's what fences are
> about. It does care about merging of memory operations and
> reordering with local compute operations. Fences are not about that.
> As you point out, that's kind of the domain of C-style volatiles,
> though they really only address the "combining" part.
>
> >
> [Hans:]
> >> But I think this doesn't have anything to do with fences.
> >
> >
> > Ordering constraints seem intrinsic to the problem at hand.
> > It's the complement of the main issue in RCU/consume:
> > "really read this" vs "really write this".
> I think the RCU constraint is about ordering. It's not "really read this".
> CSE on two consume loads is OK for correctness, though
> probably otherwise evil. And my understanding is that
> you really want to order loads with respect to dependent stores
> as well.
Indeed, if the code faces severe response-time constraints, its developers
and users won't thank you for common subexpression elimination on a
pair of consume loads (even assuming that the users are in a position to
express any opinion at all after the dust settles). However, it could
actually be a very good thing if the code had throughput and scalability
requirements, but no pressing response-time requirements.
As you no doubt guessed, given current C and C++ compilers, my
recommendation would be to mark response-time-constrained consume loads
as "volatile", unpopular though that recommendation appears to be in
many quarters. ;-)
Thanx, Paul
From dl at cs.oswego.edu Sat Aug 15 00:14:51 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 14 Aug 2015 20:14:51 -0400
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> <55CDD61D.80602@cs.oswego.edu>
Message-ID: <55CE847B.7090603@cs.oswego.edu>
On 08/14/2015 02:36 PM, Hans Boehm wrote:
> On Fri, Aug 14, 2015 at 4:50 AM, Doug Lea wrote:
> > So there are limitations in the ability of ordering control to
> > improve responsiveness. Which is unsurprising given all the
> > other limitations under weak scheduling guarantees. But
> > that's not much of an argument for not even allowing it.
> I think it's not a limitation; it's just the wrong mechanism.
> The mechanism you want doesn't care about reordering of
> memory visibility to other threads. That's what fences are
I suppose we should stop arguing about terminology/domains but ...
Ensuring that the writes are not reordered (as seen
by other threads) seems to be a prerequisite in all the
cases under consideration here. So at least some of us
conclude that solutions should build off this (fence) constraint.
> As you point out, that's kind of the domain of C-style volatiles,
> though they really only address the "combining" part.
The use of (C) volatile casts amounts to telling the compiler:
"I cannot tell you why to write/read this variable; just
do it". If that's the best available solution, in Java, we could
add analogous VarHandle methods of the form:
void setOpaquely(Object owner, T x);
T getOpaquely(Object owner);
I don't like this at all. But to defend anyway: they do form a
sort of matched set with the reachabilityFence
implementation as "useOpaquely(ref)". And hotspot has an
analogous internal (misnamed) construct MemBarCPUOrder that means:
"I cannot tell you why not to reorder these IR nodes; just don't".
This would come into play in the implementation of these methods
as the "solution" to the lack of explicit storeStore fence.
So this is entirely workable even if entirely unappealing.
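As a usage sketch: in the java.lang.invoke.VarHandle API that later shipped in JDK 9, accessors of this kind are spelled getOpaque and setOpaque. The example below assumes that shipped API, not the tentative signatures above; the class name and the done flag are hypothetical. It polls a non-volatile flag in a loop, relying on the opaque read being actually performed on each iteration and not hoisted out of the loop.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class OpaqueFlagSketch {
    private boolean done;                    // deliberately not volatile
    private static final VarHandle DONE;
    static {
        try {
            DONE = MethodHandles.lookup()
                    .findVarHandle(OpaqueFlagSketch.class, "done", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** "Really write this": opaque store of the flag. */
    public void finish() {
        DONE.setOpaque(this, true);
    }

    /** "Really read this": re-reads the flag on every iteration. */
    public long awaitDone() {
        long spins = 0;
        while (!(boolean) DONE.getOpaque(this)) {
            spins++;
            Thread.onSpinWait();
        }
        return spins;
    }

    /** Starts a writer thread, spins until its store is seen, returns the final flag. */
    public static boolean demo() {
        OpaqueFlagSketch s = new OpaqueFlagSketch();
        Thread t = new Thread(s::finish);
        t.start();
        s.awaitDone();
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return (boolean) DONE.getOpaque(s);
    }
}
```

Note that nothing here orders accesses to other variables relative to the flag; that is the job of the acquire/release modes or of fences.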
> >
> > But even if so, it seems better to have a uniform API:
> > writes: full > release > storeStore
> > reads: full > acquire > loadLoad
> > even if loadLoad is internally mapped to acquire.
> > And omitting it feels even more wrong if we support
> > RCU-like usages with scoped loadLoadFence(Object ref).
>
> C++ has a great solution to that: include neither storeStore
> nor loadLoad.
Some history: In jdk8, after a similar discussion, we added only
the three C++ fences as internal Unsafe intrinsics (which in turn
map to existing MemBar IR nodes inside hotspot). But some JVM
engineers voted with their feet about these: They introduced
internal storeStore so they could use it (and/or directly
generate) instead of release when applicable. The fact that
it is cheaper or no more expensive than alternatives on most if not
all ARM processors is surely the main reason for use. But
other use cases are not just motivated by cost.
It would be nice to avoid similar problems with jdk9 public APIs.
(This accounts for my choice of words in initial post
that it "seems wise" to include them.)
> AFAICT, storeStore is a feature we don't know how to specify,
Again, the spec is conceptually easy: Of all the ordering
effects allowed by the base memory model rules, disable
the indicated case of store-store reordering.
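To make the intended usage concrete, here is a sketch of the two-thread publication pattern that such a fence is usually meant to support. It assumes the static fence methods that later shipped on java.lang.invoke.VarHandle in JDK 9 (storeStoreFence, loadLoadFence); the class and field names are hypothetical. Whether this pattern keeps its meaning once a third thread is involved is exactly the point being disputed in this thread, so treat it as an illustration of the shape and not as a correctness claim.

```java
import java.lang.invoke.VarHandle;

public class StoreStoreFenceSketch {
    static int payload;        // plain field holding the data
    static boolean ready;      // plain field used as the publishing flag

    /** Writer: initialize, fence, then publish. */
    public static void writer(int value) {
        payload = value;
        VarHandle.storeStoreFence();   // the payload store may not be reordered after the flag store
        ready = true;
    }

    /** Reader: returns the payload if the flag was seen, else -1. */
    public static int reader() {
        if (ready) {
            VarHandle.loadLoadFence(); // the payload load may not be reordered before the flag load
            return payload;
        }
        return -1;
    }
}
```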
The hard part is specifying the conditions under which usage
does what any given programmer expects. Moreso than other fences.
But this does not seem to be a good enough reason to omit from API.
-Doug
From boehm at acm.org Sat Aug 15 01:14:32 2015
From: boehm at acm.org (Hans Boehm)
Date: Fri, 14 Aug 2015 18:14:32 -0700
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To: <55CE847B.7090603@cs.oswego.edu>
References:
<55CC8B45.50909@cs.oswego.edu>
<55CD2609.3070601@cs.oswego.edu>
<55CDD61D.80602@cs.oswego.edu>
<55CE847B.7090603@cs.oswego.edu>
Message-ID:
On Fri, Aug 14, 2015 at 5:14 PM, Doug Lea wrote:
>
> I suppose we should stop arguing about terminology/domains but ...
> Ensuring that the writes are not reordered (as seen
> by other threads) seems to be a prerequisite in all the
> cases under consideration here. So at least some of us
> conclude that solutions should build off this (fence) constraint.
You're now talking about preventing reordering of accesses to
the same variable? I do agree that such a fence could conceivably
make sense in Java, though I'm not sure whether there are
enough use cases to justify them. And I suspect you would want
a generic sameVariableFence rather than distinguishing between
loads and stores.
This doesn't make sense in C++ since, thanks largely to Paul,
memory_order_relaxed accesses are implicitly cache coherent.
I also still think this issue is quite distinct from the problem we
started with, as evidenced by the problem that merging of
variable accesses is definitely still an issue in C++, and
sameVariableFence wouldn't prevent it.
>
> On 08/14/2015 02:36 PM, Hans Boehm wrote:
>>
>> As you point out, that's kind of the domain of C-style volatiles,
>> though they really only address the "combining" part.
>
>
> The use of (C) volatile casts amounts to telling the compiler:
> "I cannot tell you why to write/read this variable; just
> do it". If that's the best available solution, in Java, we could
> add analogous VarHandle methods of the form:
> void setOpaquely(Object owner, T x);
> T getOpaquely(Object owner);
>
> I don't like this at all. But to defend anyway: they do form a
> sort of matched set with the reachabilityFence
> implementation as "useOpaquely(ref)". And hotspot has an
> analogous internal (misnamed) construct MemBarCPUOrder that means:
> "I cannot tell you why not to reorder these IR nodes; just don't".
> This would come into play in the implementation of these methods
> as the "solution" to the lack of explicit storeStore fence.
> So this is entirely workable even if entirely unappealing.
In C and C++ we could somewhat define such constructs in terms
of the behavior of asynchronous signal handlers running in the same
thread. We do that for async_signal_fence, which is essentially
intended to be a compiler fence. I don't think
there's really a Java equivalent.
I think the actual definition of "volatile" in C and C++ is entirely
unsatisfactory. The real definition has become "it does whatever
it used to do on this architecture". C has enough history in this
area that that makes sense. That's not a Java solution, though
I would prefer some kind of "compiler don't muck with this"
access to loadLoad or storeStore fences. I'd personally prefer
to spell it differently, e.g. as an @Races variable annotation.
>
>> >
>> > But even if so, it seems better to have a uniform API:
>> > writes: full > release > storeStore
>> > reads: full > acquire > loadLoad
>> > even if loadLoad is internally mapped to acquire.
>> > And omitting it feels even more wrong if we support
>> > RCU-like usages with scoped loadLoadFence(Object ref).
>>
>> C++ has a great solution to that: include neither storeStore
>> nor loadLoad.
>
>
> Some history: In jdk8, after a similar discussion, we added only
> the three C++ fences as internal Unsafe intrinsics (which in turn
> map to existing MemBar IR nodes inside hotspot). But some JVM
> engineers voted with their feet about these: They introduced
> internal storeStore so they could use it (and/or directly
> generate) instead of release when applicable. The fact that
> it is cheaper or no more expensive than alternatives on most if not
> all ARM processors is surely the main reason for use. But
> other use cases are not just motivated by cost.
Android has StoreStore internally as well, though I'm starting to
wonder whether we should get rid of it, as we did with LoadLoad.
Given the uncertainty about correct use cases, I'd certainly
prefer to keep it at most internal.
I agree that engineers voting with their feet is often a strong argument
for adding a feature. But I think those uses need to come with
convincing correctness arguments and reasonable specifications.
I think in this case we have neither. In my opinion, there have been
plenty of cases of engineers heading into dead ends as well.
>
> It would be nice to avoid similar problems with jdk9 public APIs.
> (This accounts for my choice of words in initial post
> that it "seems wise" to include them.)
>
>> AFAICT, storeStore is a feature we don't know how to specify,
>
>
> Again, the spec is conceptually easy: Of all the ordering
> effects allowed by the base memory model rules, disable
> the indicated case of store-store reordering.
Unfortunately, I think it's not that simple because of transitivity
issues. If the ordering doesn't correctly combine with the reads-from
ordering, then the expected guarantees from e.g. publication safety
go away when more than two threads are involved. And everything
is complicated since this only matters for architectures that are
not multi-copy atomic and can't be described by simple operation
reordering.
The question is whether the fact that my load saw
a store S before a StoreStore fence means anything about the visibility of S
to a third thread. I think that without some guarantee along those
lines, this is essentially useless, as in my example in the last message.
(Dependencies also figure into that example, and may be
critical, but we already know that we don't know how to handle those.)
>
> The hard part is specifying the conditions under which usage
> does what any given programmer expects. Moreso than other fences.
> But this does not seem to be a good enough reason to omit from API.
I do not know of a definition that's good enough that I would be comfortable
using them. Which I think is a good reason for omitting them.
Hans
>
> -Doug
>
From dl at cs.oswego.edu Sat Aug 15 11:47:01 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Sat, 15 Aug 2015 07:47:01 -0400
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> <55CDD61D.80602@cs.oswego.edu> <55CE847B.7090603@cs.oswego.edu>
Message-ID: <55CF26B5.3050603@cs.oswego.edu>
On 08/14/2015 09:14 PM, Hans Boehm wrote:
> thanks largely to Paul,
> memory_order_relaxed accesses are implicitly cache coherent.
Right. We cannot of course require this for plain Java variables
that otherwise correspond to C++ relaxed, so this remains
among the minor C++/Java memory-model-related usage differences.
Coherence would probably not be desirable in C++ either if some
form of opaque access methods were available, but it might be
too late to change that.
It might be worth contemplating adding C++ memory_order_opaque
rather than relying on ill-defined C-volatile casts.
Or you could just rename memory_order_consume!
The resulting usages would correspond to opaque Java VarHandle
get/set methods, matching compatibility in expressiveness.
While conceived as a desperation move in both C++ and Java,
there's a sense in which this fits naturally as a fence/mode:
They all disallow transformations on reads/writes that are
otherwise allowed by the execution model.
Finding better names and actual specs might be slightly
challenging. The sense of "opaque" here is not obvious
to potential users, but the alternatives I've thought
of (like program_order) are worse in that they could be
misleading.
-Doug
From dl at cs.oswego.edu Wed Aug 19 11:06:36 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 19 Aug 2015 07:06:36 -0400
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> <55CDD61D.80602@cs.oswego.edu>
Message-ID: <55D4633C.5070608@cs.oswego.edu>
On 08/14/2015 02:36 PM, Hans Boehm wrote:
> The more I think about it, the less confident I am that the use
> of "dmb ishst" is actually correct as a constructor barrier to
> start with.
Or, in other words: is dmb ishst guaranteed to implement
any reasonable definition of a storeStoreFence? (Which is
a separate issue from whether storeStoreFences are hard to
use.) I had not considered this possibility, but we do need
to be sure that it applies as expected in cases like:
>
> Thread 1:
> t1.f = 1; // Final field write in constructor
> // Implicit constructor StoreStore fence ("dmb ishst") here.
> a = t1; // Correctly publish outside constructor
>
> Thread 2:
> t2.f2 = a; // Final field write in constructor
> // Implicit constructor StoreStore fence
> b = t2; // Correctly publish outside constructor
>
> Thread 3:
> if (b != null && b.f2 != null) {
> t2 = b.f2.f; // Guaranteed to see 1?
> }
>
The ARM specs are not completely clear about this, so we've asked
them (Stephan Diestelhorst et al) to clarify. If it is the case
that the ARM spec is not intended to require it but all current
processors have the desired effect, the (disappointing) best course
of action seems to be to leave it as a non-public within-JDK intrinsic
but not to expose it as a Fence.
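
For readers who want to poke at the quoted litmus test, here is a
minimal sketch of it as a Java program. The class and method names
(FinalChainLitmus, T1, T2, runOnce) are made up for illustration; only
the three-thread shape comes from the example above. A clean run on one
machine says nothing about what the ARM spec requires; it only makes
the pattern concrete.

```java
// Hypothetical names throughout; only the shape of the test comes from the thread.
class FinalChainLitmus {
    static final class T1 {
        final int f;
        T1() { f = 1; }              // final field write in constructor
    }

    static final class T2 {
        final T1 f2;
        T2(T1 seen) { f2 = seen; }   // may capture null if `a` is not yet visible
    }

    // Deliberately plain (racy) publication fields.
    static T1 a;
    static T2 b;

    /** Runs one round; returns what thread 3 observed, or -1 if it saw nothing. */
    static int runOnce() {
        a = null;
        b = null;
        int[] observed = { -1 };
        Thread t1 = new Thread(() -> a = new T1());
        Thread t2 = new Thread(() -> b = new T2(a));
        Thread t3 = new Thread(() -> {
            T2 local = b;
            if (local != null && local.f2 != null) {
                observed[0] = local.f2.f;
            }
        });
        t1.start(); t2.start(); t3.start();
        try {
            t1.join(); t2.join(); t3.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        for (int i = 0; i < 2_000; i++) {
            int seen = runOnce();
            if (seen != -1 && seen != 1) {
                throw new AssertionError("saw uninitialized f: " + seen);
            }
        }
        System.out.println("no uninitialized reads observed");
    }
}
```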
-Doug
From dl at cs.oswego.edu Wed Aug 19 11:44:58 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 19 Aug 2015 07:44:58 -0400
Subject: [jmm-dev] jdk9 APIs (CAS)
In-Reply-To: <55C8EBCD.7010006@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID: <55D46C3A.1050509@cs.oswego.edu>
Thanks to Cliff Click for prodding me to realize that the signatures
for new analogs of C/C++ moded CAS should, like those versions, return
values not booleans (i.e., on failure, the value causing CAS to fail,
else the given value). And renamed to compareAndExchange to better
convey this. (Only 6 of the possible mode X weakness cases make sense.)
Any objections?
The current tentative VarHandles API looks like:
abstract class NotReallyVarHandle {
// Load
T getRelaxed(Object owner);
T getAcquire(Object owner);
T getVolatile(Object owner);
T getOpaque(Object owner);
// Store
void setRelaxed(Object owner, T val);
void setRelease(Object owner, T val);
void setVolatile(Object owner, T val);
void setOpaque(Object owner, T val);
// CAS
boolean compareAndSet(Object owner, T cmp, T val);
// Value-returning compare and exchange
T compareAndExchangeVolatile(Object owner, T cmp, T val);
T compareAndExchangeAcquire(Object owner, T cmp, T val);
T compareAndExchangeRelease(Object owner, T cmp, T val);
T weakCompareAndExchangeRelaxed(Object owner, T cmp, T val);
T weakCompareAndExchangeAcquire(Object owner, T cmp, T val);
T weakCompareAndExchangeRelease(Object owner, T cmp, T val);
// special RMW
T getAndSet(Object owner, T val);
T getAndAdd(Object owner, T delta);
T addAndGet(Object owner, T delta);
}
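
A minimal sketch of why a value-returning CAS is convenient, written
against java.lang.invoke.VarHandle as it later shipped in JDK 9, where
the volatile-mode method is named compareAndExchange rather than
compareAndExchangeVolatile as in the draft above. The MaxTracker class
and its field are hypothetical. On failure the returned witness feeds
the next attempt directly, so the loop needs no separate re-read:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class; not part of the proposed API.
class MaxTracker {
    private int max;   // accessed only through MAX below

    private static final VarHandle MAX;
    static {
        try {
            MAX = MethodHandles.lookup()
                    .findVarHandle(MaxTracker.class, "max", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Raises max to at least candidate; returns the value max held beforehand. */
    int updateMax(int candidate) {
        int current = (int) MAX.getVolatile(this);
        while (candidate > current) {
            int witness = (int) MAX.compareAndExchange(this, current, candidate);
            if (witness == current) {
                return current;     // success: the witness equals the expected value
            }
            current = witness;      // failure: retry with the value that beat us
        }
        return current;
    }

    int get() {
        return (int) MAX.getVolatile(this);
    }

    public static void main(String[] args) {
        MaxTracker t = new MaxTracker();
        t.updateMax(5);
        t.updateMax(3);
        System.out.println(t.get());
    }
}
```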
From stephan.diestelhorst at arm.com Wed Aug 19 12:01:07 2015
From: stephan.diestelhorst at arm.com (Stephan Diestelhorst)
Date: Wed, 19 Aug 2015 13:01:07 +0100
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References:
<55CDD61D.80602@cs.oswego.edu>
Message-ID: <1918832.yVbzCkb1mQ@mymac-ubuntu>
On Friday 14 August 2015 19:36:55 Hans Boehm wrote:
> As I argued in my earlier message, storeStore
> really only makes sense for ordering prior writes to fields
> that are subsequently treated as read-only. That's a really
> narrow application domain but probably a disproportional security
> bug magnet (for those people who care about Java security).
>
> If we were to generalize the final field ordering guarantees
> to other fields, I would argue that storeStore is no longer a safe
> implementation of the constructor barrier. People would
> "naively" expect that if they just wrote a field in a constructor,
> and the object has not yet been published, it should read
> back as the value they just wrote. This becomes an unsafe
> assumption if storeStore is used.
>
> My impression is that the "st" variant of "dmb" is only a win
> on a smallish number of microarchitectures. Someone from
> ARM should confirm or deny.
>
> The more I think about it, the less confident I am that the use
> of "dmb ishst" is actually correct as a constructor barrier to
> start with. Consider:
>
> Thread 1:
> t1.f = 1; // Final field write in constructor
> // Implicit constructor StoreStore fence ("dmb ishst") here.
> a = t1; // Correctly publish outside constructor
>
> Thread 2:
> t2.f2 = a; // Final field write in constructor
> // Implicit constructor StoreStore fence
> b = t2; // Correctly publish outside constructor
>
> Thread 3:
> if (b != null && b.f2 != null) {
> t2 = b.f2.f; // Guaranteed to see 1?
> }
>
> Is thread 3 guaranteed to see an initialized f? Based on the
> ARM spec, I'm not sure either way. In any case, this seems
> really hard to specify. Yet it's likely that most real users will
> (or at least should) care. Unlike the OOTA issues where we
> kind of all know what we mean, I don't think hand-waving works
> here.
We have looked at the example above and if you look at the definition of
our StoreStore barrier, you will find that it is somewhat involved (and
not just core-local!). With that definition under the belt, the example
is indeed guaranteed to see the initialised value here.
The question on the performance side with the stronger StoreStore
barrier is one that we cannot answer publicly, yet.
> AFAICT, storeStore is a feature we don't know how to specify,
> we're not sure is really useful in correct code,
We have a specification out there for a strong-ish StoreStore barrier
(at least judged by this example), which effectively not just orders
thread-local stores, but also includes order with stores on other
CPUs that are "logically" after the StoreStore barrier (through an edge
from a load that reads things from after the barrier). That way, these
logically dependent stores become globally ordered with respect to the
stores before the barrier -- reducing the impact of the absent
multi-copy atomicity.
> but can potentially be used to improve performance marginally on a few
> microarchitectures for one ISA.
These barriers may indeed be more expensive than thread-local "push
the stores in order" barriers. However, those are complicated to reason
about without any additional multi-copy atomicity.
--
Sincerely,
Stephan
Stephan Diestelhorst
ARM Research - Systems
-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590
ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
From dl at cs.oswego.edu Thu Aug 20 12:27:49 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 20 Aug 2015 08:27:49 -0400
Subject: [jmm-dev] jdk9 APIs (CAS)
In-Reply-To: <55D46C3A.1050509@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D46C3A.1050509@cs.oswego.edu>
Message-ID: <55D5C7C5.8090308@cs.oswego.edu>
On 08/19/2015 07:44 AM, Doug Lea wrote:
>
> Thanks to Cliff Click for prodding me to realize that the signatures
> for new analogs of C/C++ moded CAS should, like those versions, return
> values not booleans (i.e., on failure, the value causing CAS to fail,
> else the given value).
And thanks to Andrew Haley for noticing that this is not desirable
for the "weak" versions that can spuriously fail. The C++ versions
both report a boolean and set one of the arguments by-ref to hold the
value, which we can't do. So I reverted these three (but renamed
weakCompareAndSet to weakCompareAndSetRelaxed for clarity).
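
To illustrate the point about weak versions: after a spurious failure
the current value still equals the expected one, so a returned value
alone could not distinguish failure from success, whereas a boolean
can. The sketch below uses AtomicInteger.weakCompareAndSetPlain (JDK 9
and later) as a stand-in for the draft's weakCompareAndSetRelaxed; the
WeakCasCounter class is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class showing the usual retry loop around a weak CAS.
class WeakCasCounter {
    private final AtomicInteger value = new AtomicInteger();

    /** Increments by retrying a weak CAS; returns the new value. */
    int increment() {
        for (;;) {
            int current = value.get();
            int next = current + 1;
            // A false result may mean contention or a spurious failure;
            // either way the loop re-reads and tries again.
            if (value.weakCompareAndSetPlain(current, next)) {
                return next;
            }
        }
    }

    int get() {
        return value.get();
    }

    /** Runs the given number of threads, each incrementing perThread times. */
    static int runConcurrently(int threads, int perThread) {
        WeakCasCounter counter = new WeakCasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10_000));
    }
}
```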
-Doug
From dl at cs.oswego.edu Thu Aug 20 13:44:07 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 20 Aug 2015 09:44:07 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55C8EBCD.7010006@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID: <55D5D9A7.4050704@cs.oswego.edu>
On 08/10/2015 02:22 PM, Doug Lea wrote:
> There seems to be only a little room for discussion on the exact sets
> of methods in the VarHandle* and Fences classes:
I think these are as settled as they need to be for present purposes.
Perhaps not loved, but settled. Please let me know if you think otherwise.
Current versions are pasted below and updated at
http://gee.cs.oswego.edu/dl/wwwtmp/Fodder.java
Bear in mind that the actual VarHandles class differs due to polymorphic
method-handle mechanics. Also it is possible that the methods in "Fences"
might end up elsewhere (maybe in the VarHandles utility class) that better
fit with Un-Unsafe-ifying (safication?) plans.
The javadocs still need some work, but the functionality
seems to meet the main goals, mainly of matching C/C++11 atomics
in all the "important" cases, providing some way to obtain
equivalent effects in remaining cases, and exposing all these
in public APIs (vs Unsafe or other weird hacks). And doing so for now
without revising the base memory model (thus relying on non-formal
specs), but with the belief that they are compatible with future
JMM updates.
As always, other comments and suggestions are welcome.
...
/**
* Stand-in for spec purposes of jdk9 java.lang.invoke.VarHandle
*/
abstract class NotReallyVarHandle {
// Load
/**
* Returns the value, with memory semantics of reading a
* non-volatile variable.
*/
T getRelaxed(Object owner);
/**
* Returns the value, with memory semantics of reading a volatile
* variable.
*/
T getVolatile(Object owner);
/**
* Returns the value, and ensures that subsequent loads and stores
* are not reordered before this access.
*
* @apiNote Ignoring the many semantic differences from C and
* C++, this method has memory ordering effects compatible with
* memory_order_acquire ordering.
*/
T getAcquire(Object owner);
/**
* Returns the value, accessed in program order, but with no
* assurance of memory ordering effects with respect to other
* threads.
*/
T getOpaque(Object owner);
// Store
/**
* Sets the value, with memory semantics of setting a non-volatile
* variable.
*
* @param val the new value
*/
void setRelaxed(Object owner, T val);
/**
* Sets the value, with memory semantics of setting a volatile
* variable.
*
* @param val the new value
*/
void setVolatile(Object owner, T val);
/**
* Sets the value, and ensures that prior loads and stores are not
* reordered after this access.
*
* @apiNote Ignoring the many semantic differences from C and
* C++, this method has memory ordering effects compatible with
* memory_order_release ordering.
*
* @param val the new value
*/
void setRelease(Object owner, T val);
/**
* Sets the value, in program order, but with no assurance of
* memory ordering effects with respect to other threads.
*
* @param val the new value
*/
void setOpaque(Object owner, T val);
// CAS
/**
* Atomically sets the value to the given updated value with the
* memory semantics of setVolatile if the current value {@code ==}
* the expected value, as accessed with the memory semantics of
* getVolatile.
*
* @param expected the expected value
* @param val the new value
* @return {@code true} if successful. False return indicates that
* the actual value was not equal to the expected value.
*/
boolean compareAndSet(Object owner, T expected, T val);
// Value-returning compare and exchange
/**
* Atomically sets the value to the given updated value with the
* memory semantics of setVolatile if the current value {@code ==}
* the expected value, as accessed with the memory semantics of
* getVolatile.
*
* @param expected the expected value
* @param val the new value
* @return the current value, which will be the same as {@code val} if
* successful.
*/
T compareAndExchangeVolatile(Object owner, T expected, T val);
/**
* Atomically sets the value to the given updated value with the
* memory semantics of setRelaxed if the current value {@code ==}
* the expected value, as accessed with the memory semantics of
* getAcquire.
*
* @param expected the expected value
* @param val the new value
* @return the current value, which will be the same as {@code val} if
* successful.
*/
T compareAndExchangeAcquire(Object owner, T expected, T val);
/**
* Atomically sets the value to the given updated value with the
* memory semantics of setRelease if the current value {@code ==}
* the expected value, as accessed with the memory semantics of
* getRelaxed.
*
* @param expected the expected value
* @param val the new value
* @return the current value, which will be the same as {@code val} if
* successful.
*/
T compareAndExchangeRelease(Object owner, T expected, T val);
// Weak (spurious failures allowed)
/**
* Possibly atomically sets the value to the given updated value
* with the memory semantics of setRelaxed if the current value
* {@code ==} the expected value, as accessed with the memory
* semantics of getRelaxed. This operation may fail spuriously
* (typically, due to memory contention) even if the current value
* does match the expected value.
*
* @param expected the expected value
* @param val the new value
* @return {@code true} if successful
*/
boolean weakCompareAndSetRelaxed(Object owner, T expected, T val);
/**
* Possibly atomically sets the value to the given updated value
* with the memory semantics of setRelaxed if the current value
* {@code ==} the expected value, as accessed with the memory
* semantics of getAcquire. This operation may fail spuriously
* (typically, due to memory contention) even if the current value
* does match the expected value.
*
* @param expected the expected value
* @param val the new value
* @return {@code true} if successful
*/
boolean weakCompareAndSetAcquire(Object owner, T expected, T val);
/**
* Possibly atomically sets the value to the given updated value
* with the memory semantics of setRelease if the current value
* {@code ==} the expected value, as accessed with the memory
* semantics of getRelaxed. This operation may fail spuriously
* (typically, due to memory contention) even if the current value
* does match the expected value.
*
* @param expected the expected value
* @param val the new value
* @return {@code true} if successful
*/
boolean weakCompareAndSetRelease(Object owner, T expected, T val);
// special RMW
/**
* Atomically sets to the given value with the memory semantics of
* setVolatile and returns the old value.
*
* @param val the new value
* @return the previous value
*/
T getAndSet(Object owner, T val);
/**
* Atomically adds the given value to the current value with the
* memory semantics of setVolatile.
*
* @param delta the value to add
* @return the previous value
*/
T getAndAdd(Object owner, T delta);
/**
* Atomically adds the given value to the current value with the
* memory semantics of setVolatile.
*
* @param delta the value to add
* @return the current value
*/
T addAndGet(Object owner, T delta);
}
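
A minimal sketch of the release/acquire pairing described in the
javadocs above, written against java.lang.invoke.VarHandle as it later
shipped in JDK 9 (the stand-in class above is not compilable as is).
The Mailbox class, its fields, and the demo method are hypothetical. A
plain payload write is ordered before the setRelease of the flag, and
the reader's getAcquire keeps its payload read from moving earlier:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class; not part of the proposed API.
class Mailbox {
    private int payload;       // plain field, written before the flag
    private boolean ready;     // accessed only through READY below

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(Mailbox.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(int value) {
        payload = value;                  // prior store ...
        READY.setRelease(this, true);     // ... is not reordered after this one
    }

    int awaitPayload() {
        while (!(boolean) READY.getAcquire(this)) {
            Thread.onSpinWait();
        }
        return payload;                   // not reordered before the acquire load
    }

    /** Publishes from a second thread and returns what the caller observed. */
    static int demo(int value) {
        Mailbox box = new Mailbox();
        Thread writer = new Thread(() -> box.publish(value));
        writer.start();
        int seen = box.awaitPayload();
        try {
            writer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo(42));
    }
}
```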
/**
* A set of methods providing fine-grained control of memory ordering.
*
* The Java Language Specification permits operations to be
* executed in orders different than are apparent in program source
* code, subject to constraints mainly arising from the use of locks
* and volatile fields. The methods of this class can also be used to
* impose constraints. Their specifications are phrased in terms of
* the lack of "reorderings" -- observable ordering effects that might
* otherwise occur if the fence were not present.
*
* @apiNote More precise phrasing of these specifications may
* accompany future updates of the Java Language Specification.
*/
public class Fences {
/**
* Ensures that loads and stores before the fence will not be
* reordered with loads and stores after the fence.
*
* @apiNote Ignoring the many semantic differences from C and
* C++, this method has memory ordering effects compatible with
* atomic_thread_fence(memory_order_seq_cst)
*/
public static void fullFence() {}
/**
* Ensures that loads before the fence will not be reordered with
* loads and stores after the fence.
*
* @apiNote Ignoring the many semantic differences from C and
* C++, this method has memory ordering effects compatible with
* atomic_thread_fence(memory_order_acquire)
*/
public static void acquireFence() {}
/**
* Ensures that loads and stores before the fence will not be
* reordered with stores after the fence.
*
* @apiNote Ignoring the many semantic differences from C and
* C++, this method has memory ordering effects compatible with
* atomic_thread_fence(memory_order_release)
*/
public static void releaseFence() {}
/**
* Ensures that loads before the fence will not be reordered with
* loads after the fence.
*/
public static void loadLoadFence() {}
/**
* Ensures that stores before the fence will not be reordered with
* stores after the fence.
*/
public static void storeStoreFence() {}
}
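
A minimal sketch of the same publication pattern expressed with
explicit fences. In JDK 9 as shipped, the fence methods ended up as
static methods of java.lang.invoke.VarHandle (releaseFence,
acquireFence, and so on) rather than in a separate Fences class, and
the sketch uses those. The FencedMailbox class is hypothetical, and the
opaque flag accesses are my choice for the racy part of the handoff.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class; not part of the proposed API.
class FencedMailbox {
    private int payload;       // plain field
    private boolean ready;     // accessed only through READY below

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(FencedMailbox.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(int value) {
        payload = value;
        VarHandle.releaseFence();         // loads/stores above stay before stores below
        READY.setOpaque(this, true);
    }

    int awaitPayload() {
        while (!(boolean) READY.getOpaque(this)) {
            Thread.onSpinWait();
        }
        VarHandle.acquireFence();         // loads above stay before loads/stores below
        return payload;
    }

    /** Publishes from a second thread and returns what the caller observed. */
    static int demo(int value) {
        FencedMailbox box = new FencedMailbox();
        Thread writer = new Thread(() -> box.publish(value));
        writer.start();
        int seen = box.awaitPayload();
        try {
            writer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo(42));
    }
}
```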
class java.lang.ref.Reference {
// add:
/**
* Ensures that the object referenced by the given reference
* remains strongly reachable (as defined in the {@link
* java.lang.ref} package documentation), regardless of any prior
* actions of the program that might otherwise cause the object to
* become unreachable; thus, the referenced object is not
* reclaimable by garbage collection at least until after the
* invocation of this method. Invocation of this method does not
* itself initiate garbage collection or finalization.
*
* @param ref the reference. If null, this method has no effect.
*/
public static void reachabilityFence(Object ref) {}
}
From adinn at redhat.com Thu Aug 20 13:59:39 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 20 Aug 2015 14:59:39 +0100
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55D5D9A7.4050704@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu>
Message-ID: <55D5DD4B.6040108@redhat.com>
On 20/08/15 14:44, Doug Lea wrote:
. . .
> /**
> * Atomically sets the value to the given updated value with the
> * memory semantics of setRelease if the current value {@code ==}
> * the expected value, as accessed with the memory semantics of
> * getRelaxed.
^^^^^^^^^^
> * @param expected the expected value
> * @param val the new value
> * @return the current value, which will be the same as {@code val} if
> * successful.
> */
> T compareAndExchangeRelease(Object owner, T expected, T val);
Should that be "with the memory semantics of getRelease"?
regards,
Andrew Dinn
-----------
From dl at cs.oswego.edu Thu Aug 20 14:16:02 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 20 Aug 2015 10:16:02 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55D5DD4B.6040108@redhat.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu>
<55D5DD4B.6040108@redhat.com>
Message-ID: <55D5E122.8040706@cs.oswego.edu>
On 08/20/2015 09:59 AM, Andrew Dinn wrote:
> On 20/08/15 14:44, Doug Lea wrote:
> . . .
>> /**
>> * Atomically sets the value to the given updated value with the
>> * memory semantics of setRelease if the current value {@code ==}
>> * the expected value, as accessed with the memory semantics of
>> * getRelaxed.
> ^^^^^^^^^^
>> * @param expected the expected value
>> * @param val the new value
>> * @return the current value, which will be the same as {@code val} if
>> * successful.
>> */
>> T compareAndExchangeRelease(Object owner, T expected, T val);
>
> Should that be "with the memory semantics of getRelease"?
>
Yes, thanks!
-Doug
From aleksey.shipilev at oracle.com Fri Aug 21 13:54:11 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Fri, 21 Aug 2015 16:54:11 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55D5D9A7.4050704@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu>
Message-ID: <55D72D83.5020407@oracle.com>
On 08/20/2015 04:44 PM, Doug Lea wrote:
> On 08/10/2015 02:22 PM, Doug Lea wrote:
> T getOpaque(Object owner);
> void setOpaque(Object owner, T val);
My immediate reaction as the guy who wasn't participating in the
original discussion: "Er, I don't like these at all". So, these seem
similar to "relaxed", but they are only "hardware-relaxed", not
"compiler-relaxed".
Doesn't this depart from having a memory model covering both software
and hardware behaviors? Doesn't that hurt cross-platform compatibility?
Inviting users to use the public APIs that perform functionally
differently on different hardware is silently leading them into the
platform lock-in. An explanation "do whatever hardware does", while
looking more appealing, means the same.
It also seems that (get|set)Opaque behavioral difference can only be
observed under race, but AFAIU that can be differentiated against the
unlucky timing only when some other causal non-synchronization
relationship holds (e.g. during object publication).
So, if that is only to help the publication/consume cases, should we
just rely on already implemented guarantees for final fields (and
possibly revisit the idea of making all initialization stores "final",
and then emit the barrier implicitly -- thus covering the corner case of
volatile fields initialized in objects), rendering the producer-consumer
problems moot for a while?
> T compareAndExchangeVolatile(Object owner, T expected, T val);
> T compareAndExchangeAcquire(Object owner, T expected, T val);
> T compareAndExchangeRelease(Object owner, T expected, T val);
Um, I have a stupid question. What's the story with non-x86 processors
here? Do we have any machines that do not have an explicit CAS, but only
have LL/SC? Returning the failure witness does not seem doable with
LL/SC-emulated CAS.
Thanks,
-Aleksey
From dl at cs.oswego.edu Fri Aug 21 17:54:43 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Fri, 21 Aug 2015 13:54:43 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55D72D83.5020407@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu>
<55D72D83.5020407@oracle.com>
Message-ID: <55D765E3.4070704@cs.oswego.edu>
On 08/21/2015 09:54 AM, Aleksey Shipilev wrote:
> On 08/20/2015 04:44 PM, Doug Lea wrote:
>> On 08/10/2015 02:22 PM, Doug Lea wrote:
>> T getOpaque(Object owner);
>> void setOpaque(Object owner, T val);
>
> My immediate reaction as the guy who wasn't participating in the
> original discussion: "Er, I don't like these at all".
With luck, this will be the universal reaction.
First you hate it, then you discover uncommon cases
in which you need it, then you begrudgingly accept it.
Maybe we can make a video.
For most people, the only thing to know about opaque is that
it fulfills the same role as the C cast-to-volatile escape
hatch, which is also justifiably unloved, but still used
for example in the Linux kernel.
> So, these seem
> similar to "relaxed", but they are only "hardware-relaxed", not
> "compiler-relaxed".
They basically say: even if other JMM rules would allow the read or
write to be optimized away, don't. They feel like compiler-only
directives only because hardware never elides unbounded numbers
of reads/writes.
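
A minimal sketch of the "don't optimize it away" reading, using
getOpaque/setOpaque from java.lang.invoke.VarHandle as later shipped
in JDK 9. The OpaqueStopFlag class is hypothetical. With a plain field
read, a compiler could hoist the load out of the loop and spin
forever; the opaque read has to be performed on each iteration, but it
orders nothing else.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class; not part of the proposed API.
class OpaqueStopFlag {
    private boolean stop;      // accessed only through STOP below

    private static final VarHandle STOP;
    static {
        try {
            STOP = MethodHandles.lookup()
                    .findVarHandle(OpaqueStopFlag.class, "stop", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void requestStop() {
        STOP.setOpaque(this, true);       // the write may not be elided
    }

    /** Spins until the flag is seen; returns the number of iterations. */
    long spinUntilStopped() {
        long iterations = 0;
        while (!(boolean) STOP.getOpaque(this)) {   // re-read on every pass
            iterations++;
            Thread.onSpinWait();
        }
        return iterations;
    }

    /** Returns true if a spinning thread noticed the stop request in time. */
    static boolean demo() {
        OpaqueStopFlag flag = new OpaqueStopFlag();
        Thread spinner = new Thread(flag::spinUntilStopped);
        spinner.setDaemon(true);          // do not keep the JVM alive on failure
        spinner.start();
        flag.requestStop();
        try {
            spinner.join(10_000);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return !spinner.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```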
>
> It also seems that (get|set)Opaque behavioral difference can only be
> observed under race,
Yes but use of VarHandle methods implicitly indicates "might be racy".
(So we don't need a separate annotation for this).
>
> So, if that is only to help the publication/consume cases, should we
> just rely on already implemented guarantees...
There is no sign that the series of special-cases will ever end.
We should try to understand and support explicit transformation
constraints when possible. But so far the record is not all
that great. So the only good pragmatic response I know is to
include opaque.
>> T compareAndExchangeVolatile(Object owner, T expected, T val);
>> T compareAndExchangeAcquire(Object owner, T expected, T val);
>> T compareAndExchangeRelease(Object owner, T expected, T val);
>
> Um, I have a stupid question. What's the story with non-x86 processors
> here? Do we have any machines that do not have an explicit CAS, but only
> have LL/SC? Returning the failure witness does not seem doable with
> LL/SC-emulated CAS.
With LL/SC, you loop until either compare-failure with the LL'ed value
or success of the SC. So either way there is a unique answer.
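
The loop described above can be sketched in Java by analogy, with a
volatile read standing in for LL and a weak CAS standing in for SC.
This is only an analogy, not how a JVM emits the instruction sequence;
StrongFromWeak is a hypothetical class, and
AtomicInteger.weakCompareAndSetVolatile is the JDK 9 method used as
the SC stand-in.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; an analogy for the LL/SC loop, not JVM internals.
class StrongFromWeak {
    /** Value-returning strong compare-and-exchange built on a weak CAS. */
    static int compareAndExchange(AtomicInteger cell, int expected, int val) {
        for (;;) {
            int witness = cell.get();                 // plays the role of LL
            if (witness != expected) {
                return witness;                       // compare failure: definite answer
            }
            if (cell.weakCompareAndSetVolatile(expected, val)) {   // plays the role of SC
                return expected;                      // success
            }
            // Spurious or contended failure: go around again.
        }
    }

    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger(5);
        System.out.println(compareAndExchange(cell, 3, 7));
        System.out.println(compareAndExchange(cell, 5, 7));
        System.out.println(cell.get());
    }
}
```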
-Doug
From aph at redhat.com Sat Aug 22 08:43:32 2015
From: aph at redhat.com (Andrew Haley)
Date: Sat, 22 Aug 2015 09:43:32 +0100
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55D72D83.5020407@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu>
<55D72D83.5020407@oracle.com>
Message-ID: <55D83634.6070303@redhat.com>
On 08/21/2015 02:54 PM, Aleksey Shipilev wrote:
> Um, I have a stupid question. What's the story with non-x86 processors
> here? Do we have any machines that do not have an explicit CAS, but only
> have LL/SC?
Yes, we do.
> Returning the failure witness does not seem doable with
> LL/SC-emulated CAS.
In fact it's the other way around: it's difficult (or, for all
practical purposes, impossible) to emulate LL/SC with CAS, but easy to
emulate CAS with LL/SC.
Andrew.
From aleksey.shipilev at oracle.com Mon Aug 24 08:42:00 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Mon, 24 Aug 2015 11:42:00 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55D765E3.4070704@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu>
<55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu>
Message-ID: <55DAD8D8.8070508@oracle.com>
On 08/21/2015 08:54 PM, Doug Lea wrote:
> On 08/21/2015 09:54 AM, Aleksey Shipilev wrote:
>>> T compareAndExchangeVolatile(Object owner, T expected, T val);
>>> T compareAndExchangeAcquire(Object owner, T expected, T val);
>>> T compareAndExchangeRelease(Object owner, T expected, T val);
>>
>> Um, I have a stupid question. What's the story with non-x86 processors
>> here? Do we have any machines that do not have an explicit CAS, but only
>> have LL/SC? Returning the failure witness does not seem doable with
>> LL/SC-emulated CAS.
>
> With LL/SC, you loop until either compare-failure with the LL'ed value
> or success of the SC. So either way there is a unique answer.
Oh, right. I was under (false) impression that you do LL/SC without
looping, but I guess that is not really viable for strong CAS emulation
since SC may "spuriously" fail.
Thanks,
-Aleksey
From boehm at acm.org Tue Aug 25 00:01:25 2015
From: boehm at acm.org (Hans Boehm)
Date: Mon, 24 Aug 2015 17:01:25 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55DAD8D8.8070508@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu>
<55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu>
<55DAD8D8.8070508@oracle.com>
Message-ID:
Some more comments:
I think the getAnd() operations also need to specify that the load part is
volatile, so that an earlier volatile store synchronizes with the call.
The "reordering"-based specifications are extremely problematic. They give
no hint that, for example, IRIW can produce non-SC results with LoadLoad or
Acquire fences. But I think that has to be allowed for anything other than
fullFence() to be useful.
Note that the use of "Relaxed" here is seriously weaker than in C & C++.
It doesn't promise cache-coherence. That's a sufficiently serious
difference that I'm not sure I would use the same word. For example, a
field that is only ever atomically incremented can appear to decrease,
where it can't in C or C++. Of course that term already has a long history
of not-very-consistent use in the hardware community, so maybe it's OK.
Is it intended to be possible to apply these methods to built-in types like
long? If so, does Relaxed provide word-tearing guarantees?
As you've probably gathered, I'm not volunteering to define semantics of
loadLoadFence() or storeStoreFence().
reachabilityFence:
"regardless of any prior actions of the program that might otherwise cause
the object to become unreachable" doesn't sound right to me. It really
revokes the compiler's license to "misoptimize" the program normally given
by 12.6.1:
"Optimizing transformations of a program can be designed that reduce the
number of objects that are reachable to be less than those which would
naively be considered reachable."
The real guarantee seems to be that reachabilityFence(ref) happens before
ref is finalized or added to a java.lang.ref queue. Can we include that in
its definition?
This would normally mean that (1) either reachabilityFence(ref) or the next
safe point (or something in between) acts as a release operation for
compiler purposes, and (2) the next safe point synchronizes with any GC
that might reclaim ref. I can't think of a way to write a correct GC that
doesn't make (2) or some analogous condition true automatically. But I
think we need to be clear that (1) is expected.
Presumably it's too late to add an annotation? The simplest would probably
be an @FinalizationSensitive class annotation that causes the compiler to
insert a reachabilityFence at the end of the scope of every reference with
that static type or a subclass type. (Or after the current statement for
temporaries.)
ReachabilityFence() would help us for library code, but I doubt it's going
to get much use in user-written code. Especially the temporary case is
likely to be missed. You need to transform
nativeCall(foo.getFinalizableObj().nativePtr) to
T tmp = foo.getFinalizableObj();
nativeCall(tmp.nativePtr);
reachabilityFence(tmp);
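
A runnable sketch of that transformation, using
java.lang.ref.Reference.reachabilityFence as it later shipped in JDK
9. Everything else here (ReachabilityDemo, NativeBacked, Holder, and
the nativeCall stand-in, which is ordinary Java rather than a real
native method) is hypothetical. Putting the fence in a finally block
is one way to cover exceptional exits as well:

```java
import java.lang.ref.Reference;

// Hypothetical example classes; nativeCall is a plain-Java stand-in for JNI.
class ReachabilityDemo {
    static final class NativeBacked {
        final long nativePtr;             // stand-in for a native handle
        NativeBacked(long ptr) { nativePtr = ptr; }
    }

    static final class Holder {
        NativeBacked getFinalizableObj() {
            return new NativeBacked(0x1234L);
        }
    }

    static long nativeCall(long ptr) {
        return ptr + 1;
    }

    static long callSafely(Holder foo) {
        NativeBacked tmp = foo.getFinalizableObj();   // name the temporary
        try {
            return nativeCall(tmp.nativePtr);
        } finally {
            Reference.reachabilityFence(tmp);         // tmp stays reachable until here
        }
    }

    public static void main(String[] args) {
        System.out.println(callSafely(new Holder()));
    }
}
```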
Hans
From aph at redhat.com Tue Aug 25 08:45:05 2015
From: aph at redhat.com (Andrew Haley)
Date: Tue, 25 Aug 2015 09:45:05 +0100
Subject: [jmm-dev] jdk9 APIs
In-Reply-To:
References: <55C8EBCD.7010006@cs.oswego.edu>
<55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com>
<55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com>
Message-ID: <55DC2B11.7020509@redhat.com>
On 08/25/2015 01:01 AM, Hans Boehm wrote:
> Note that the use of "Relaxed" here is seriously weaker than in C &
> C++. It doesn't promise cache-coherence. That's a sufficiently
> serious difference that I'm not sure I would use the same word. For
> example, a field that is only ever atomically incremented can appear
> to decrease, where it can't in C or C++. Of course that term
> already has a long history of not-very-consistent use in the
> hardware community, so maybe it's OK.
I don't think it's OK. This is so madly counter-intuitive that we
should forbid non-coherent updates. And I can think of no reason we
should be weaker than C and C++; the last time I heard we were trying
to be closer. Are there any non-coherent machines we care about?
> Presumably it's too late to add an annotation? The simplest would
> probably be an @FinalizationSensitive class annotation that causes
> the compiler to insert a reachabilityFence at the end of the scope
> of every reference with that static type or a subclass type. (Or
> after the current statement for temporaries.)
I think that's a great idea, although I recall we discussed it before
and rejected it. I don't know why.
I'm wary of "annotations" with such significant semantics, though. As
a language feature it's a bit, er, yucky; surely it should be a part
of the declaration. But an annotation is reliable and robust, and
much more so than a method you'd have to remember to invoke
everywhere; such a thing would be a maintenance problem.
Having said that, there is no reason we should not add
reachabilityFence today and plan for the annotation to be a later
addition.
Andrew.
From stephan.diestelhorst at arm.com Tue Aug 25 09:53:56 2015
From: stephan.diestelhorst at arm.com (Stephan Diestelhorst)
Date: Tue, 25 Aug 2015 10:53:56 +0100
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55DAD8D8.8070508@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D765E3.4070704@cs.oswego.edu>
<55DAD8D8.8070508@oracle.com>
Message-ID: <1939744.JyTAqgXLQL@mymac-ubuntu>
On Monday 24 August 2015 09:42:00 Aleksey Shipilev wrote:
> On 08/21/2015 08:54 PM, Doug Lea wrote:
>
> > On 08/21/2015 09:54 AM, Aleksey Shipilev wrote:
> >
> >>> T compareAndExchangeVolatile(Object owner, T expected, T val);
> >>> T compareAndExchangeAcquire(Object owner, T expected, T val);
> >>> T compareAndExchangeRelease(Object owner, T expected, T val);
> >>
> >>
> >>
> >> Um, I have a stupid question. What's the story with non-x86 processors
> >> here? Do we have any machines that do not have an explicit CAS, but only
> >> have LL/SC? Returning the failure witness does not seem doable with
> >> LL/SC-emulated CAS.
> >
> >
> > With LL/SC, you loop until either compare-failure with the LL'ed value
> > or success of the SC. So either way there is a unique answer.
>
>
> Oh, right. I was under (false) impression that you do LL/SC without
> looping, but I guess that is not really viable for strong CAS emulation
> since SC may "spuriously" fail.
Note that, despite the loop, we guarantee lock-freedom:
"In the event of repeatedly-contending Load-Exclusive/Store-Exclusive
instruction sequences from multiple PEs, an implementation must ensure
that forward progress is made by at least one PE." (ARM Architecture
Reference Manual, ARMv8)
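
To make the "failure witness" point concrete, here is a minimal sketch of a retry loop that reuses the value returned by a failed compare-and-exchange. It assumes the method name the JDK 9 VarHandle API eventually shipped with (compareAndExchange, rather than the compareAndExchangeVolatile spelling quoted above); the WitnessCounter class and its methods are hypothetical names made up for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class for illustration; not taken from the thread.
class WitnessCounter {
    private int value; // accessed only through the VALUE handle below

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(WitnessCounter.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Adds delta and returns the new value, retrying on contention.
    int addAndGet(int delta) {
        int expected = (int) VALUE.getVolatile(this);
        while (true) {
            // compareAndExchange returns the value it found in the field
            // (the witness), whether or not the exchange succeeded.
            int witness = (int) VALUE.compareAndExchange(this, expected, expected + delta);
            if (witness == expected) {
                return expected + delta; // exchange succeeded
            }
            expected = witness; // exchange failed: retry from the witness, no extra load
        }
    }

    int get() {
        return (int) VALUE.getVolatile(this);
    }
}
```

On an LL/SC machine the JIT would presumably expand each compareAndExchange into the inner loop Doug describes; the outer loop here is ordinary application-level retry.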
--
Sincerely,
Stephan
Stephan Diestelhorst
Staff Engineer,
ARM Research - Systems
+44 (0)1223 405662
From dl at cs.oswego.edu Tue Aug 25 13:22:51 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Tue, 25 Aug 2015 09:22:51 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To:
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com>
Message-ID: <55DC6C2B.3030807@cs.oswego.edu>
On 08/24/2015 08:01 PM, Hans Boehm wrote:
> Some more comments:
Thanks!
>
> I think the getAnd() operations also need to specify that the load part is
> volatile, so that an earlier volatile store synchronizes with the call.
Thanks; fixed.
>
> The "reordering"-based specifications are extremely problematic. They give no
> hint that, for example, IRIW can produce non-SC results with LoadLoad or Acquire
> fences. But I think that has to be allowed for anything other than fullFence()
> to be useful.
Yes. It is hard to be clear without accidentally revising the JMM.
We once (in previous efforts) came up with some wording to help
explain. I'll look for it and try to adapt.
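
For readers who have not met IRIW (independent reads of independent writes), a rough litmus-test sketch follows. Two writers store to x and y; two readers load them in opposite orders. Under volatile (sequentially consistent) accesses the readers can never disagree about which write came first. With only release stores and acquire loads that disagreement is allowed, although whether it is ever observed depends on the hardware. The IriwLitmus class and its method names are hypothetical, and the access-mode method names are those of the VarHandle API as shipped in JDK 9.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical litmus-test harness; all names are illustrative only.
class IriwLitmus {
    int x, y; // both start at 0

    static final VarHandle X, Y;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            X = l.findVarHandle(IriwLitmus.class, "x", int.class);
            Y = l.findVarHandle(IriwLitmus.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private static void store(VarHandle h, IriwLitmus s, boolean sc) {
        if (sc) h.setVolatile(s, 1); else h.setRelease(s, 1);
    }

    private static int load(VarHandle h, IriwLitmus s, boolean sc) {
        return sc ? (int) h.getVolatile(s) : (int) h.getAcquire(s);
    }

    // Runs one round. Returns true if the two readers saw the two
    // independent writes in opposite orders (the non-SC outcome).
    static boolean readersDisagree(boolean sc) {
        IriwLitmus s = new IriwLitmus();
        int[] r = new int[4];
        Thread[] ts = {
            new Thread(() -> { store(X, s, sc); }),
            new Thread(() -> { store(Y, s, sc); }),
            new Thread(() -> { r[0] = load(X, s, sc); r[1] = load(Y, s, sc); }),
            new Thread(() -> { r[2] = load(Y, s, sc); r[3] = load(X, s, sc); })
        };
        for (Thread t : ts) t.start();
        try {
            for (Thread t : ts) t.join(); // join makes the r[] writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Reader 1 saw x before y; reader 2 saw y before x.
        return r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0;
    }

    static int countDisagreements(int rounds, boolean sc) {
        int n = 0;
        for (int i = 0; i < rounds; i++) {
            if (readersDisagree(sc)) n++;
        }
        return n;
    }
}
```

Calling countDisagreements(rounds, true) must return 0 on any conforming JVM. Calling it with false is permitted to return a nonzero count, which is exactly the behavior a "no reordering" style of specification fails to hint at.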
>
> Note that the use of "Relaxed" here is seriously weaker than in C & C++. It
> doesn't promise cache-coherence. That's a sufficiently serious difference that
> I'm not sure I would use the same word.
And plain (non-volatile, non-final) Java vars are also not exactly the same
as in C/C++. I and others have gone back and forth on using "getPlain" (etc)
vs "getRelaxed" here. Using getPlain sounds odder but might be a better choice.
I suppose we could even have both getPlain and getRelaxed, but it seems
better just to add an explanation somewhere about how to obtain coherence
if you need it.
>
> Is it intended to be possible to apply these methods to built-in types like
> long? If so, does Relaxed provide word-tearing guarantees?
Yes. This should be indicated in class-level docs that don't
currently exist because NotReallyVarHandle doesn't really exist :-)
We had discussed and seemed headed to removing this disclaimer
for all Java primitive vars, but for now we can at least do so for
those accessed with VarHandles.
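
As a sketch of what an untorn 64-bit access through a handle can look like: in the VarHandle API as it shipped in JDK 9, the opaque access mode is documented as bitwise atomic and coherent per variable, while plain get/set on long and double keep the 32-bit-platform caveat. The Timestamp class below is a hypothetical example, not part of any proposal in this thread.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class for illustration.
class Timestamp {
    private long millis; // deliberately not declared volatile

    private static final VarHandle MILLIS;
    static {
        try {
            MILLIS = MethodHandles.lookup()
                    .findVarHandle(Timestamp.class, "millis", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Opaque write: all 64 bits are written as one unit.
    void publish(long t) {
        MILLIS.setOpaque(this, t);
    }

    // Opaque read: never observes a mix of two different writes, and
    // successive reads of this one variable do not go backwards in
    // that variable's write order.
    long read() {
        return (long) MILLIS.getOpaque(this);
    }
}
```

Opaque mode gives no ordering with respect to other variables, so this is only the per-variable guarantee, not a publication idiom.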
> reachabilityFence:
>
> "regardless of any prior actions of the program that might otherwise cause the
> object to become unreachable" doesn't sound right to me. It really revokes the
> compiler's license to "misoptimize" the program normally given by 12.6.1:
>
> "Optimizing transformations of a program can be designed that reduce the number
> of objects that are reachable to be less than those which would naively be
> considered reachable."
>
> The real guarantee seems to be that reachabilityFence(ref) happens before ref is
> finalized or added to a java.lang.ref queue. Can we include that in its definition?
>
Will do; thanks,
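
A minimal sketch of the usage pattern under discussion, assuming the method lands as the static Reference.reachabilityFence(Object) (which is where it ended up in JDK 9). The NativeSlot class, its static table, and release() are hypothetical stand-ins for an external resource and its cleanup.

```java
import java.lang.ref.Reference;

// Hypothetical wrapper around an external resource identified by an index.
class NativeSlot {
    // Stand-in for an external table; a real one would live outside the heap.
    private static final boolean[] LIVE = new boolean[16];
    private final int index;

    NativeSlot(int index) {
        this.index = index;
        LIVE[index] = true;
    }

    // Stand-in for the cleanup a finalizer or reference-queue handler would run.
    void release() {
        LIVE[index] = false;
    }

    boolean isLiveNow() {
        try {
            int i = index;  // last use of a field of 'this' in this method
            return LIVE[i]; // without the fence, 'this' could be treated as
                            // unreachable (and cleaned up) before this read
        } finally {
            // Keeps 'this' strongly reachable at least until this point.
            Reference.reachabilityFence(this);
        }
    }
}
```

The try/finally shape is what makes the "method you'd have to remember to invoke everywhere" objection concrete: every method that touches the external resource needs it.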
>
> Presumably it's too late to add an annotation?
Not too late, but a separable effort that I'd like to encourage people
to undertake. We laid the groundwork on this list last August.
Carrying it out seems to need involvement of compiler and IDE developers.
-Doug
From boehm at acm.org Tue Aug 25 17:09:43 2015
From: boehm at acm.org (Hans Boehm)
Date: Tue, 25 Aug 2015 10:09:43 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55DC6C2B.3030807@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu>
<55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu>
<55DAD8D8.8070508@oracle.com>
<55DC6C2B.3030807@cs.oswego.edu>
Message-ID:
Thanks!
A couple of further responses:
On Tue, Aug 25, 2015 at 6:22 AM, Doug Lea wrote:
> On 08/24/2015 08:01 PM, Hans Boehm wrote:
>
>> ...
>
>> Note that the use of "Relaxed" here is seriously weaker than in C & C++. It
>> doesn't promise cache-coherence. That's a sufficiently serious difference that
>> I'm not sure I would use the same word.
>>
>
> And plain (non-volatile, non-final) Java vars are also not exactly the same
> as in C/C++. I and others have gone back and forth on using "getPlain" (etc)
> vs "getRelaxed" here. Using getPlain sounds odder but might be a better choice.
>
> I suppose we could even have both getPlain and getRelaxed, but it seems
> better just to add an explanation somewhere about how to obtain coherence
> if you need it.
I think that's hard to do portably, right? Acquire/release should work since
they are defined analogously to C++, which requires coherence. But they're
clearly substantial overkill, e.g. on Power or ARMv8 (or v7 without the erratum).
>
>> Is it intended to be possible to apply these methods to built-in types like
>> long? If so, does Relaxed provide word-tearing guarantees?
>>
>
> Yes. This should be indicated in class-level docs that don't
> currently exist because NotReallyVarHandle doesn't really exist :-)
> We had discussed and seemed headed to removing this disclaimer
> for all Java primitive vars, but for now we can at least do so for
> those accessed with VarHandles.
I now think that earlier discussion was premature, but it makes sense
to remove the disclaimer here. Removing the disclaimer in general is expensive
and probably not feasible on 32-bit MIPS and some older ARM processors.
Hans
From dl at cs.oswego.edu Wed Aug 26 12:01:34 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 26 Aug 2015 08:01:34 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To:
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com> <55DC6C2B.3030807@cs.oswego.edu>
Message-ID: <55DDAA9E.3080407@cs.oswego.edu>
On 08/25/2015 01:09 PM, Hans Boehm wrote:
> On Tue, Aug 25, 2015 at 6:22 AM, Doug Lea wrote:
> And plain (non-volatile, non-final) Java vars are also not exactly the same
> as in C/C++. I and others have gone back and forth on using "getPlain" (etc)
> vs "getRelaxed" here. Using getPlain sounds odder but might be a better choice.
>
I'm still torn between odd-soundingness vs reducing chances of confusion.
Does anyone else have an opinion?
> I suppose we could even have both getPlain and getRelaxed, but it seems
> better just to add an explanation somewhere about how to obtain coherence
> if you need it.
>
> I think that's hard to do portably, right? Acquire/release should work since they
> are defined analogously to C++, which requires coherence. But they're clearly
> substantial overkill, e.g. on Power or ARMv8 (or v7 without the erratum).
These seem to be variants of the issues in our write-combining discussion
last week. And maybe not even variants: Of the cases I can recall
being discussed wrt per-variable coherence, I'm not sure which of
the choices of setOpaque, setRelease, releaseFence, or storeStoreFence
(with symmetric choices on the load side) people had in mind.
> Yes. This should be indicated in class-level docs that don't
> currently exist because NotReallyVarHandle doesn't really exist :-)
> We had discussed and seemed headed to removing this disclaimer
> for all Java primitive vars, but for now we can at least do so for
> those accessed with VarHandles.
>
> I now think that earlier discussion was premature, but it makes sense
> to remove the disclaimer here. Removing the disclaimer in general is expensive
> and probably not feasible on 32-bit MIPS and some older ARM processors.
>
OK. In fact, this (banning word-tearing only via VH) might not be a bad move
even when we do revise JMM/JLS.
-Doug
From dl at cs.oswego.edu Thu Aug 27 11:30:52 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 27 Aug 2015 07:30:52 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55DDAA9E.3080407@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com> <55DC6C2B.3030807@cs.oswego.edu>
<55DDAA9E.3080407@cs.oswego.edu>
Message-ID: <55DEF4EC.7090608@cs.oswego.edu>
On 08/26/2015 08:01 AM, Doug Lea wrote:
> On 08/25/2015 01:09 PM, Hans Boehm wrote:
>> On Tue, Aug 25, 2015 at 6:22 AM, Doug Lea wrote:
>> And plain (non-volatile, non-final) Java vars are also not exactly the same
>> as in C/C++. I and others have gone back and forth on using "getPlain" (etc)
>> vs "getRelaxed" here. Using getPlain sounds odder but might be a better
>> choice.
>
> I'm still torn between odd-soundingness vs reducing chances of confusion.
> Does anyone else have an opinion?
Or, if you can't say anything nice, don't say anything at all.
Just calling them "get" and "set" works out OK in terms of
specs (a few other method names/specs need to be reworked a bit), and
doesn't seem to be overly abuse-prone. So it is probably the best approach.
See updated
http://gee.cs.oswego.edu/dl/wwwtmp/Fodder.java
This would not accommodate a suggestion on valhalla list to automatically
equate get/set to relaxed vs volatile versions depending on the declaration
of the field. Which is arguable anyway because it would require tracking down
the declaration site to determine memory semantics.
We'd definitely want to put out a (separate) document showing, for
each C/C++11 construct, how to obtain compatible effects in Java,
and vice versa. The common cases are obvious, but the fact that
Java plain variables are similar to but not exactly the same as
either C++ plain or C++ relaxed leads to a few minor usage differences.
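
As a taste of what one entry in such a document might look like, here is a sketch of the common release/acquire message-passing case with the approximate C++11 counterparts in comments. The correspondences are the obvious ones and are stated loosely; the MessageBox class is hypothetical, and the method names are those of the VarHandle API as shipped in JDK 9.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class for illustration.
class MessageBox {
    private int data; // payload
    private int flag; // 0 = empty, 1 = data is ready

    private static final VarHandle DATA, FLAG;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            DATA = l.findVarHandle(MessageBox.class, "data", int.class);
            FLAG = l.findVarHandle(MessageBox.class, "flag", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void produce(int v) {
        DATA.set(this, v);        // plain write, like a non-atomic C++ store
        FLAG.setRelease(this, 1); // roughly flag.store(1, memory_order_release)
    }

    // Returns the payload if the flag was seen set, otherwise -1.
    int tryConsume() {
        if ((int) FLAG.getAcquire(this) == 1) { // roughly flag.load(memory_order_acquire)
            return (int) DATA.get(this);        // plain read, ordered after the acquire
        }
        return -1;
    }
}
```

The seq_cst counterparts would be getVolatile/setVolatile; the cases needing care are the ones involving plain and relaxed accesses, where the two languages differ.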
-Doug
From boehm at acm.org Thu Aug 27 17:06:40 2015
From: boehm at acm.org (Hans Boehm)
Date: Thu, 27 Aug 2015 10:06:40 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55DEF4EC.7090608@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu>
<55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu>
<55DAD8D8.8070508@oracle.com>
<55DC6C2B.3030807@cs.oswego.edu>
<55DDAA9E.3080407@cs.oswego.edu> <55DEF4EC.7090608@cs.oswego.edu>
Message-ID:
On Thu, Aug 27, 2015 at 4:30 AM, Doug Lea wrote:
>
>
> This would not accommodate a suggestion on valhalla list to automatically
> equate get/set to relaxed vs volatile versions depending on the declaration
> of the field. Which is arguable anyway because it would require tracking down
> the declaration site to determine memory semantics.
Purely from a programming perspective, I don't think this is what you want
anyway. I think one of the more common, and unusually safe, uses of
memory_order_relaxed in C++ is for non-racing accesses to atomic variables,
e.g. the second "check" in double-checked locking, or for reads of atomic
variables that are, in a sense, only hints and not trusted, e.g. the load
usually preceding a CAS. You really sometimes want plain gets on volatile data.
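
A sketch of the double-checked locking case in Java terms, assuming a handle's plain get may be applied to a field that is declared volatile (the VarHandle API as shipped in JDK 9 allows this). The LazyHolder class is hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class for illustration.
class LazyHolder {
    private volatile Object instance; // declared volatile

    private static final VarHandle INSTANCE;
    static {
        try {
            INSTANCE = MethodHandles.lookup()
                    .findVarHandle(LazyHolder.class, "instance", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    Object get() {
        Object o = instance; // first check: racy, so it stays a volatile read
        if (o == null) {
            synchronized (this) {
                // Second check: every write to the field happens under this
                // lock, so this read cannot race and a plain get suffices.
                o = INSTANCE.get(this);
                if (o == null) {
                    o = new Object();
                    instance = o; // volatile write publishes to unlocked readers
                }
            }
        }
        return o;
    }
}
```

If get/set were tied to the field's declaration, the second check here would silently become a volatile read again.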