From dl at cs.oswego.edu Mon Aug 10 18:22:05 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Mon, 10 Aug 2015 14:22:05 -0400
Subject: [jmm-dev] jdk9 APIs
Message-ID: <55C8EBCD.7010006@cs.oswego.edu>

It's been a while...

As everyone has surely noticed, progress on revising the core Java
Memory Model stalled after discovering that some new ideas are needed
to deal with out-of-thin-air and related issues in both Java and
C/C++. Some people are pursuing promising approaches, but there is
little chance that a full reformulation will be ready before jdk9 is
released (about a year from now).

However, despite this, jdk9 will include access and fence APIs that
need specs. This is non-negotiable, because the current weird way of
accessing the (mostly existing) underlying JVM functionality via
sun.misc.Unsafe is going away. (Actually, planned to only partly go
away in jdk9, but also introducing classes/methods allowing future
decommission.)

For accesses (mostly) corresponding to C11 atomics with memory_order,
the alternatives reside in "VarHandles". These are more-or-less
similar to AtomicXFieldUpdaters, but are designed to support
generation of much better code (without dynamic/reflective baggage),
generally as good as handcrafting, at least after warmup. The setup
for them is unusual -- relying on just-in-time specialization of
generic wrappers. This is a scale-down in ambition compared to the
original "enhanced volatiles" proposal that would have required
language syntax changes that met opposition.

Plans are to also include a stand-alone Fences class with all-static
methods. (Deja vu for many of us.) Plus to separately add the
reachabilityFence/keepAlive method discussed last year as a static
method most likely in class java.lang.ref.Reference.

There seems to be only a little room for discussion on the exact sets
of methods in the VarHandle* and Fences classes:

Current versions of VarHandle do NOT include an analog of C11
"consume" mode.
As discussed last year, considering its uncertain fate
in C/C++, a better tactic might be to support a method of the form
"getDependently(ref)" (sorta like in the linux kernel) that covers its
{only? main?} use case. But for now omitted.

For Fences, adding plain StoreStore and LoadLoad fences to the set
acquireFence, releaseFence, and fullFence seems wise. The main
arguments against LoadLoad and StoreStore in C11 were usability
concerns. But they have been found to be useful enough internally to
JVMs (mainly on ARM) to be semi-supported within hotspot. And we'd
like to not keep making the same mistake of unnecessarily making
things ARM/POWER hostile.

But the main issue at hand is how we can provide library specs for the
APIs without revising the underlying Java Memory Model.

My proposal is that we muddle through. Doing so seems surprisingly
workable: Moded accesses and fences allow programmers to rule out some
behaviors. Specifying them need not spell out underlying rules that
hold when these methods are not used. And on the other side, they need
not promise any overall property (as in: it might be the case that
using a fullFence between every access gives you SC, but we don't need
to guarantee it.) It is in a sense cheating to convey this in method
specs by using terms that are not fully backed by a formal underlying
model. But they still can be made good enough for readers to
understand intent, and made more rigorous someday. We can and should
also comfort programmers that, when applicable, the primary effects
of these methods are compatible with C/C++. Given all this, the
specs can be pretty simple. They are done out for Fences below, and
if OK should not be hard to apply to VarHandle methods.

Other ideas are of course welcome.

Pasted below are preliminary versions. If they don't format
nicely in your mail reader, get them at
http://gee.cs.oswego.edu/dl/wwwtmp/Fodder.java

/**
 * A set of methods providing fine-grained control of memory ordering.
 *
 * <p>The Java Language Specification permits operations to be
 * executed in orders different than are apparent in program source
 * code, subject to constraints mainly stemming from the use of locks
 * and volatile fields. The methods of this class can also be used to
 * impose constraints. Their specifications are phrased in terms of
 * the lack of "reorderings" -- observable ordering effects that might
 * otherwise occur if the fence were not present.
 *
 * @apiNote More precise phrasing of these specifications may
 * accompany future updates of the Java Language Specification.
 */
public class Fences {

    /**
     * Ensures that loads and stores before the fence will not be
     * reordered with loads and stores after the fence.
     *
     * @apiNote Ignoring the many semantic differences from C and
     * C++, this method has memory ordering effects compatible with
     * atomic_thread_fence(memory_order_seq_cst)
     */
    public static void fullFence() {}

    /**
     * Ensures that loads before the fence will not be reordered with
     * loads and stores after the fence.
     *
     * @apiNote Ignoring the many semantic differences from C and
     * C++, this method has memory ordering effects compatible with
     * atomic_thread_fence(memory_order_acquire)
     */
    public static void acquireFence() {}

    /**
     * Ensures that loads and stores before the fence will not be
     * reordered with stores after the fence.
     *
     * @apiNote Ignoring the many semantic differences from C and
     * C++, this method has memory ordering effects compatible with
     * atomic_thread_fence(memory_order_release)
     */
    public static void releaseFence() {}

    /**
     * Ensures that loads before the fence will not be reordered with
     * loads after the fence.
     */
    public static void loadLoadFence() {}

    /**
     * Ensures that stores before the fence will not be reordered with
     * stores after the fence.
     */
    public static void storeStoreFence() {}
}

class SampleVHUsages {
    int aField;
    static VarHandle AFIELD = ...;
    void usage1() {
        int aFieldValue = AFIELD.getVolatile(this);
    }
}

/**
 * Stand-in for spec purposes of jdk9 java.lang.invoke.VarHandle
 */
abstract class NotReallyVarHandle<T> {
    // Load

    T getRelaxed(Object owner);
    T getAcquire(Object owner);
    T getVolatile(Object owner);
    // tbd: Consume analog

    // Store

    void setRelaxed(Object owner, T val);
    void setRelease(Object owner, T val);
    void setVolatile(Object owner, T val);

    // CAS

    boolean compareAndSet(Object owner, T cmp, T val);
    boolean compareAndSetAcquire(Object owner, T cmp, T val);
    boolean compareAndSetRelease(Object owner, T cmp, T val);

    boolean weakCompareAndSet(Object owner, T cmp, T val);
    boolean weakCompareAndSetAcquire(Object owner, T cmp, T val);
    boolean weakCompareAndSetRelease(Object owner, T cmp, T val);

    // special RMW

    T getAndSet(Object owner, T val);
    T getAndAdd(Object owner, T delta);
    T addAndGet(Object owner, T delta);
}

class java.lang.ref.Reference {
    // add:

    /**
     * Ensures that the object referenced by the given reference
     * remains strongly reachable (as defined in the {@link
     * java.lang.ref} package documentation), regardless of any prior
     * actions of the program that might otherwise cause the object to
     * become unreachable; thus, the referenced object is not
     * reclaimable by garbage collection at least until after the
     * invocation of this method. Invocation of this method does not
     * itself initiate garbage collection or finalization.
     *
     * @param ref the reference. If null, this method has no effect.
     */
    public static void reachabilityFence(Object ref) {}
}

From jeremymanson at google.com Tue Aug 11 18:01:58 2015
From: jeremymanson at google.com (Jeremy Manson)
Date: Tue, 11 Aug 2015 11:01:58 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55C8EBCD.7010006@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID:

Just curious: which end users are jonesing for a Fences API?
I talk with a LOT of Java users, and it's *never* come up outside of
the kinds of people on this list. What's the urgency now?

Jeremy

From dl at cs.oswego.edu Tue Aug 11 19:11:10 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Tue, 11 Aug 2015 15:11:10 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To:
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID: <55CA48CE.6050405@cs.oswego.edu>

On 08/11/2015 02:01 PM, Jeremy Manson wrote:
> Just curious: which end users are jonesing for a Fences API? I talk with a LOT
> of Java users, and it's *never* come up outside of the kinds of people on this list.
>

Good timing. I was about to post this set of misc follow-up notes
that mentions this.

* I don't want to get overly consumed about analogs of consume mode,
but it would be nice to deal with it in a way that is good enough not
to demand follow-up efforts. A different approach than getDependently
is to add an overloaded loadLoadFence method that restricts its scope
to loads from the given reference: void loadLoadFence(Object ref).
This is similar to the approach used in a previous (scrapped) version
of Fence APIs. JVMs could just equate it (as well as the no-arg
version) to acquireFence, but with some effort it could be implemented
to get better performance on some platforms. At the moment, this seems
like a better approach to me than VarHandle getDependently methods.

* VarHandles serve two roles:
  (1) providing access to CAS instructions (plus the special RMW
      variants getAndAdd etc) that have no bytecodes
  (2) supporting moded load/store access that in principle is the
      same as people could arrange with the idiomatic application of
      Fences, but in practice possibly different because JVMs might
      have an easier time selecting instructions that combine fencing
      and access.

* The Fence API is geekier than VarHandles but no less necessary as
Unsafe goes away.

One audience is a small number of expert concurrent programmers
writing performance-sensitive code.
As a general observation, the more that such people understand their
concurrent algorithms, the more flexibility they need to implement
them. There are a few constructions inside java.util.concurrent
relying on them (for example in StampedLock) that cannot otherwise be
obtained in any reasonable way. Which would be a problem without Fence
API when Unsafe goes away.

Another (mostly future) audience is Tool developers. Efforts to
obtain desired consistency properties via tool-based fence placements
are starting to approach plausibility. (Including a nice upcoming
OOPSLA paper "Declarative fence insertion" by Bender, Lesani, and
Palsberg that I don't see anywhere online yet.) We'd rather not see
Java ruled out of such efforts just because of the lack of access to
functionality that must exist in any Java implementation.

(Gratuitous aside: Fences are just flexible Semicolons. Plain
semicolons outlived their usefulness, as bemoaned in the fun read:
"The Silently Shifting Semicolon"
http://snapl.org/2015/abstracts/full/Marino.html
http://www.cs.ucla.edu/~todd/research/snapl15.pdf)

-Doug

From aph at redhat.com Wed Aug 12 08:26:33 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 12 Aug 2015 09:26:33 +0100
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CA48CE.6050405@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CA48CE.6050405@cs.oswego.edu>
Message-ID: <55CB0339.3020901@redhat.com>

On 11/08/15 20:11, Doug Lea wrote:
>
> * I don't want to get overly consumed about analogs of consume mode,
> but it would be nice to deal with it in a way that is good enough not
> to demand follow-up efforts. A different approach than getDependently
> is to add an overloaded loadLoadFence method that restricts its scope
> to loads from the given reference: void loadLoadFence(Object ref).
> This is similar to the approach used in a previous (scrapped) version
> of Fence APIs.
> JVMs could just equate it (as well as the no-arg
> version) to acquireFence, but with some effort it could be implemented
> to get better performance on some platforms. At the moment, this seems
> like a better approach to me than VarHandle getDependently methods.

I think that's overly optimistic. Given the difficulty of defining
consume(ish) mode in a way that can be understood and the time to get
HotSpot etc. to implement it correctly and efficiently is there a
pressing need to do this now?

Andrew.

From aleksey.shipilev at oracle.com Wed Aug 12 08:31:18 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Aug 2015 11:31:18 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55C8EBCD.7010006@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID: <55CB0456.6060405@oracle.com>

On 08/10/2015 09:22 PM, Doug Lea wrote:
> /**
>  * Stand-in for spec purposes of jdk9 java.lang.invoke.VarHandle
>  */
> abstract class NotReallyVarHandle<T> {
>     boolean weakCompareAndSet(Object owner, T cmp, T val);
>     boolean weakCompareAndSetAcquire(Object owner, T cmp, T val);
>     boolean weakCompareAndSetRelease(Object owner, T cmp, T val);

I have a question about these. I understand the acq/rel relaxations for
strong CASes, but what is the relaxation for already weak CAS? In other
words, weakCompareAndSet is spec'ed in j.u.c.atomic as:

"weakCompareAndSet atomically reads and conditionally writes a variable
but does not create any happens-before orderings, so provides no
guarantees with respect to previous or subsequent reads and writes of
any variables other than the target of the weakCompareAndSet."

...and

"Additionally weakCompareAndSet does not provide ordering guarantees
that are usually needed for synchronization control. However, the method
may be useful for updating counters and statistics when such updates are
unrelated to the other happens-before orderings of a program.
When a thread sees an update to an atomic variable caused by a
weakCompareAndSet, it does not necessarily see updates to any other
variables that occurred before the weakCompareAndSet."

Thanks,
-Aleksey

From aleksey.shipilev at oracle.com Wed Aug 12 08:33:23 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Aug 2015 11:33:23 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB0456.6060405@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0456.6060405@oracle.com>
Message-ID: <55CB04D3.9000706@oracle.com>

On 08/12/2015 11:31 AM, Aleksey Shipilev wrote:
> On 08/10/2015 09:22 PM, Doug Lea wrote:
>> /**
>>  * Stand-in for spec purposes of jdk9 java.lang.invoke.VarHandle
>>  */
>> abstract class NotReallyVarHandle<T> {
>>     boolean weakCompareAndSet(Object owner, T cmp, T val);
>>     boolean weakCompareAndSetAcquire(Object owner, T cmp, T val);
>>     boolean weakCompareAndSetRelease(Object owner, T cmp, T val);
>
> I have a question about these. I understand the acq/rel relaxations for
> strong CASes, but what is the relaxation for already weak CAS? In other
> words, weakCompareAndSet is spec'ed in j.u.c.atomic as:
>
> "weakCompareAndSet atomically reads and conditionally writes a variable
> but does not create any happens-before orderings, so provides no
> guarantees with respect to previous or subsequent reads and writes of
> any variables other than the target of the weakCompareAndSet."
>
> ...and
>
> "Additionally weakCompareAndSet does not provide ordering guarantees
> that are usually needed for synchronization control. However, the method
> may be useful for updating counters and statistics when such updates are
> unrelated to the other happens-before orderings of a program. When a
> thread sees an update to an atomic variable caused by a
> weakCompareAndSet, it does not necessarily see updates to any other
> variables that occurred before the weakCompareAndSet."
Or is it the other way around: weakCAS is relaxed by default, and
acq/rel weakCAS version are actually *stronger* than default weakCAS?

-Aleksey

From aph at redhat.com Wed Aug 12 08:35:46 2015
From: aph at redhat.com (Andrew Haley)
Date: Wed, 12 Aug 2015 09:35:46 +0100
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB04D3.9000706@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0456.6060405@oracle.com> <55CB04D3.9000706@oracle.com>
Message-ID: <55CB0562.5010209@redhat.com>

On 12/08/15 09:33, Aleksey Shipilev wrote:
> Or is it the other way around: weakCAS is relaxed by default, and
> acq/rel weakCAS version are actually *stronger* than default weakCAS?

That's how I read it.

weakCompareAndSet         is  ldxr;  cmp; stxr
weakCompareAndSetAcquire  is  ldaxr; cmp; stxr
weakCompareAndSetRelease  is  ldxr;  cmp; stlxr

Andrew.

From aleksey.shipilev at oracle.com Wed Aug 12 08:41:54 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Aug 2015 11:41:54 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB0562.5010209@redhat.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0456.6060405@oracle.com> <55CB04D3.9000706@oracle.com> <55CB0562.5010209@redhat.com>
Message-ID: <55CB06D2.80206@oracle.com>

On 08/12/2015 11:35 AM, Andrew Haley wrote:
> On 12/08/15 09:33, Aleksey Shipilev wrote:
>> Or is it the other way around: weakCAS is relaxed by default, and
>> acq/rel weakCAS version are actually *stronger* than default weakCAS?
>
> That's how I read it.
>
> weakCompareAndSet         is  ldxr;  cmp; stxr
> weakCompareAndSetAcquire  is  ldaxr; cmp; stxr
> weakCompareAndSetRelease  is  ldxr;  cmp; stlxr

Okay, that makes sense.
Thanks,
-Aleksey

From aleksey.shipilev at oracle.com Wed Aug 12 09:02:05 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Aug 2015 12:02:05 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55C8EBCD.7010006@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID: <55CB0B8D.1020509@oracle.com>

On 08/10/2015 09:22 PM, Doug Lea wrote:
> /**
>  * Stand-in for spec purposes of jdk9 java.lang.invoke.VarHandle
>  */
> abstract class NotReallyVarHandle<T> {
>     // CAS
>
>     boolean compareAndSet(Object owner, T cmp, T val);
>     boolean compareAndSetAcquire(Object owner, T cmp, T val);
>     boolean compareAndSetRelease(Object owner, T cmp, T val);

Another, perhaps, "can of worms" question about CAS-es that was lurking
in my mind: which <T>-s are we nominally handling? It was probably
discussed and answered in C/C++11 standardization efforts?

Current HotSpot handles {ref, int, long} CASes. Atomics use either of
these, with notable implementation detail for AtomicBoolean that handles
*int*, not *boolean* field. Do we seek extending CAS to all primitive
typed-fields (with complications for value types when they come)? Do we
seek C/C++11 std::atomic behavior that seems to guarantee strong CAS for
every primitive specialization? Although I'm oblivious how that is
implemented in current C/C++ libraries and compilers.

Since VarHandles are the handles over fields, every object field may
expect to be pointed at by VarHandle. Therefore, it would be too late to
modify the object representation, without penalizing the memory
footprint for all objects. In other words, we cannot make boolean field
in all objects to take 4 bytes instead of 1 byte for a singular exotic
VH.cas() usage.

Therefore, the question seems to be the hardware ability to make subword
CASes.
At least x86 seems to be able to make the aligned sub-word CASes,
although I have not checked the performance for them -- but my gut
feeling is that it does not differ much, since CASes are cacheline-based
on most modern x86-s.

I mused a bit how to go forward in the absence of subword CASes. E.g.
make an aligned full-word CAS that contains the field, but that seems to
break the progress guarantees: two "strong" CASes on adjacent boolean
fields are not "strong", and one may fail spuriously. Or, e.g. locking
the object (words) for atomic operations, similarly to (late) fallback
strategy in AtomicLong for 32-bit systems, which is also not fun
implementation-wise.

Thanks,
-Aleksey

From dl at cs.oswego.edu Wed Aug 12 10:43:07 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 12 Aug 2015 06:43:07 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB0562.5010209@redhat.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0456.6060405@oracle.com> <55CB04D3.9000706@oracle.com> <55CB0562.5010209@redhat.com>
Message-ID: <55CB233B.40707@cs.oswego.edu>

On 08/12/2015 04:35 AM, Andrew Haley wrote:
> On 12/08/15 09:33, Aleksey Shipilev wrote:
>> Or is it the other way around: weakCAS is relaxed by default, and
>> acq/rel weakCAS version are actually *stronger* than default weakCAS?
>
> That's how I read it.
>
> weakCompareAndSet         is  ldxr;  cmp; stxr
> weakCompareAndSetAcquire  is  ldaxr; cmp; stxr
> weakCompareAndSetRelease  is  ldxr;  cmp; stlxr
>

Yes. Exactly so. I'll make a pass on the specs to clarify.

We defined weakCAS back in the original JDK5 atomics, but
just linked the Java code to plain Unsafe CAS, with the
expectation that someday someone would add intrinsics
allowing these kinds of implementations. More than a
decade later, the intrinsics still don't exist. So it is
nice to see you and Aleksey contemplating this (also for
the acq/rel versions).
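
A minimal sketch of how a relaxed weak CAS is typically used, for readers following the exchange above. It is written against the java.lang.invoke.VarHandle API as it later shipped in JDK 9, which is an assumption relative to this thread: the method names `weakCompareAndSetPlain` and `getOpaque`, and the class `WeakCasCounter`, do not appear in the messages above. Because a weak CAS may fail spuriously, it sits inside a retry loop; the plain variant promises no ordering for other variables, which is acceptable for a statistics counter.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class: a counter updated with a relaxed weak CAS.
public class WeakCasCounter {
    private int count;

    // VarHandle over the "count" field, looked up once at class init.
    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(WeakCasCounter.class, "count", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Retry until the weak CAS succeeds; spurious failures just loop again.
    public int increment() {
        int cur;
        do {
            cur = (int) COUNT.getOpaque(this);
        } while (!COUNT.weakCompareAndSetPlain(this, cur, cur + 1));
        return cur + 1;
    }

    // Volatile read of the current value.
    public int current() {
        return (int) COUNT.getVolatile(this);
    }
}
```

Swapping `weakCompareAndSetPlain` for `weakCompareAndSetAcquire` or `weakCompareAndSetRelease` would add ordering on the load or store side respectively, matching the "stronger than default weakCAS" reading confirmed above.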
-Doug

From dl at cs.oswego.edu Wed Aug 12 12:55:42 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 12 Aug 2015 08:55:42 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB0339.3020901@redhat.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CA48CE.6050405@cs.oswego.edu> <55CB0339.3020901@redhat.com>
Message-ID: <55CB424E.6000405@cs.oswego.edu>

On 08/12/2015 04:26 AM, Andrew Haley wrote:
> On 11/08/15 20:11, Doug Lea wrote:
>>
>> * I don't want to get overly consumed about analogs of consume mode,
>> but it would be nice to deal with it in a way that is good enough not
>> to demand follow-up efforts.
>
> I think that's overly optimistic. Given the difficulty of defining
> consume(ish) mode in a way that can be understood and the time to get
> HotSpot etc. to implement it correctly and efficiently is there a
> pressing need to do this now?
>

I'm not sure. Do we add loadLoadFence(Object ref) and get it over
with, or postpone? In light of the tendency for unresolved issues to
languish for many years, I'd like to at least carefully consider the
options.

Background: The idea is basically the same as already exists for final
fields: Even on processors otherwise requiring load fences, they are
not in general used when reading a final field given a reference to an
object containing one. The underlying notion is that the
processor/thread cannot possibly speculate/reuse the field value given
that it has never seen that reference before, as assured by allocation
and GC protocols, along with store/release fences in constructors.
(Aside: this reasoning is not airtight until OOTA-related issues are
solved.) And further, once read, the value can be used forever without
re-reading. Well, except that actual JVMs are more conservative about
this, trying (imperfectly) to cope with some cases of modifying final
fields.

In RCU (the main use case driving consume mode), pretty much the same
holds except that the compiler is not aware of the constraints, so
doesn't know that it can elide the load fence, and also doesn't know
that it must discard any previous values. So you have to tell it using
a special load or fence operation. Most of the time, just using
relaxed-loads would work fine. But "most of the time" is not good
enough.

I'm sure that there are plenty of cases out there where Java programs
also do this and get away with it because the constraints happen to
hold (even though not officially DRF). And also buggy cases where they
only sometimes hold, but would be "fixed" via the use of a
dependent-load fence/method. Using an unscoped loadLoadFence or
acquireFence would also fix, but at greater cost on some processors.
And 99+% of programmers would accept the (possibly) further cost of
using volatile here to make their programs race-free by the usual
criteria.

Given this background, I'm still not sure of the best near-term
choice: Only a few extremely performance sensitive users on only some
platforms would be impacted if loadLoadFence(Object ref) were defined
but by default bound to acquireFence (Unsafe loadFence). And
conversely, only a few users would be impacted if
loadLoadFence(Object ref) did not even exist until some time after
RCU-like constructions became more widespread.

I'm still thinking that the get-it-over-with option is slightly
better.
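
The RCU-like publication pattern described above can be sketched with acquire/release accesses. This is a minimal sketch against the VarHandle API as it later shipped in JDK 9, which is an assumption relative to this thread. That API has no consume or getDependently analog, so `getAcquire` stands in for the dependent load here. The class `ConfigHolder`, its nested `Config` type, and the field names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example: publish an immutable-after-construction snapshot.
public class ConfigHolder {
    static final class Config {
        int a;
        int b;
    }

    private Config current;

    private static final VarHandle CURRENT;
    static {
        try {
            CURRENT = MethodHandles.lookup()
                    .findVarHandle(ConfigHolder.class, "current", Config.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer: fill in the snapshot, then publish it with release semantics
    // so the field writes cannot be reordered after the publishing store.
    public void publish(int a, int b) {
        Config c = new Config();
        c.a = a;
        c.b = b;
        CURRENT.setRelease(this, c);
    }

    // Reader: acquire-load the reference, then read through it.
    // Returns -1 if nothing has been published yet.
    public int readSum() {
        Config c = (Config) CURRENT.getAcquire(this);
        return (c == null) ? -1 : c.a + c.b;
    }
}
```

On TSO hardware the acquire load costs nothing extra; on ARM or POWER it is where a scoped dependent load, if one existed, could be cheaper.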

-Doug

From dl at cs.oswego.edu Wed Aug 12 14:38:10 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 12 Aug 2015 10:38:10 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB0B8D.1020509@oracle.com>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0B8D.1020509@oracle.com>
Message-ID: <55CB5A52.7030100@cs.oswego.edu>

On 08/12/2015 05:02 AM, Aleksey Shipilev wrote:
> On 08/10/2015 09:22 PM, Doug Lea wrote:
>>     boolean compareAndSet(Object owner, T cmp, T val);
>>     boolean compareAndSetAcquire(Object owner, T cmp, T val);
>>     boolean compareAndSetRelease(Object owner, T cmp, T val);
>
> Another, perhaps, "can of worms" question about CAS-es that was lurking
> in my mind: which <T>-s are we nominally handling?

The answers need to be a little different for Java and C++.
In C++, you can only apply to "atomic" types.
Under the current VarHandle scheme, you can attach to anything
at compile time, but methods may bail out throwing an
UnsupportedOperationException at run-time. This will always
occur for example when applying getAndAdd to a VarHandle of
any Object type. This was apparently the tradeoff most
acceptable to all concerned. It means that in addition to
memory-model-related specs, the methods will need to carry
warnings that they might just throw UOE; hopefully enumerating
common cases like CAS for sub-word types and unaligned memory.

>
> Therefore, the question seems to be the hardware ability to make subword
> CASes.

My understanding (and preference) is that trying to CAS with
VarHandles to bytes and other small types just throws UOE
and should be spec'ed to do so.
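
The run-time bail-out described above can be observed directly. A minimal sketch, assuming the VarHandle API as it later shipped in JDK 9 (this thread predates it; `isAccessModeSupported` and the class `UnsupportedModeDemo` are not from the messages above): applying getAndAdd to a VarHandle over a reference-typed field throws UnsupportedOperationException, and the handle can be asked up front whether a mode is supported.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example: numeric RMW on a reference-typed field is rejected.
public class UnsupportedModeDemo {
    private String name = "a";

    private static final VarHandle NAME;
    static {
        try {
            NAME = MethodHandles.lookup()
                    .findVarHandle(UnsupportedModeDemo.class, "name", String.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Query support without attempting the access.
    public static boolean supportsGetAndAdd() {
        return NAME.isAccessModeSupported(VarHandle.AccessMode.GET_AND_ADD);
    }

    // Attempt the access and report whether it was rejected at run time.
    public static boolean getAndAddThrows() {
        try {
            Object ignored = NAME.getAndAdd(new UnsupportedModeDemo(), "b");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

The code compiles because VarHandle access methods are signature-polymorphic; the mismatch between access mode and variable type only surfaces when the call runs.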
-Doug

From aleksey.shipilev at oracle.com Wed Aug 12 16:20:44 2015
From: aleksey.shipilev at oracle.com (Aleksey Shipilev)
Date: Wed, 12 Aug 2015 19:20:44 +0300
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB233B.40707@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0456.6060405@oracle.com> <55CB04D3.9000706@oracle.com> <55CB0562.5010209@redhat.com> <55CB233B.40707@cs.oswego.edu>
Message-ID: <55CB725C.3010003@oracle.com>

On 08/12/2015 01:43 PM, Doug Lea wrote:
> On 08/12/2015 04:35 AM, Andrew Haley wrote:
>> On 12/08/15 09:33, Aleksey Shipilev wrote:
>>> Or is it the other way around: weakCAS is relaxed by default, and
>>> acq/rel weakCAS versions are actually *stronger* than default weakCAS?
>>
>> That's how I read it.
>>
>> weakCompareAndSet        is ldxr;  cmp; stxr
>> weakCompareAndSetAcquire is ldaxr; cmp; stxr
>> weakCompareAndSetRelease is ldxr;  cmp; stlxr
>>
>
> Yes. Exactly so. I'll make a pass on the specs to clarify.
>
> We defined weakCAS back in the original JDK5 atomics, but
> just linked the Java code to plain Unsafe CAS, with the
> expectation that someday someone would add intrinsics
> allowing these kinds of implementations. More than a
> decade later, the intrinsics still don't exist. So it is
> nice to see you and Aleksey contemplating this (also for
> the acq/rel versions).

Yes, we are drilling more holes in Unsafe/VarHandles to support these
usages:
  https://bugs.openjdk.java.net/browse/JDK-8133451

Andrew, I sent a separate note with a patch to review.

Thanks,
-Aleksey

From boehm at acm.org Wed Aug 12 22:33:13 2015
From: boehm at acm.org (Hans Boehm)
Date: Wed, 12 Aug 2015 15:33:13 -0700
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
Message-ID:

On Mon, Aug 10, 2015 at 11:22 AM, Doug Lea
wrote: > > ... > > For Fences, adding plain StoreStore and LoadLoad fences to the set > acquireFence, ReleaseFence, and fullFence seems wise. The main > arguments against LoadLoad and StoreStore in C11 were usability > concerns. But they have been found to be useful enough internally to > JVMs (mainly on ARM) to be semi-supported within hotspot. And we'd > like to not keep making the same mistake of unnecessarily making > things ARM/POWER hostile. Let me argue once more against LoadLoad, and at least dampen the enthusiasm for StoreStore. I know of no hardware instructions, except on SPARC, that correspond to a LoadLoad fence. And my impression is that it's not very useful on SPARC. The ARM DMB xLD fence instruction, if I understand correctly, is essentially a C++ acquire fence. POWER lwsync is even stronger than that. For TSO architectures, loadLoadFence and acquireFence are both no-ops. I believe that in all mainstream environments, loadloadFence will generate the same code as acquireFence, but is much harder to specify and much trickier to use correctly. Why would you bother? The StoreStore situation is slightly different for two reasons: 1) It does seem to correspond to the ARM DMB xST instruction. 2) It does seem to be useful in VM implementations, in that it seems to suffice as the end-of-constructor fence. However, I think it difficult to specify correctly outside of that specific essentially final-field-initialization scenario. If I write x++; // Increment zero initialized field storeStoreFence(); x_init = true; can the load for the x++ see a value that was assigned to x in another thread after seeing x_init set to true? The load can be reordered with the fence. The store can't, but the store "depends on" the load. But we don't enforce dependency-based ordering, mostly because we don't know how to define it. Even if we could define it reasonably, the semantics strike me as thoroughly weird, even by my warped standards: Consider: Thread 1: x = ... ? 
1 : 2; assert (x < 3); storeStoreFence(); x_init = true; Thread 2: if (x_init) { acquireFence(); x = 17; } allows the assertion to fail. In a more complex example, any method calls used to initialize x in thread 1 need to be careful not to read the value they just wrote. You guys did realize that you just volunteered to make a pass through the library specification to specify which calls are safe in such cases, right? It may be more promising to somehow specify it only for use with write-once variables. But I don't really know how to make that work either. And then there are important transitivity/cumulativity questions ... I think LoadLoad actually has some analogous issues as well. But unlike StoreStore, there seems to be no hardware/performance argument to even consider it. Hans > > > /** > * A set of methods providing fine-grained control of memory ordering. > * > *

The Java Language Specification permits operations to be > * executed in orders different than are apparent in program source > * code, subject to constraints mainly stemming from the use of locks > * and volatile fields. The methods of this class can also be used to > * impose constraints. Their specifications are phrased in terms of > * the lack of "reorderings" -- observable ordering effects that might > * otherwise occur if the fence were not present. > * > * @apiNote More precise phrasing of these specifications may > * accompany future updates of the Java Language Specification. > */ > public class Fences { > > /** > * Ensures that loads and stores before the fence will not be > * reordered with loads and stores after the fence. > * > * @apiNote Ignoring the many semantics differences from C and > * C++, this method has memory ordering effects compatible with > * atomic_thread_fence(memory_order_seq_cst) > */ > public static void fullFence() {} > > /** > * Ensures that loads before the fence will not be reordered with > * loads and stores after the fence. > * > * @apiNote Ignoring the many semantics differences from C and > * C++, this method has memory ordering effects compatible with > * atomic_thread_fence(memory_order_acquire) > */ > public static void acquireFence() {} > > /** > * Ensures that loads and stores before the fence will not be > * reordered with stores after the fence. > * > * @apiNote Ignoring the many semantics differences from C and > * C++, this method has memory ordering effects compatible with > * atomic_thread_fence(memory_order_release) > */ > public static void releaseFence() {} > > /** > * Ensures that loads before the fence will not be reordered with > * loads after the fence. > */ > public static void loadLoadFence() {} > > /** > * Ensures that stores before the fence will not be reordered with > * stores after the fence. 
> */
> public static void storeStoreFence() {}
>
>
> }
>

From dl at cs.oswego.edu Thu Aug 13 12:19:17 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 13 Aug 2015 08:19:17 -0400
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References:
Message-ID: <55CC8B45.50909@cs.oswego.edu>

On 08/12/2015 06:33 PM, Hans Boehm wrote:
>
> Let me argue once more against LoadLoad, and at least dampen the
> enthusiasm for StoreStore.

Thanks for the critiques! (Even though I remain unconvinced.)

I should have noted that ARM mappings are only part of the motivation
for loadLoadFence and storeStoreFence. Another is protection against
loop "optimizations" that are highly non-optimal. This is not
strictly a compiler issue, but easier to illustrate as one. Suppose
for example you have a method that writes several variables, along
with reader methods that can handle all ordering races among the
writes. But you still want to ensure that the variables are actually
written if the method is called in a loop. A trailing
storeStoreFence() seems to be the cheapest and conceptually most
appropriate way to reduce communication latency. (In other words, it
is "correct" but undesirable for method c() here to only use the
final (x, y) values.) Symmetrical arguments apply to using
leading loadLoadFences on the complementary reader methods
(which are otherwise similar to RCU-like constructions).

class C {
  int x = 0, y = 0; // relaxed

  void p() { // called in producer thread
    for (int i = 0; i < 1000000; ++i)
      writes(heavyPureComputation(i));
  }

  void c() { // called in consumer thread
    for (;;) {
      if (occasionally)
        reads();
      // ...
    }
  }

  void writes(int k) {
    x = k;
    y = k + 17;
    storeStoreFence(); // please actually store x and y if in a loop
  }

  void reads() {
    loadLoadFence(); // please actually load x and y if in a loop
    if (y == x + 17)
      something();
  }
}

This is not a hypothetical example. It's abstracted from cases I've
encountered.
Like the RCU-like examples mentioned yesterday, these effects arise only when you are writing racy performance-critical code. But that's what low-level concurrent algorithm and data structure designers do! Back to .. > > I know of no hardware instructions, except on SPARC, that correspond > to a LoadLoad fence. And my impression is that it's not very useful on > SPARC. The ARM DMB xLD fence instruction, if I understand correctly, > is essentially a C++ acquire fence. But I think that pseudo-fences (load; compare to self; ...) need not be? > > However, I think it difficult to specify correctly outside of that specific > essentially final-field-initialization scenario. It doesn't seem hard at all to specify in isolation. The interactions with base ordering rules can be non-obvious though. (Especially since, in the absence of a revised base model, those rules might as well say that anything goes.) So, like any fence method, it should be used when nothing simpler applies. And surely not in: > > x++; // Increment zero initialized field > storeStoreFence(); > x_init = true; -Doug From boehm at acm.org Thu Aug 13 21:04:48 2015 From: boehm at acm.org (Hans Boehm) Date: Thu, 13 Aug 2015 14:04:48 -0700 Subject: [jmm-dev] jdk9 APIs [Fences specifically] In-Reply-To: <55CC8B45.50909@cs.oswego.edu> References: <55CC8B45.50909@cs.oswego.edu> Message-ID: On Thu, Aug 13, 2015 at 5:19 AM, Doug Lea
wrote: > > On 08/12/2015 06:33 PM, Hans Boehm wrote: >> >> >> Let me argue once more against LoadLoad, and at least dampen the >> enthusiasm for StoreStore. > > > Thanks for the critiques! (Even though I remain unconvinced.) > > I should have noted that ARM mappings are only part of the motivation > for loadLoadFence and storeStoreFence. Another is protection against > loop "optimizations" that are highly non-optimal. This is not > strictly a compiler issue, but easier to illustrate as one. Suppose > for example you have a method that writes several variables, along > with reader methods that can handle all ordering races among the > writes. But you still want to ensure that the variables are actually > written if the method is called in a loop. A trailing > storeStoreFence() seems to be the cheapest and conceptually most > appropriate way to reduce communication latency. (In other words, it > is "correct" but undesirable for method c() here to only use the > final (x, y) values.) Symmetrical arguments apply to using > leading loadLoadFences on the complementary reader methods > (that is otherwise similar to RCU-like constructions). > > class C { > int x = 0, y = 0; // relaxed > > void p() { // called in producer thread > for (int i = 0; i < 1000000; ++i) > writes(heavyPureComputation(i)); > } > > void c() { // called in consumer thread > for (;;) { > if (occasionally) > reads(); > // ... > } > > void writes(int k) { > x = k; > y = k + 17; > storeStoreFence(); // please actually store x and y if in a loop > } > > void reads() { > loadLoadFence(); // please actually load x and y if in a loop > if (y == x + 17) > something(); > } > } > > This is not a hypothetical example. It's abstracted from cases I've > encountered. Like the RCU-like examples mentioned yesterday, these effects > arise only when you are writing racy performance-critical code. But > that's what low-level concurrent algorithm and data structure > designers do! 
No disagreement about the existence of this problem. There was a recent long discussion of this on a C++ mailing list. There is not yet agreement there about the correct solution. But we didn't have any advocates for this approach. I think this is fundamentally a completely different problem that has nothing to do with restricting order to either only loads or only stores. You are trying to instead dissuade the compiler from drastic code movement in certain cases. I don't think a fence-based approach works. Deferring all the stores to the end of the loop fundamentally remains correct, even with the StoreStore fence, since it's consistent with the producer just running very fast for a while. The constraint you're trying to enforce has nothing to do with ordering. Aside from not working correctly, you end up slowing down ARM code in ways that are entirely unnecessary, by inserting "dmb ishld" or "dmb ishst" fences everywhere. (How expensive they are varies. On a number of implementations they basically seem to be full fences.) My personal favorite solution to this problem is to add an annotation for fields that are used as relaxed atomics, and to agree that high quality compilers should basically leave those alone. Optimizing those using conventional rules for sequential performance may lead to disastrous performance for the whole multithreaded system. If you don't understand it, leave it alone. Peter Dimov pointed out that there are cases, e.g. consecutive C++ reference count updates, where you probably do want the compiler to aggressively optimize in spite of concurrent access. You may need a second annotation for those. > > Back to .. > >> >> I know of no hardware instructions, except on SPARC, that correspond >> to a LoadLoad fence. And my impression is that it's not very useful on >> SPARC. The ARM DMB xLD fence instruction, if I understand correctly, >> is essentially a C++ acquire fence. > > > But I think that pseudo-fences (load; compare to self; ...) 
need not be?

Those are fundamentally LoadStore fences. On Power you can also turn
them into a LoadLoad fence by adding an isync. I think the ARM situation
is essentially identical.

>
>>
>> However, I think it difficult to specify correctly outside of that specific
>> essentially final-field-initialization scenario.
>
>
> It doesn't seem hard at all to specify in isolation.
> The interactions with base ordering rules can be non-obvious though.
> (Especially since, in the absence of a revised base model,
> those rules might as well say that anything goes.)
> So, like any fence method, it should be used when nothing
> simpler applies. And surely not in:
>
>
>>
>> x++; // Increment zero initialized field
>> storeStoreFence();
>> x_init = true;

My problem is that this looks a lot like a constructor fence, or maybe the
writer side of a seqlock, which are the only use cases I know of for
StoreStore fences. And the harder I think about constructor fences, the
more nervous I get about using StoreStore fences there without fully
understanding the transitivity issues.

Hans

>
>
> -Doug
>

From dl at cs.oswego.edu Thu Aug 13 23:19:37 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 13 Aug 2015 19:19:37 -0400
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References: <55CC8B45.50909@cs.oswego.edu>
Message-ID: <55CD2609.3070601@cs.oswego.edu>

On 08/13/2015 05:04 PM, Hans Boehm wrote:

> I don't think a fence-based approach works. Deferring all the stores to the
> end of the loop fundamentally remains correct, even with the StoreStore
> fence, since it's consistent with the producer just running very fast for a
> while. The constraint you're trying to enforce has nothing to do with
> ordering.

I must be missing something fundamental about C++ specs. Are C++
compilers allowed to ignore release fences in between writes
to the same variables? In unrolled form, that's what this would
amount to here.
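
For concreteness, the "unrolled form" of the producer loop can be written
out in Java. This is a sketch only, assuming a JDK 9 or later runtime in
which the fence is the static method VarHandle.storeStoreFence() rather
than a method of the proposed Fences class; the class name UnrolledWrites
and the three-iteration unrolling are illustrative choices, not part of
the proposal. The question in dispute is whether an implementation may
legally keep only the last pair of stores.

```java
import java.lang.invoke.VarHandle;

class UnrolledWrites {
    int x = 0, y = 0; // plain (relaxed) fields, as in class C

    // One producer iteration: two plain stores, then a StoreStore fence.
    void writes(int k) {
        x = k;
        y = k + 17;
        VarHandle.storeStoreFence();
    }

    // Three iterations of the producer loop written out by hand.
    // Each fence sits between stores to the same two variables.
    void unrolledProducer() {
        x = 0; y = 17; VarHandle.storeStoreFence();
        x = 1; y = 18; VarHandle.storeStoreFence();
        x = 2; y = 19; VarHandle.storeStoreFence();
    }
}
```

A single-threaded caller sees only the final values (x == 2, y == 19)
either way; the difference, if any, is visible only to a concurrent
reader that would like to observe the intermediate pairs.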
> > Aside from not working correctly, you end up slowing down ARM code in ways > that are entirely unnecessary, by inserting "dmb ishld" or "dmb ishst" > fences everywhere. (How expensive they are varies. On a number of > implementations they basically seem to be full fences.) Right. It does put the programmer in control though; for example if ((i % 100) == 99) storeStoreFence() Considering that the goal is communication latency reduction at the expense of throughput, only the programmer would be able to make these tradeoffs. -Doug From boehm at acm.org Thu Aug 13 23:56:16 2015 From: boehm at acm.org (Hans Boehm) Date: Thu, 13 Aug 2015 16:56:16 -0700 Subject: [jmm-dev] jdk9 APIs [Fences specifically] In-Reply-To: <55CD2609.3070601@cs.oswego.edu> References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> Message-ID: On Thu, Aug 13, 2015 at 4:19 PM, Doug Lea
wrote: > > On 08/13/2015 05:04 PM, Hans Boehm wrote: > >> I don't think a fence-based approach works. Deferring all the stores to the >> end of the loop fundamentally remains correct, even with the StoreStore >> fence, since it's consistent with the producer just running very fast for a >> while. The constraint you're trying to enforce has nothing to do with >> ordering. > > > I must be missing something fundamental about C++ specs. Are C++ > compilers allowed to ignore release fences in between writes > to the same variables? In unrolled form, that's what this would > amount to here. I think that's unavoidable. If I write for (...) { x = something_expensive(); fence; } it's very hard to prevent the implementation from implementing that as And that looks exactly like merging all the stores into one. I think you can't really disallow the latter without disallowing a time-slicing uniprocessor scheduler. They're behaviorally identical. That doesn't prevent us from providing the compiler with advice to discourage that. But I think this doesn't have anything to do with fences. > >> >> Aside from not working correctly, you end up slowing down ARM code in ways >> that are entirely unnecessary, by inserting "dmb ishld" or "dmb ishst" >> fences everywhere. (How expensive they are varies. On a number of >> implementations they basically seem to be full fences.) > > > Right. It does put the programmer in control though; for example > if ((i % 100) == 99) storeStoreFence() Agreed. But it seems to me that this is control over an unnecessary trade-off. Clearly the ideal code involves no fences. And in most cases, just having the programmer specifying where the stores to shared variables should go, and having the compiler leave that alone, seems like a better and simpler way to control this. Hans > > Considering that the goal is communication latency reduction at > the expense of throughput, only the programmer would be able > to make these tradeoffs. 
> > -Doug > > From dl at cs.oswego.edu Fri Aug 14 11:50:53 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Aug 2015 07:50:53 -0400 Subject: [jmm-dev] jdk9 APIs [Fences specifically] In-Reply-To: References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> Message-ID: <55CDD61D.80602@cs.oswego.edu> On 08/13/2015 07:56 PM, Hans Boehm wrote: > On Thu, Aug 13, 2015 at 4:19 PM, Doug Lea
> > > I must be missing something fundamental about C++ specs. Are C++ > > compilers allowed to ignore release fences in between writes > > to the same variables? In unrolled form, that's what this would > > amount to here. > > I think that's unavoidable. If I write > > for (...) { > x = something_expensive(); > fence; > } > > it's very hard to prevent the implementation from implementing that as > > > > > And that looks exactly like merging all the stores into one. In other words, it is legal (at least for some loops in which you can prove termination etc) to postpone ALL the stores to end of loop (assuming the CPU has enough registers to hold them all or is willing to use unbounded local stack space), in which case even if they are performed in order, you may lose responsiveness. So there are limitations in the ability of ordering control to improve responsiveness. Which is unsurprising given all the other limitations under weak scheduling guarantees. But that's not much of an argument for not even allowing it. > But I think this doesn't have anything to do with fences. Ordering constraints seem intrinsic to the problem at hand. It's the complement of the main issue in RCU/consume: "really read this" vs "really write this". And like RCU, the construction can be seen as one in which a special-case optimization removing a processor-level fence may apply in conditions that are hard to express/control. You can in the mean time cheat in both cases in C/C++ by casting to (C) volatile. But nothing like this applies in Java. -Doug From paulmck at linux.vnet.ibm.com Fri Aug 14 12:39:04 2015 From: paulmck at linux.vnet.ibm.com (Paul E. 
McKenney) Date: Fri, 14 Aug 2015 05:39:04 -0700 Subject: [jmm-dev] jdk9 APIs [Fences specifically] In-Reply-To: References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> Message-ID: <20150814123903.GG3895@linux.vnet.ibm.com> On Thu, Aug 13, 2015 at 04:56:16PM -0700, Hans Boehm wrote: > On Thu, Aug 13, 2015 at 4:19 PM, Doug Lea
wrote: > > > > On 08/13/2015 05:04 PM, Hans Boehm wrote: > > > >> I don't think a fence-based approach works. Deferring all the stores to > the > >> end of the loop fundamentally remains correct, even with the StoreStore > >> fence, since it's consistent with the producer just running very fast > for a > >> while. The constraint you're trying to enforce has nothing to do with > >> ordering. > > > > > > I must be missing something fundamental about C++ specs. Are C++ > > compilers allowed to ignore release fences in between writes > > to the same variables? In unrolled form, that's what this would > > amount to here. > > I think that's unavoidable. If I write > > for (...) { > x = something_expensive(); > fence; > } > > it's very hard to prevent the implementation from implementing that as > > > As in place an unrolled version of the loop into a hardware sort-of transaction, like the ones available on recent x86, Power, and mainframe systems? If that is what you are getting at, this scenario sounds like an excellent argument for having some way to tell the compiler not to make that transformation. Perhaps the fence above should be interpreted as such saying to bound any transactions. > And that looks exactly like merging all the stores into one. It would, but don't expect any real-time programmers to thank you for doing that to their code. ;-) > I think you can't really disallow the latter without disallowing a > time-slicing > uniprocessor scheduler. They're behaviorally identical. If you are excluding performance and response-time characteristics from "behaviorally", I can't argue. Again, don't expect any thanks from real-time programmers. > That doesn't prevent us from providing the compiler with advice to > discourage > that. But I think this doesn't have anything to do with fences. Volatile fences, then? 
;-) > >> Aside from not working correctly, you end up slowing down ARM code in > ways > >> that are entirely unnecessary, by inserting "dmb ishld" or "dmb ishst" > >> fences everywhere. (How expensive they are varies. On a number of > >> implementations they basically seem to be full fences.) > > > > > > Right. It does put the programmer in control though; for example > > if ((i % 100) == 99) storeStoreFence() > > Agreed. But it seems to me that this is control over an unnecessary > trade-off. Clearly the ideal code involves no fences. And in most cases, > just having the programmer specifying where the stores to shared variables > should go, and having the compiler leave that alone, seems like a better > and simpler way to control this. In all cases? Thanx, Paul > Hans > > > > > Considering that the goal is communication latency reduction at > > the expense of throughput, only the programmer would be able > > to make these tradeoffs. > > > > -Doug > > > > > From dl at cs.oswego.edu Fri Aug 14 13:35:40 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Aug 2015 09:35:40 -0400 Subject: [jmm-dev] jdk9 APIs [Fences specifically] In-Reply-To: References: <55CC8B45.50909@cs.oswego.edu> Message-ID: <55CDEEAC.2000300@cs.oswego.edu> On 08/13/2015 05:04 PM, Hans Boehm wrote: > >> I know of no hardware instructions, except on SPARC, that correspond > >> to a LoadLoad fence. And my impression is that it's not very useful on > >> SPARC. The ARM DMB xLD fence instruction, if I understand correctly, > >> is essentially a C++ acquire fence. > > > > > > But I think that pseudo-fences (load; compare to self; ...) need not be? > Those are fundamentally LoadStore fences. On Power you can also turn > them into a LoadLoad fence by adding an isync. I think the ARM situation > is essentially identical. On a little more checking, I think so too. (Formal validation would be nice...) 
But even if so, it seems better to have a uniform API:

  writes: full > release > storeStore
  reads:  full > acquire > loadLoad

even if loadLoad is internally mapped to acquire.
And omitting it feels even more wrong if we support RCU-like
usages with scoped loadLoadFence(Object ref).

-Doug

From paulmck at linux.vnet.ibm.com Fri Aug 14 15:39:03 2015
From: paulmck at linux.vnet.ibm.com (Paul E. McKenney)
Date: Fri, 14 Aug 2015 08:39:03 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CB5A52.7030100@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0B8D.1020509@oracle.com> <55CB5A52.7030100@cs.oswego.edu>
Message-ID: <20150814153903.GJ3895@linux.vnet.ibm.com>

On Wed, Aug 12, 2015 at 10:38:10AM -0400, Doug Lea wrote:
> On 08/12/2015 05:02 AM, Aleksey Shipilev wrote:
> >On 08/10/2015 09:22 PM, Doug Lea wrote:
>
> >> boolean compareAndSet(Object owner, T cmp, T val);
> >> boolean compareAndSetAcquire(Object owner, T cmp, T val);
> >> boolean compareAndSetRelease(Object owner, T cmp, T val);
> >
> >Another, perhaps, "can of worms" question about CAS-es that was lurking
> >in my mind: which -s are we nominally handling?
>
> The answers need to be a little different for Java and C++.
> In C++, you can only apply to "atomic" types.
>
> Under the current VarHandle scheme, you can attach to
> anything at compile time, but methods may bail out
> throwing an UnsupportedOperationException at run-time.
>
> This will always occur for example when applying getAndAdd
> to a VarHandle of any Object type.
>
> This was apparently the tradeoff most acceptable to all concerned.
> It means that in addition to memory-model-related specs,
> the methods will need to carry warnings that they might
> just throw UOE; hopefully enumerating common cases
> like CAS for sub-word types and unaligned memory.
>
> >Therefore, the question seems to be the hardware ability to make subword
> >CASes.
> > My understanding (and preference) is that trying to CAS with > VarHandles to bytes and other small types just throws > UOE and should be spec'ed to do so. OK, I will bite... Why can't you just apply a CAS instruction to the larger word containing the subword? Thanx, Paul From aleksey.shipilev at oracle.com Fri Aug 14 15:44:27 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 14 Aug 2015 18:44:27 +0300 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <20150814153903.GJ3895@linux.vnet.ibm.com> References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0B8D.1020509@oracle.com> <55CB5A52.7030100@cs.oswego.edu> <20150814153903.GJ3895@linux.vnet.ibm.com> Message-ID: <55CE0CDB.1070005@oracle.com> On 14.08.2015 18:39, Paul E. McKenney wrote: > On Wed, Aug 12, 2015 at 10:38:10AM -0400, Doug Lea wrote: >> On 08/12/2015 05:02 AM, Aleksey Shipilev wrote: >>> On 08/10/2015 09:22 PM, Doug Lea wrote: >> >>>> boolean compareAndSet(Object owner, T cmp, T val); >>>> boolean compareAndSetAcquire(Object owner, T cmp, T val); >>>> boolean compareAndSetRelease(Object owner, T cmp, T val); >>> >>> Another, perhaps, "can of worms" question about CAS-es that was lurking >>> in my mind: which -s are we nominally handling? >> >> The answers need to be a little different for Java and C++. >> In C++, you can only apply to "atomic" types. >> >> Under the current VarHandle scheme, you can attach to >> anything at compile time, but methods may bail out >> throwing an UnsupportedOperationException at run-time. >> >> This will always occur for example when applying getAndAdd >> to a VarHandle of any Object type. >> >> This was apparently the tradeoff most acceptable to all concerned. >> It means that in addition to memory-model-related specs, >> the methods will need to carry warnings that they might >> just throw UOE; hopefully enumerating common cases >> like CAS for sub-word types and unaligned memory. 
>> >>> Therefore, the question seems to be the hardware ability to make subword >>> CASes. >> >> My understanding (and preference) is that trying to CAS with >> VarHandles to bytes and other small types just throws >> UOE and should be spec'ed to do so. > > OK, I will bite... Why can't you just apply a CAS instruction to the > larger word containing the subword? As I tried to explain before, this seems to break the guarantees for strong CAS, that at least one of the CASes against the same location should succeed. The trick here is that on Java level, "location" is defined as "field" or something like that. Case in point: imagine you have two subword-sized fields in the same word, and you do CAS on each of them. From Java level, these CASes are uncontended, and both should succeed. In our hypothetical implementation, one of the CASes would fail. This is a good thing for weakCAS that is spec-ed to allow spurious failures, but not for a strong one. Thanks, -Aleksey From paulmck at linux.vnet.ibm.com Fri Aug 14 15:36:02 2015 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Fri, 14 Aug 2015 08:36:02 -0700 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55CA48CE.6050405@cs.oswego.edu> References: <55C8EBCD.7010006@cs.oswego.edu> <55CA48CE.6050405@cs.oswego.edu> Message-ID: <20150814153602.GI3895@linux.vnet.ibm.com> On Tue, Aug 11, 2015 at 03:11:10PM -0400, Doug Lea wrote: > On 08/11/2015 02:01 PM, Jeremy Manson wrote: > >Just curious: which end users are jonesing for a Fences API? I talk with a LOT > >of Java users, and it's *never* come up outside of the kinds of people on this list. > > > > Good timing. I was about to post this set of misc follow-up notes > that mentions this. > > * I don't want to get overly consumed about analogs of consume mode, > but it would be nice to deal with it in a way that is good enough not > to demand follow-up efforts. 
A different approach than getDependently
> is to add an overloaded loadLoadFence method that restricts its scope
> to loads from the given reference: void loadLoadFence(Object ref).
> This is similar to the approach used in a previous (scrapped) version
> of Fence APIs. JVMs could just equate it (as well as the no-arg
> version) to acquireFence, but with some effort it could be implemented
> to get better performance on some platforms. At the moment, this seems
> like a better approach to me than VarHandle getDependently methods.

Well, if you change your mind about being consumed by analogs of consume,
please see the attached revision of C++ working draft N4321. ;-)

The most recent addition is what can be thought of as a de facto
characterization of current compiler behavior in conjunction with
actual Linux-kernel usage. This addition is in Section 7.9 starting
on page 28. Prior to Section 7.9, there are a number of other proposals
involving adding additional marking to the dependency chains. Section 7.10
starting on page 34 gives a draft evaluation of the various proposals,
see especially Table 1 on page 35.

Within the Linux kernel, dependency-chain heads are marked with one of
the variants of rcu_dereference(), which has acquire-like semantics,
so that rcu_dereference()'s load of the pointer is ordered against both
loads and stores that are part of the dependency chain headed by that
rcu_dereference(). For a fanciful example:

	p = rcu_dereference(gp);
	if (!p->visited) /* Don't unnecessarily dirty the cache. */
		p->visited = 1;

Here the assignment to p->visited is ordered after the load from gp.
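
A rough Java counterpart of the fragment above can be sketched as
follows. It assumes a JDK 9 or later runtime and uses VarHandle
getAcquire/setRelease, because the thread notes that VarHandle offers no
consume-like mode; getAcquire orders more than a dependency-ordered load
would need to, so this shows the shape of the idiom rather than its cost.
The class AcquireDereference, its Node type, and the method names are
hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class AcquireDereference {
    // Hypothetical node type standing in for the structure behind gp.
    static class Node {
        int payload;
        boolean visited;
    }

    Node gp; // published pointer; shared accesses go through GP below

    static final VarHandle GP;

    static {
        try {
            GP = MethodHandles.lookup()
                    .findVarHandle(AcquireDereference.class, "gp", Node.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Writer side: initialize the node, then publish it with release mode.
    void publish(int payload) {
        Node n = new Node();
        n.payload = payload;
        GP.setRelease(this, n);
    }

    // Reader side: the acquire load heads the chain; the later load of
    // p.visited and the store to it are ordered after the load of gp.
    int readAndMark() {
        Node p = (Node) GP.getAcquire(this);
        if (p == null)
            return -1; // nothing published yet
        if (!p.visited) // don't unnecessarily dirty the cache
            p.visited = true;
        return p.payload;
    }
}
```

As in the C fragment, the store to visited is a plain store and concurrent
readers may race on it; the sketch only illustrates the ordering of that
store relative to the load of gp.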
> * VarHandles serve two roles: (1) providing access to CAS instructions > (plus the special RMW variants getAndAdd etc) that have no bytecodes > (2) supporting moded load/store access that in principle is the same > as people could arrange with the idiomatic application of Fences, but > in practice possibly different because JVMs might have an easier time > selecting instructions that combine fencing and access. > > * The Fence API is geekier than VarHandles but no less necessary as > Unsafe goes away. One audience is a small number of expert concurrent > programmers writing performance-sensitive code. As a general > observation, the more that such people understand their concurrent > algorithms, the more flexibility they need to implement them. There > are a few constructions inside java.util.concurrent relying on them > (for example in StampedLock) that cannot otherwise be obtained in any > reasonable way. Which would be a problem without Fence API when > Unsafe goes away. > > Another (mostly future) audience is Tool developers. Efforts to obtain > desired consistency properties via tool-based fence placements are > starting to approach plausibility. (Including a nice upcoming OOPSLA > paper "Declarative fence insertion" by Bender, Lesani, and Palsberg > that I don't see anywhere online yet.) We'd rather not see Java ruled > out of such efforts just because of the lack of access to > functionality that must exist in any Java implementation. > > (Gratuitous aside: Fences are just flexible Semicolons. Plain > semicolons outlived their usefulness, as bemoaned in the fun read: > "The Silently Shifting Semicolon" > http://snapl.org/2015/abstracts/full/Marino.html > http://www.cs.ucla.edu/~todd/research/snapl15.pdf) I do like this view of the matter! Which probably means that everyone else hates it. ;-) Thanx, Paul From paulmck at linux.vnet.ibm.com Fri Aug 14 16:03:50 2015 From: paulmck at linux.vnet.ibm.com (Paul E. 
McKenney) Date: Fri, 14 Aug 2015 09:03:50 -0700 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55CE0CDB.1070005@oracle.com> References: <55C8EBCD.7010006@cs.oswego.edu> <55CB0B8D.1020509@oracle.com> <55CB5A52.7030100@cs.oswego.edu> <20150814153903.GJ3895@linux.vnet.ibm.com> <55CE0CDB.1070005@oracle.com> Message-ID: <20150814160350.GM3895@linux.vnet.ibm.com> On Fri, Aug 14, 2015 at 06:44:27PM +0300, Aleksey Shipilev wrote: > On 14.08.2015 18:39, Paul E. McKenney wrote: > > On Wed, Aug 12, 2015 at 10:38:10AM -0400, Doug Lea wrote: > >> On 08/12/2015 05:02 AM, Aleksey Shipilev wrote: > >>> On 08/10/2015 09:22 PM, Doug Lea wrote: > >> > >>>> boolean compareAndSet(Object owner, T cmp, T val); > >>>> boolean compareAndSetAcquire(Object owner, T cmp, T val); > >>>> boolean compareAndSetRelease(Object owner, T cmp, T val); > >>> > >>> Another, perhaps, "can of worms" question about CAS-es that was lurking > >>> in my mind: which -s are we nominally handling? > >> > >> The answers need to be a little different for Java and C++. > >> In C++, you can only apply to "atomic" types. > >> > >> Under the current VarHandle scheme, you can attach to > >> anything at compile time, but methods may bail out > >> throwing an UnsupportedOperationException at run-time. > >> > >> This will always occur for example when applying getAndAdd > >> to a VarHandle of any Object type. > >> > >> This was apparently the tradeoff most acceptable to all concerned. > >> It means that in addition to memory-model-related specs, > >> the methods will need to carry warnings that they might > >> just throw UOE; hopefully enumerating common cases > >> like CAS for sub-word types and unaligned memory. > >> > >>> Therefore, the question seems to be the hardware ability to make subword > >>> CASes. > >> > >> My understanding (and preference) is that trying to CAS with > >> VarHandles to bytes and other small types just throws > >> UOE and should be spec'ed to do so. > > > > OK, I will bite... 
Why can't you just apply a CAS instruction to the > > larger word containing the subword? > > As I tried to explain before, this seems to break the guarantees for > strong CAS, that at least one of the CASes against the same location > should succeed. The trick here is that on Java level, "location" is > defined as "field" or something like that. > > Case in point: imagine you have two subword-sized fields in the same > word, and you do CAS on each of them. From Java level, these CASes are > uncontended, and both should succeed. In our hypothetical > implementation, one of the CASes would fail. This is a good thing for > weakCAS that is spec-ed to allow spurious failures, but not for a strong > one. Thank you for the tutorial, makes perfect sense. Thanx, Paul From dl at cs.oswego.edu Fri Aug 14 17:57:41 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Aug 2015 13:57:41 -0400 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <20150814153602.GI3895@linux.vnet.ibm.com> References: <55C8EBCD.7010006@cs.oswego.edu> <55CA48CE.6050405@cs.oswego.edu> <20150814153602.GI3895@linux.vnet.ibm.com> Message-ID: <55CE2C15.4060706@cs.oswego.edu> On 08/14/2015 11:36 AM, Paul E. McKenney wrote: > Well, if you change your mind about being consumed by analogs of consume, > please see the attached revision of C++ working draft N4321. ;-) Thanks. My proposal to just introduce loadLoadFence(ref) (which further simplifies VarHandle.getDependently(ref)) was based in part on Section 3 (of the last version of N4321 I'd seen), that mentions and dismisses the idea of forcing layer-by-layer use of something similar in extended dependency chains. Without an OS kernel full of prior users/code to deal with, this seems to minimally suffice. Especially since in Java, programmers are more willing to use tools that might help automate tiered fence placement. Yes? -Doug From paulmck at linux.vnet.ibm.com Fri Aug 14 18:17:20 2015 From: paulmck at linux.vnet.ibm.com (Paul E. 
McKenney)
Date: Fri, 14 Aug 2015 11:17:20 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55CE2C15.4060706@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55CA48CE.6050405@cs.oswego.edu> <20150814153602.GI3895@linux.vnet.ibm.com> <55CE2C15.4060706@cs.oswego.edu>
Message-ID: <20150814181720.GP3895@linux.vnet.ibm.com>

On Fri, Aug 14, 2015 at 01:57:41PM -0400, Doug Lea wrote:
> On 08/14/2015 11:36 AM, Paul E. McKenney wrote:
>
> >Well, if you change your mind about being consumed by analogs of consume,
> >please see the attached revision of C++ working draft N4321. ;-)
>
> Thanks.
>
> My proposal to just introduce loadLoadFence(ref) (which further
> simplifies VarHandle.getDependently(ref)) was based in part
> on Section 3 (of the last version of N4321 I'd seen), that
> mentions and dismisses the idea of forcing layer-by-layer
> use of something similar in extended dependency chains.
> Without an OS kernel full of prior users/code to deal
> with, this seems to minimally suffice. Especially since in
> Java, programmers are more willing to use tools that might help
> automate tiered fence placement. Yes?

My guess is that this is Section 3.5 ("Linux-Kernel Dependency Chain
Length"), the final paragraph of which reads as follows:

    Again, although a great many dependency chains in the Linux
    kernel are quite short, there are quite a few that spread both
    widely and deeply. We therefore cannot expect Linux kernel
    hackers to look fondly on any mechanism that requires them to
    decorate each and every operator in each and every dependency
    chain as was shown in Figure 8. In fact, even kill_dependency()
    will likely be an extremely difficult sell.

Given that you don't have existing code and assuming use of
fence-placement tools, layer-by-layer decoration might be OK.

The tools propagate the layer-by-layer decoration from the head
of the dependency chain or some such?
Thanx, Paul From boehm at acm.org Fri Aug 14 18:36:55 2015 From: boehm at acm.org (Hans Boehm) Date: Fri, 14 Aug 2015 11:36:55 -0700 Subject: [jmm-dev] jdk9 APIs [Fences specifically] In-Reply-To: <55CDD61D.80602@cs.oswego.edu> References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> <55CDD61D.80602@cs.oswego.edu> Message-ID: On Fri, Aug 14, 2015 at 4:50 AM, Doug Lea
wrote: > So there are limitations in the ability of ordering control to > improve responsiveness. Which is unsurprising given all the > other limitations under weak scheduling guarantees. But > that's not much of an argument for not even allowing it. I think it's not a limitation; it's just the wrong mechanism. The mechanism you want doesn't care about reordering of memory visibility to other threads. That's what fences are about. It does care about merging of memory operations and reordering with local compute operations. Fences are not about that. As you point out, that's kind of the domain of C-style volatiles, though they really only address the "combining" part. > [Hans:] >> But I think this doesn't have anything to do with fences. > > > Ordering constraints seem intrinsic to the problem at hand. > It's the complement of the main issue in RCU/consume: > "really read this" vs "really write this". I think the RCU constraint is about ordering. It's not "really read this". CSE on two consume loads is OK for correctness, though probably otherwise evil. And my understanding is that you really want to order loads with respect to dependent stores as well. ... > > But even if so, it seems better to have a uniform API: > writes: full > release > storeStore > reads: full > acquire > loadLoad > even if loadLoad is internally mapped to acquire. > And omitting it feels even more wrong if we support > RCU-like usages with scoped loadLoadFence(Object ref). C++ has a great solution to that: include neither storeStore nor loadLoad. As I argued in my earlier message, storeStore really only makes sense for ordering prior writes to fields that are subsequently treated as read-only. That's a really narrow application domain but probably a disproportional security bug magnet (for those people who care about Java security). 
If we were to generalize the final field ordering guarantees
to other fields, I would argue that storeStore is no longer a safe
implementation of the constructor barrier. People would
"naively" expect that if they just wrote a field in a constructor,
and the object has not yet been published, it should read
back as the value they just wrote. This becomes an unsafe
assumption if storeStore is used.

My impression is that the "st" variant of "dmb" is only a win
on a smallish number of microarchitectures. Someone from
ARM should confirm or deny.

The more I think about it, the less confident I am that the use
of "dmb ishst" is actually correct as a constructor barrier to
start with. Consider:

Thread 1:
  t1.f = 1;  // Final field write in constructor
  // Implicit constructor StoreStore fence ("dmb ishst") here.
  a = t1;    // Correctly publish outside constructor

Thread 2:
  t2.f2 = a; // Final field write in constructor
  // Implicit constructor StoreStore fence
  b = t2;    // Correctly publish outside constructor

Thread 3:
  if (b != null && b.f2 != null) {
    t2 = b.f2.f; // Guaranteed to see 1?
  }

Is thread 3 guaranteed to see an initialized f? Based on the
ARM spec, I'm not sure either way. In any case, this seems
really hard to specify. Yet it's likely that most real users will
(or at least should) care. Unlike the OOTA issues where we
kind of all know what we mean, I don't think hand-waving works
here.

AFAICT, storeStore is a feature we don't know how to specify,
we're not sure is really useful in correct code, but can
potentially be used to improve performance marginally on a few
microarchitectures for one ISA. And it's likely to be a bug magnet.
That just doesn't seem like a very compelling case.
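For readers who want to experiment, the three-thread example can be written out as ordinary Java roughly as follows. The class and method names (FinalFieldChain, Inner, Outer, thread1 to thread3) are assumptions made for illustration, and the constructor barriers are left to whatever the JVM emits. Running this on one machine says nothing about what the ARM specification guarantees; it only makes the access pattern concrete.

```java
// Hypothetical class names; a sketch of the access pattern, not a litmus test.
final class FinalFieldChain {
    static final class Inner {
        final int f;
        Inner() { f = 1; }               // final field write in constructor
    }

    static final class Outer {
        final Inner f2;
        Outer(Inner in) { f2 = in; }     // final field write in constructor
    }

    // Plain (racy) fields used to publish the objects outside their constructors.
    static Inner a;
    static Outer b;

    static void thread1() { a = new Inner(); }

    static void thread2() { b = new Outer(a); }   // may capture a == null

    // Returns -1 if the chain is not visible yet, otherwise the value read from f.
    static int thread3() {
        Outer o = b;
        if (o != null && o.f2 != null) {
            return o.f2.f;
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] seen = new int[1];
        Thread t1 = new Thread(FinalFieldChain::thread1);
        Thread t2 = new Thread(FinalFieldChain::thread2);
        Thread t3 = new Thread(() -> seen[0] = thread3());
        t1.start(); t2.start(); t3.start();
        t1.join(); t2.join(); t3.join();
        System.out.println(seen[0]);
    }
}
```

In these terms, the question being asked is whether thread3() could ever return 0 when each constructor ends with nothing stronger than a StoreStore barrier.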
Hans > -Doug > From dl at cs.oswego.edu Fri Aug 14 18:43:34 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Aug 2015 14:43:34 -0400 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <20150814181720.GP3895@linux.vnet.ibm.com> References: <55C8EBCD.7010006@cs.oswego.edu> <55CA48CE.6050405@cs.oswego.edu> <20150814153602.GI3895@linux.vnet.ibm.com> <55CE2C15.4060706@cs.oswego.edu> <20150814181720.GP3895@linux.vnet.ibm.com> Message-ID: <55CE36D6.4090701@cs.oswego.edu> On 08/14/2015 02:17 PM, Paul E. McKenney wrote: > On Fri, Aug 14, 2015 at 01:57:41PM -0400, Doug Lea wrote: >> My proposal to just introduce loadLoadFence(ref) (which further >> simplifies VarHandle.getDependently(ref)) was based in part >> on Section 3 (of the last version of N4321 I'd seen), that >> mentions and dismisses the idea of forcing layer-by-layer >> use of something similar in extended dependency chains. >> Without an OS kernel full of prior users/code to deal >> with, this seems to minimally suffice. Especially since in >> Java, programmers are more willing to use tools that might help >> automate tiered fence placement. Yes? > > Given that you don't have existing code and assuming use of > fence-placement tools, layer-by-layer decoration might be OK. > > The tools propagate the layer-by-layer decoration from the head > of the dependency chain or some such? > Well, no such tools exist yet. But they seem to be possible. And there are enough performance-obsessed low-level java programmers that I expect interest in RCU-like techniques to grow sometime. As mentioned in my post Wednesday, these usages are basically the same as Java final rules except that the fences must be explicit. So in principle a tool can insert using a variant of final-field-check rules -- not a known variant, but I don't see any serious obstacles. -Doug From paulmck at linux.vnet.ibm.com Fri Aug 14 18:54:27 2015 From: paulmck at linux.vnet.ibm.com (Paul E. 
McKenney) Date: Fri, 14 Aug 2015 11:54:27 -0700 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55CE36D6.4090701@cs.oswego.edu> References: <55C8EBCD.7010006@cs.oswego.edu> <55CA48CE.6050405@cs.oswego.edu> <20150814153602.GI3895@linux.vnet.ibm.com> <55CE2C15.4060706@cs.oswego.edu> <20150814181720.GP3895@linux.vnet.ibm.com> <55CE36D6.4090701@cs.oswego.edu> Message-ID: <20150814185427.GQ3895@linux.vnet.ibm.com> On Fri, Aug 14, 2015 at 02:43:34PM -0400, Doug Lea wrote: > On 08/14/2015 02:17 PM, Paul E. McKenney wrote: > >On Fri, Aug 14, 2015 at 01:57:41PM -0400, Doug Lea wrote: > >>My proposal to just introduce loadLoadFence(ref) (which further > >>simplifies VarHandle.getDependently(ref)) was based in part > >>on Section 3 (of the last version of N4321 I'd seen), that > >>mentions and dismisses the idea of forcing layer-by-layer > >>use of something similar in extended dependency chains. > >>Without an OS kernel full of prior users/code to deal > >>with, this seems to minimally suffice. Especially since in > >>Java, programmers are more willing to use tools that might help > >>automate tiered fence placement. Yes? > > > >Given that you don't have existing code and assuming use of > >fence-placement tools, layer-by-layer decoration might be OK. > > > >The tools propagate the layer-by-layer decoration from the head > >of the dependency chain or some such? > > Well, no such tools exist yet. But they seem to be possible. > And there are enough performance-obsessed low-level java > programmers that I expect interest in RCU-like techniques > to grow sometime. As mentioned in my post Wednesday, these > usages are basically the same as Java final rules except that > the fences must be explicit. So in principle a tool can insert > using a variant of final-field-check rules -- not a known > variant, but I don't see any serious obstacles. And in any case, you do have the option of providing this mechanism, and expanding upon it later when/if needed. 
Thanx, Paul From paulmck at linux.vnet.ibm.com Fri Aug 14 19:09:07 2015 From: paulmck at linux.vnet.ibm.com (Paul E. McKenney) Date: Fri, 14 Aug 2015 12:09:07 -0700 Subject: [jmm-dev] jdk9 APIs [Fences specifically] In-Reply-To: References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> <55CDD61D.80602@cs.oswego.edu> Message-ID: <20150814190907.GS3895@linux.vnet.ibm.com> On Fri, Aug 14, 2015 at 11:36:55AM -0700, Hans Boehm wrote: > On Fri, Aug 14, 2015 at 4:50 AM, Doug Lea
wrote:
>
> > So there are limitations in the ability of ordering control to
> > improve responsiveness. Which is unsurprising given all the
> > other limitations under weak scheduling guarantees. But
> > that's not much of an argument for not even allowing it.
> I think it's not a limitation; it's just the wrong mechanism.
> The mechanism you want doesn't care about reordering of
> memory visibility to other threads. That's what fences are
> about. It does care about merging of memory operations and
> reordering with local compute operations. Fences are not about that.
> As you point out, that's kind of the domain of C-style volatiles,
> though they really only address the "combining" part.
> >
> > [Hans:]
> >> But I think this doesn't have anything to do with fences.
> >
> >
> > Ordering constraints seem intrinsic to the problem at hand.
> > It's the complement of the main issue in RCU/consume:
> > "really read this" vs "really write this".
> I think the RCU constraint is about ordering. It's not "really read this".
> CSE on two consume loads is OK for correctness, though
> probably otherwise evil. And my understanding is that
> you really want to order loads with respect to dependent stores
> as well.

Indeed, if the code faces severe response-time constraints, its developers
and users won't thank you for common subexpression elimination on a pair
of consume loads (even assuming that the users are in a position to
express any opinion at all after the dust settles). However, it could
actually be a very good thing if the code had throughput and scalability
requirements, but no pressing response-time requirements.

As you no doubt guessed, given current C and C++ compilers, my
recommendation would be to mark response-time-constrained consume loads as
"volatile", unpopular though that recommendation appears to be in many
quarters.
;-) Thanx, Paul From dl at cs.oswego.edu Sat Aug 15 00:14:51 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 14 Aug 2015 20:14:51 -0400 Subject: [jmm-dev] jdk9 APIs [Fences specifically] In-Reply-To: References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> <55CDD61D.80602@cs.oswego.edu> Message-ID: <55CE847B.7090603@cs.oswego.edu> On 08/14/2015 02:36 PM, Hans Boehm wrote: > On Fri, Aug 14, 2015 at 4:50 AM, Doug Lea
> wrote: > > So there are limitations in the ability of ordering control to > > improve responsiveness. Which is unsurprising given all the > > other limitations under weak scheduling guarantees. But > > that's not much of an argument for not even allowing it. > I think it's not a limitation; it's just the wrong mechanism. > The mechanism you want doesn't care about reordering of > memory visibility to other threads. That's what fences are I suppose we should stop arguing about terminology/domains but ... Ensuring that the writes are not reordered (as seen by other threads) seems to be a prerequisite in all the cases under consideration here. So at least some of us conclude that solutions should build off this (fence) constraint. > As you point out, that's kind of the domain of C-style volatiles, > though they really only address the "combining" part. The use of (C) volatile casts amounts to telling the compiler: "I cannot tell you why to write/read this variable; just do it". If that's the best available solution, in Java, we could add analogous VarHandle methods of the form: void setOpaquely(Object owner, T x); T getOpaquely(Object owner); I don't like this at all. But to defend anyway: they do form a sort of matched set with the reachabilityFence implementation as "useOpaquely(ref)". And hotspot has an analogous internal (misnamed) construct MemBarCPUOrder that means: "I cannot tell you why not to reorder these IR nodes; just don't". This would come into play in the implementation of these methods as the "solution" to the lack of explicit storeStore fence. So this is entirely workable even if entirely unappealing. > > > > But even if so, it seems better to have a uniform API: > > writes: full > release > storeStore > > reads: full > acquire > loadLoad > > even if loadLoad is internally mapped to acquire. > > And omitting it feels even more wrong if we support > > RCU-like usages with scoped loadLoadFence(Object ref). 
> > C++ has a great solution to that: include neither storeStore > nor loadLoad. Some history: In jdk8, after a similar discussion, we added only the three C++ fences as internal Unsafe intrinsics (which in turn map to existing MemBar IR nodes inside hotspot). But some JVM engineers voted with their feet about these: They introduced internal storeStore so they could use it (and/or directly generate) instead of release when applicable. The fact that it is cheaper or no more expensive than alternatives on most if not all ARM processors is surely the main reason for use. But other use cases are not just motivated by cost. It would be nice to avoid similar problems with jdk9 public APIs. (This accounts for my choice of words in initial post that it "seems wise" to include them.) > AFAICT, storeStore is a feature we don't know how to specify, Again, the spec is conceptually easy: Of all the ordering effects allowed by the base memory model rules, disable the indicated case of store-store reordering. The hard part is specifying the conditions under which usage does what any given programmer expects. Moreso than other fences. But this does not seem to be a good enough reason to omit from API. -Doug From boehm at acm.org Sat Aug 15 01:14:32 2015 From: boehm at acm.org (Hans Boehm) Date: Fri, 14 Aug 2015 18:14:32 -0700 Subject: [jmm-dev] jdk9 APIs [Fences specifically] In-Reply-To: <55CE847B.7090603@cs.oswego.edu> References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> <55CDD61D.80602@cs.oswego.edu> <55CE847B.7090603@cs.oswego.edu> Message-ID: On Fri, Aug 14, 2015 at 5:14 PM, Doug Lea
wrote: > > I suppose we should stop arguing about terminology/domains but ... > Ensuring that the writes are not reordered (as seen > by other threads) seems to be a prerequisite in all the > cases under consideration here. So at least some of us > conclude that solutions should build off this (fence) constraint. You're now talking about preventing reordering of accesses to the same variable? I do agree that such a fence could conceivably make sense in Java, though I'm not sure whether there are enough use cases to justify them. And I suspect you would want a generic sameVariableFence rather than distinguishing between loads and stores. This doesn't make sense in C++ since, thanks largely to Paul, memory_order_relaxed accesses are implicitly cache coherent. I also still think this issue is quite distinct from the problem we started with, as evidenced by the problem that merging of variable accesses is definitely still an issue in C++, and sameVariableFence wouldn't prevent it. > > On 08/14/2015 02:36 PM, Hans Boehm wrote: >> >> As you point out, that's kind of the domain of C-style volatiles, >> though they really only address the "combining" part. > > > The use of (C) volatile casts amounts to telling the compiler: > "I cannot tell you why to write/read this variable; just > do it". If that's the best available solution, in Java, we could > add analogous VarHandle methods of the form: > void setOpaquely(Object owner, T x); > T getOpaquely(Object owner); > > I don't like this at all. But to defend anyway: they do form a > sort of matched set with the reachabilityFence > implementation as "useOpaquely(ref)". And hotspot has an > analogous internal (misnamed) construct MemBarCPUOrder that means: > "I cannot tell you why not to reorder these IR nodes; just don't". > This would come into play in the implementation of these methods > as the "solution" to the lack of explicit storeStore fence. > So this is entirely workable even if entirely unappealing. 
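The opaque accessors floated in the quoted text did later appear on java.lang.invoke.VarHandle in JDK 9 as getOpaque and setOpaque. As a minimal sketch of the "just do it" read, assuming a hypothetical OpaqueFlag class with a plain boolean field, a polling loop might look like this:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class; illustrates opaque reads and writes of a plain field.
final class OpaqueFlag {
    private boolean stop;   // plain field, accessed only through STOP

    private static final VarHandle STOP;
    static {
        try {
            STOP = MethodHandles.lookup()
                    .findVarHandle(OpaqueFlag.class, "stop", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // "Really write this": the store may not be dropped or merged away.
    void requestStop() {
        STOP.setOpaque(this, true);
    }

    // "Really read this": each iteration performs a fresh read of the field,
    // so the loop cannot be turned into an infinite loop by hoisting the load.
    long spinUntilStopped() {
        long spins = 0;
        while (!(boolean) STOP.getOpaque(this)) {
            spins++;
            Thread.onSpinWait();
        }
        return spins;
    }

    // Starts a poller, requests a stop, and reports whether the poller exited.
    static boolean demo() {
        OpaqueFlag flag = new OpaqueFlag();
        Thread worker = new Thread(flag::spinUntilStopped);
        worker.start();
        flag.requestStop();
        try {
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Opaque mode imposes no ordering with respect to other variables, so this pattern is only suitable when the flag itself is the whole message.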
In C and C++ we could somewhat define such constructs in terms of the
behavior of asynchronous signal handlers running in the same thread. We do
that for atomic_signal_fence, which is essentially intended to be a compiler
fence. I don't think there's really a Java equivalent.

I think the actual definition of "volatile" in C and C++ is entirely
unsatisfactory. The real definition has become "it does whatever it used
to do on this architecture". C has enough history in this area that that
makes sense. That's not a Java solution, though I would prefer some kind
of "compiler don't muck with this" access to loadLoad or storeStore fences.
I'd personally prefer to spell it differently, e.g. as an @Races variable
annotation.

>
>> >
>> > But even if so, it seems better to have a uniform API:
>> > writes: full > release > storeStore
>> > reads: full > acquire > loadLoad
>> > even if loadLoad is internally mapped to acquire.
>> > And omitting it feels even more wrong if we support
>> > RCU-like usages with scoped loadLoadFence(Object ref).
>>
>> C++ has a great solution to that: include neither storeStore
>> nor loadLoad.
>
>
> Some history: In jdk8, after a similar discussion, we added only
> the three C++ fences as internal Unsafe intrinsics (which in turn
> map to existing MemBar IR nodes inside hotspot). But some JVM
> engineers voted with their feet about these: They introduced
> internal storeStore so they could use it (and/or directly
> generate) instead of release when applicable. The fact that
> it is cheaper or no more expensive than alternatives on most if not
> all ARM processors is surely the main reason for use. But
> other use cases are not just motivated by cost.

Android has StoreStore internally as well, though I'm starting to wonder
whether we should get rid of it, as we did with LoadLoad. Given the
uncertainty about correct use cases, I'd certainly prefer to keep it at most
internal.
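To make the quoted fence hierarchy (full > release > storeStore for writes, full > acquire > loadLoad for reads) concrete, here is a small message-passing sketch using the static fence methods that later shipped on java.lang.invoke.VarHandle in JDK 9. The class FencePublish and its members are hypothetical, and the comments about weaker fences only restate the substitution under debate; they are not a claim that it is safe.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class; fence-based message passing with a plain data field.
final class FencePublish {
    private int data;        // plain field
    private boolean ready;   // accessed only through READY

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(FencePublish.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void produce(int v) {
        data = v;
        // Keeps the data store ahead of the ready store. Only a store precedes
        // the fence, so storeStoreFence() is the weaker candidate discussed here.
        VarHandle.releaseFence();
        READY.setOpaque(this, true);
    }

    // Returns the published value, or -1 if nothing is visible yet.
    int consume() {
        if ((boolean) READY.getOpaque(this)) {
            // Keeps the ready load ahead of the data load. Only loads are
            // involved, so loadLoadFence() is the weaker candidate.
            VarHandle.acquireFence();
            return data;
        }
        return -1;
    }

    public static void main(String[] args) {
        FencePublish p = new FencePublish();
        p.produce(5);
        System.out.println(p.consume());
    }
}
```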
I agree that engineers voting with their feet is often a strong argument for
adding a feature. But I think those uses need to come with convincing
correctness arguments and reasonable specifications. I think in this case
we have neither. In my opinion, there have been plenty of cases of
engineers heading into dead ends as well.

>
> It would be nice to avoid similar problems with jdk9 public APIs.
> (This accounts for my choice of words in initial post
> that it "seems wise" to include them.)
>
>> AFAICT, storeStore is a feature we don't know how to specify,
>
>
> Again, the spec is conceptually easy: Of all the ordering
> effects allowed by the base memory model rules, disable
> the indicated case of store-store reordering.

Unfortunately, I think it's not that simple because of transitivity issues.
If the ordering doesn't correctly combine with the reads-from ordering,
then the expected guarantees from e.g. publication safety go away when more
than two threads are involved. And everything is complicated since this
only matters for architectures that are not multi-copy atomic and can't be
described by simple operation reordering.

The question is whether the fact that my load saw a store S before a
StoreStore fence means anything about the visibility of S to a third
thread. I think that without some guarantee along those lines, this is
essentially useless, as in my example in the last message. (Dependencies
also figure into that example, and may be critical, but we already know
that we don't know how to handle those.)

>
> The hard part is specifying the conditions under which usage
> does what any given programmer expects. Moreso than other fences.
> But this does not seem to be a good enough reason to omit from API.

I do not know of a definition that's good enough that I would be
comfortable using them. Which I think is a good reason for omitting them.
Hans

>
> -Doug
>

From dl at cs.oswego.edu Sat Aug 15 11:47:01 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Sat, 15 Aug 2015 07:47:01 -0400
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> <55CDD61D.80602@cs.oswego.edu> <55CE847B.7090603@cs.oswego.edu>
Message-ID: <55CF26B5.3050603@cs.oswego.edu>

On 08/14/2015 09:14 PM, Hans Boehm wrote:

> thanks largely to Paul,
> memory_order_relaxed accesses are implicitly cache coherent.

Right. We cannot of course require this for plain Java variables
that otherwise correspond to C++ relaxed, so this remains among
the minor C++/Java memory-model-related usage differences.

Coherence would probably not be desirable in C++ either if some
form of opaque access methods were available, but it might be
too late to change that.

It might be worth contemplating adding C++ memory_order_opaque
rather than relying on ill-defined C-volatile casts.
Or you could just rename memory_order_consume!
The resulting usages would correspond to opaque Java VarHandle
get/set methods, matching compatibility in expressiveness.

While conceived as a desperation move in both C++ and Java,
there's a sense in which this fits naturally as a fence/mode:
They all disallow transformations on reads/writes that are
otherwise allowed by the execution model.

Finding better names and actual specs might be slightly challenging.
The sense of "opaque" here is not obvious to potential users,
but the alternatives I've thought of (like program_order)
are worse in that they could be misleading.
-Doug From dl at cs.oswego.edu Wed Aug 19 11:06:36 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Wed, 19 Aug 2015 07:06:36 -0400 Subject: [jmm-dev] jdk9 APIs [Fences specifically] In-Reply-To: References: <55CC8B45.50909@cs.oswego.edu> <55CD2609.3070601@cs.oswego.edu> <55CDD61D.80602@cs.oswego.edu> Message-ID: <55D4633C.5070608@cs.oswego.edu> On 08/14/2015 02:36 PM, Hans Boehm wrote: > The more I think about it, the less confident I am that the use > of "dmb ishst" is actually correct as a constructor barrier to > start with. Or, in other words, is dmb ishst guaranteed to implement any reasonable definition of a storeStoreFence. (Which is a separate issue than whether storeStoreFences are hard to use.) I had not considered this possibility, but we do need to be sure that it applies as expected in cases like: > > Thread 1: > t1.f = 1; // Final field write in constructor > // Implicit constructor StoreStore fence ("dmb ishst") here. > a = t1; // Correctly publish outside constructor > > Thread 2: > t2.f2 = a; // Final field write in constructor > // Implicit constructor StoreStore fence > b = t2; // Correctly publish outside constructor > > Thread 3: > if (b != null && b.f2 != null) { > t2 = b.f2.f; // Guaranteed to see 1? > } > The ARM specs are not completely clear about this, so we've asked them (Stephan Diestelhorst et al) to clarify. If it is the case that the ARM spec is not intended to require it but all current processors have the desired effect, the (disappointing) best course of action seems to be to leave it as non-public within-JDK intrinsic but not to expose as a Fence. 
-Doug

From dl at cs.oswego.edu Wed Aug 19 11:44:58 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 19 Aug 2015 07:44:58 -0400
Subject: [jmm-dev] jdk9 APIs (CAS)
In-Reply-To: <55C8EBCD.7010006@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu>
Message-ID: <55D46C3A.1050509@cs.oswego.edu>

Thanks to Cliff Click for prodding me to realize that
the signatures for new analogs of C/C++ moded CAS should,
like those versions, return values not booleans (i.e., on failure,
the value causing CAS to fail, else the given value).
And renamed to compareAndExchange to better convey this.
(Only 6 of the possible mode X weakness cases make sense.)

Any objections?

The current tentative VarHandles API looks like:

abstract class NotReallyVarHandle<T> {
  // Load

  T getRelaxed(Object owner);
  T getAcquire(Object owner);
  T getVolatile(Object owner);
  T getOpaque(Object owner);

  // Store

  void setRelaxed(Object owner, T val);
  void setRelease(Object owner, T val);
  void setVolatile(Object owner, T val);
  void setOpaque(Object owner, T val);

  // CAS

  boolean compareAndSet(Object owner, T cmp, T val);

  // Value-returning compare and exchange

  T compareAndExchangeVolatile(Object owner, T cmp, T val);
  T compareAndExchangeAcquire(Object owner, T cmp, T val);
  T compareAndExchangeRelease(Object owner, T cmp, T val);

  T weakCompareAndExchangeRelaxed(Object owner, T cmp, T val);
  T weakCompareAndExchangeAcquire(Object owner, T cmp, T val);
  T weakCompareAndExchangeRelease(Object owner, T cmp, T val);

  // special RMW

  T getAndSet(Object owner, T val);
  T getAndAdd(Object owner, T delta);
  T addAndGet(Object owner, T delta);
}

From stephan.diestelhorst at arm.com Wed Aug 19 12:01:07 2015
From: stephan.diestelhorst at arm.com (Stephan Diestelhorst)
Date: Wed, 19 Aug 2015 13:01:07 +0100
Subject: [jmm-dev] jdk9 APIs [Fences specifically]
In-Reply-To:
References: <55CDD61D.80602@cs.oswego.edu>
Message-ID: <1918832.yVbzCkb1mQ@mymac-ubuntu>

On Friday 14 August 2015 19:36:55 Hans Boehm wrote:
> As I argued in my
earlier message, storeStore > really only makes sense for ordering prior writes to fields > that are subsequently treated as read-only. That's a really > narrow application domain but probably a disproportional security > bug magnet (for those people who care about Java security). > > If we were to generalize the final field ordering guarantees > to other fields, I would argue that storeStore is no longer a safe > implementation of the constructor barrier. People would > "naively" expect that if they just wrote a field in a constructor, > and the object has not yet been published, it should read > back as the value they just wrote. This becomes an unsafe > assumption if storeStore is used. > > My impression is that the "st" variant of "dmb" is only a win > on a smallish number of microarchitectures. Someone from > ARM should confirm or deny. > > The more I think about it, the less confident I am that the use > of "dmb ishst" is actually correct as a constructor barrier to > start with. Consider: > > Thread 1: > t1.f = 1; // Final field write in constructor > // Implicit constructor StoreStore fence ("dmb ishst") here. > a = t1; // Correctly publish outside constructor > > Thread 2: > t2.f2 = a; // Final field write in constructor > // Implicit constructor StoreStore fence > b = t2; // Correctly publish outside constructor > > Thread 3: > if (b != null && b.f2 != null) { > t2 = b.f2.f; // Guaranteed to see 1? > } > > Is thread 3 guaranteed to see an initialized f? Based on the > ARM spec, I'm not sure either way. In any case, this seems > really hard to specify. Yet it's likely that most real users will > (or at least should) care. Unlike the OOTA issues where we > kind of all know what we mean, I don't think hand-waving works > here. We have looked at the example above and if you look at the definition of our StoreStore barrier, you will find that it is somewhat involved (and not just core-local!). 
With that definition under the belt, the example is indeed guaranteed to see the initialised value here. The question on the performance side with the stronger StoreStore barrier is one that we cannot answer publicly, yet. > AFAICT, storeStore is a feature we don't know how to specify, > we're not sure is really useful in correct code, We have a specification out there for a strong-ish StoreStore barrier (at least judged by this example), which effectively not just orders thread-local stores, but also includes order with stores on other CPUs that are "logically" after the StoreStore barrier (through an edge from a load that reads things from after the barrier). That way, these logically dependent stores become globally ordered with respect to the stores before the barrier -- reducing the impact of the absent multi-copy atomicity. > but can potentially be used to improve performance marginally on a few > microarchitectures for one ISA. These barriers may indeed be more expensive than the thread-local push the stores in-order barriers. However, those are complicated to reason about without any additional multi-copy atomicity. -- Sincerely, Stephan Stephan Diestelhorst ARM Research - Systems -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. 
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782 From dl at cs.oswego.edu Thu Aug 20 12:27:49 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 20 Aug 2015 08:27:49 -0400 Subject: [jmm-dev] jdk9 APIs (CAS) In-Reply-To: <55D46C3A.1050509@cs.oswego.edu> References: <55C8EBCD.7010006@cs.oswego.edu> <55D46C3A.1050509@cs.oswego.edu> Message-ID: <55D5C7C5.8090308@cs.oswego.edu> On 08/19/2015 07:44 AM, Doug Lea wrote: > > Thanks to Cliff Click for prodding me to realize that the signatures > for new analogs of C/C++ moded CAS should, like those versions, return > values not booleans (i.e, on failure, the value causing CAS to fail, > else the given value). And thanks to Andrew Haley for noticing that this is not desirable for the "weak" versions that can spuriously fail. The C++ versions both report boolean and set one of the arguments by-ref to hold value, which we can't do. So I reverted these three (but renamed weakCompareAndSet to weakCompareAndSetRelaxed for clarity). -Doug From dl at cs.oswego.edu Thu Aug 20 13:44:07 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 20 Aug 2015 09:44:07 -0400 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55C8EBCD.7010006@cs.oswego.edu> References: <55C8EBCD.7010006@cs.oswego.edu> Message-ID: <55D5D9A7.4050704@cs.oswego.edu> On 08/10/2015 02:22 PM, Doug Lea wrote: > There seems to be only a little room for discussion on the exact sets > of methods in the VarHandle* and Fences classes: I think these are as settled as they need to be for present purposes. Perhaps not loved, but settled. Please let me know if you think otherwise. Current versions are pasted below and updated at http://gee.cs.oswego.edu/dl/wwwtmp/Fodder.java Bear in mind that the actual VarHandles class differs due to polymorphic method-handle mechanics. 
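As a rough illustration of the "polymorphic method-handle mechanics" mentioned here, the following sketch uses java.lang.invoke.VarHandle as it later shipped in JDK 9, which postdates this message. In that released API the volatile-mode exchange is named compareAndExchange rather than compareAndExchangeVolatile, so the names below follow the shipped API, not the draft. The class VarHandleSketch, its field state, and the method demo are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VarHandleSketch {
    int state;                              // hypothetical plain field accessed via the handle

    static final VarHandle STATE;
    static {
        try {
            // The handle is looked up once; each call site is then
            // specialized by the JVM for the static types used there.
            STATE = MethodHandles.lookup()
                    .findVarHandle(VarHandleSketch.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Exercises a few access modes and packs the observed values into one int. */
    static int demo() {
        VarHandleSketch owner = new VarHandleSketch();
        STATE.setRelease(owner, 1);                                   // release-mode store
        int seen = (int) STATE.getAcquire(owner);                     // acquire-mode load
        int witness = (int) STATE.compareAndExchange(owner, 1, 2);    // succeeds
        int failed = (int) STATE.compareAndExchange(owner, 1, 3);     // fails, state stays 2
        return seen * 100 + witness * 10 + failed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The casts on the return values are what drive the call-site typing; without them the signature-polymorphic call would be typed as returning Object.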
Also it is possible that the methods in "Fences" might end up
elsewhere (maybe in the VarHandles utility class) that better
fit with Un-Unsafe-ifying (safication?) plans.

The javadocs still need some work, but the functionality seems to meet
the main goals, mainly of matching C/C++11 atomics in all the
"important" cases, providing some way to obtain equivalent effects in
remaining cases, and exposing all these in public APIs (vs Unsafe or
other weird hacks). And doing so for now without revising the base
memory model (thus relying on non-formal specs), but with the belief
that they are compatible with future JMM updates.

As always, other comments and suggestions are also welcome.

...

/**
 * Stand-in for spec purposes of jdk9 java.lang.invoke.VarHandle
 */
abstract class NotReallyVarHandle<T> {
    // Load

    /**
     * Returns the value, with memory semantics of reading a
     * non-volatile variable.
     */
    T getRelaxed(Object owner);

    /**
     * Returns the value, with memory semantics of reading a volatile
     * variable.
     */
    T getVolatile(Object owner);

    /**
     * Returns the value, and ensures that subsequent loads and stores
     * are not reordered before this access.
     *
     * @apiNote Ignoring the many semantic differences from C and
     * C++, this method has memory ordering effects compatible with
     * memory_order_acquire ordering.
     */
    T getAcquire(Object owner);

    /**
     * Returns the value, accessed in program order, but with no
     * assurance of memory ordering effects with respect to other
     * threads.
     */
    T getOpaque(Object owner);

    // Store

    /**
     * Sets the value, with memory semantics of setting a non-volatile
     * variable.
     *
     * @param val the new value
     */
    void setRelaxed(Object owner, T val);

    /**
     * Sets the value, with memory semantics of setting a volatile
     * variable.
     *
     * @param val the new value
     */
    void setVolatile(Object owner, T val);

    /**
     * Sets the value, and ensures that prior loads and stores are not
     * reordered after this access.
     *
     * @apiNote Ignoring the many semantic differences from C and
     * C++, this method has memory ordering effects compatible with
     * memory_order_release ordering.
     *
     * @param val the new value
     */
    void setRelease(Object owner, T val);

    /**
     * Sets the value, in program order, but with no assurance of
     * memory ordering effects with respect to other threads.
     *
     * @param val the new value
     */
    void setOpaque(Object owner, T val);

    // CAS

    /**
     * Atomically sets the value to the given updated value with the
     * memory semantics of setVolatile if the current value {@code ==}
     * the expected value, as accessed with the memory semantics of
     * getVolatile.
     *
     * @param expected the expected value
     * @param val the new value
     * @return {@code true} if successful. False return indicates that
     * the actual value was not equal to the expected value.
     */
    boolean compareAndSet(Object owner, T expected, T val);

    // Value-returning compare and exchange

    /**
     * Atomically sets the value to the given updated value with the
     * memory semantics of setVolatile if the current value {@code ==}
     * the expected value, as accessed with the memory semantics of
     * getVolatile.
     *
     * @param expected the expected value
     * @param val the new value
     * @return the current value, which will be the same as {@code val} if
     * successful.
     */
    T compareAndExchangeVolatile(Object owner, T expected, T val);

    /**
     * Atomically sets the value to the given updated value with the
     * memory semantics of setRelaxed if the current value {@code ==}
     * the expected value, as accessed with the memory semantics of
     * getAcquire.
     *
     * @param expected the expected value
     * @param val the new value
     * @return the current value, which will be the same as {@code val} if
     * successful.
     */
    T compareAndExchangeAcquire(Object owner, T expected, T val);

    /**
     * Atomically sets the value to the given updated value with the
     * memory semantics of setRelease if the current value {@code ==}
     * the expected value, as accessed with the memory samantics of
     * getRelaxed.
     *
     * @param expected the expected value
     * @param val the new value
     * @return the current value, which will be the same as {@code val} if
     * successful.
     */
    T compareAndExchangeRelease(Object owner, T expected, T val);

    // Weak (spurious failures allowed)

    /**
     * Possibly atomically sets the value to the given updated value
     * with the memory semantics of setRelaxed if the current value
     * {@code ==} the expected value, as accessed with the memory
     * semantics of getRelaxed. This operation may fail spuriously
     * (typically, due to memory contention) even if the current value
     * does match the expected value.
     *
     * @param expected the expected value
     * @param val the new value
     * @return {@code true} if successful
     */
    boolean weakCompareAndSetRelaxed(Object owner, T expected, T val);

    /**
     * Possibly atomically sets the value to the given updated value
     * with the memory semantics of setRelaxed if the current value
     * {@code ==} the expected value, as accessed with the memory
     * semantics of getAcquire. This operation may fail spuriously
     * (typically, due to memory contention) even if the current value
     * does match the expected value.
     *
     * @param expected the expected value
     * @param val the new value
     * @return {@code true} if successful
     */
    boolean weakCompareAndSetAcquire(Object owner, T expected, T val);

    /**
     * Possibly atomically sets the value to the given updated value
     * with the memory semantics of setRelease if the current value
     * {@code ==} the expected value, as accessed with the memory
     * semantics of getRelaxed. This operation may fail spuriously
     * (typically, due to memory contention) even if the current value
     * does match the expected value.
     *
     * @param expected the expected value
     * @param val the new value
     * @return {@code true} if successful
     */
    boolean weakCompareAndSetRelease(Object owner, T expected, T val);

    // special RMW

    /**
     * Atomically sets to the given value with the memory semantics of
     * setVolatile and returns the old value.
     *
     * @param val the new value
     * @return the previous value
     */
    T getAndSet(Object owner, T val);

    /**
     * Atomically adds the given value to the current value with the
     * memory semantics of setVolatile.
     *
     * @param delta the value to add
     * @return the previous value
     */
    T getAndAdd(Object owner, T delta);

    /**
     * Atomically adds the given value to the current value with the
     * memory semantics of setVolatile.
     *
     * @param delta the value to add
     * @return the current value
     */
    T addAndGet(Object owner, T delta);
}

/**
 * A set of methods providing fine-grained control of memory ordering.
 *
 * The Java Language Specification permits operations to be
 * executed in orders different than are apparent in program source
 * code, subject to constraints mainly arising from the use of locks
 * and volatile fields. The methods of this class can also be used to
 * impose constraints. Their specifications are phrased in terms of
 * the lack of "reorderings" -- observable ordering effects that might
 * otherwise occur if the fence were not present.
 *
 * @apiNote More precise phrasing of these specifications may
 * accompany future updates of the Java Language Specification.
 */
public class Fences {

    /**
     * Ensures that loads and stores before the fence will not be
     * reordered with loads and stores after the fence.
     *
     * @apiNote Ignoring the many semantic differences from C and
     * C++, this method has memory ordering effects compatible with
     * atomic_thread_fence(memory_order_seq_cst)
     */
    public static void fullFence() {}

    /**
     * Ensures that loads before the fence will not be reordered with
     * loads and stores after the fence.
     *
     * @apiNote Ignoring the many semantic differences from C and
     * C++, this method has memory ordering effects compatible with
     * atomic_thread_fence(memory_order_acquire)
     */
    public static void acquireFence() {}

    /**
     * Ensures that loads and stores before the fence will not be
     * reordered with stores after the fence.
     *
     * @apiNote Ignoring the many semantic differences from C and
     * C++, this method has memory ordering effects compatible with
     * atomic_thread_fence(memory_order_release)
     */
    public static void releaseFence() {}

    /**
     * Ensures that loads before the fence will not be reordered with
     * loads after the fence.
     */
    public static void loadLoadFence() {}

    /**
     * Ensures that stores before the fence will not be reordered with
     * stores after the fence.
     */
    public static void storeStoreFence() {}

}

class java.lang.ref.Reference {
    // add:

    /**
     * Ensures that the object referenced by the given reference
     * remains strongly reachable (as defined in the {@link
     * java.lang.ref} package documentation), regardless of any prior
     * actions of the program that might otherwise cause the object to
     * become unreachable; thus, the referenced object is not
     * reclaimable by garbage collection at least until after the
     * invocation of this method. Invocation of this method does not
     * itself initiate garbage collection or finalization.
     *
     * @param ref the reference. If null, this method has no effect.
     */
    public static void reachabilityFence(Object ref) {}
}

From adinn at redhat.com Thu Aug 20 13:59:39 2015
From: adinn at redhat.com (Andrew Dinn)
Date: Thu, 20 Aug 2015 14:59:39 +0100
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55D5D9A7.4050704@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu>
Message-ID: <55D5DD4B.6040108@redhat.com>

On 20/08/15 14:44, Doug Lea wrote:
. . .
>     /**
>      * Atomically sets the value to the given updated value with the
>      * memory semantics of setRelease if the current value {@code ==}
>      * the expected value, as accessed with the memory samantics of
>      * getRelaxed.
         ^^^^^^^^^^
>      * @param expected the expected value
>      * @param val the new value
>      * @return the current value, which will be the same as {@code val} if
>      * successful.
>      */
>     T compareAndExchangeRelease(Object owner, T expected, T val);

Should that be "with the memory samantics of getRelease"?
regards, Andrew Dinn ----------- From dl at cs.oswego.edu Thu Aug 20 14:16:02 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Thu, 20 Aug 2015 10:16:02 -0400 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55D5DD4B.6040108@redhat.com> References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D5DD4B.6040108@redhat.com> Message-ID: <55D5E122.8040706@cs.oswego.edu> On 08/20/2015 09:59 AM, Andrew Dinn wrote: > On 20/08/15 14:44, Doug Lea wrote: > . . . >> /** >> * Atomically sets the value to the given updated value with the >> * memory semantics of setRelease if the current value {@code ==} >> * the expected value, as accessed with the memory samantics of >> * getRelaxed. > ^^^^^^^^^^ >> * @param expected the expected value >> * @param val the new value >> * @return the current value, which will be the same as {@code val} if >> * successful. >> */ >> T compareAndExchangeRelease(Object owner, T expected, T val); > > Should that be "with the memory samantics of getRelease"? > Yes, thanks! -Doug From aleksey.shipilev at oracle.com Fri Aug 21 13:54:11 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Fri, 21 Aug 2015 16:54:11 +0300 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55D5D9A7.4050704@cs.oswego.edu> References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> Message-ID: <55D72D83.5020407@oracle.com> On 08/20/2015 04:44 PM, Doug Lea wrote: > On 08/10/2015 02:22 PM, Doug Lea wrote: > T getOpaque(Object owner); > void setOpaque(Object owner, T val); My immediate reaction as the guy who wasn't participating in the original discussion: "Er, I don't like these at all". So, these seem similar to "relaxed", but only they are "hardware-relaxed", not "compiler-relaxed". Doesn't this depart from having a memory model covering both software and hardware behaviors? Doesn't that hurt cross-platform compatibility? 
Inviting users to use the public APIs that perform functionally differently on different hardware is silently leading them into the platform lock-in. An explanation "do whatever hardware does", while looking more appealing, means the same. It is also seems that (get|set)Opaque behavioral difference can only be observed under race, but AFAIU that can be differentiated against the unlucky timing only when some other causal non-synchronization relationship holds (e.g. during object publication). So, if that is only to help the publication/consume cases, should we just rely on already implemented guarantees for final fields (and possibly revisit the idea of making all initialization stores "final", and then emit the barrier implicitly -- thus covering the corner case of volatile fields initialized in objects), rendering the producer-consumer problems a moot for a while? > T compareAndExchangeVolatile(Object owner, T expected, T val); > T compareAndExchangeAcquire(Object owner, T expected, T val); > T compareAndExchangeRelease(Object owner, T expected, T val); Um, I have a stupid question. What's the story with non-x86 processors here? Do we have any machines that do not have an explicit CAS, but only have LL/SC? Returning the failure witness does not seem doable with LL/SC-emulated CAS. Thanks, -Aleksey From dl at cs.oswego.edu Fri Aug 21 17:54:43 2015 From: dl at cs.oswego.edu (Doug Lea) Date: Fri, 21 Aug 2015 13:54:43 -0400 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55D72D83.5020407@oracle.com> References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> Message-ID: <55D765E3.4070704@cs.oswego.edu> On 08/21/2015 09:54 AM, Aleksey Shipilev wrote: > On 08/20/2015 04:44 PM, Doug Lea wrote: >> On 08/10/2015 02:22 PM, Doug Lea wrote: >> T getOpaque(Object owner); >> void setOpaque(Object owner, T val); > > My immediate reaction as the guy who wasn't participating in the > original discussion: "Er, I don't like these at all". 
With luck, this will be the universal reaction. First you hate it, then you discover uncommon cases in which you need it, then you begrudgingly accept it. Maybe we can make a video. For most people, the only thing to know about opaque is that it fulfills the same role as the C cast-to-volatile escape hatch, which is also justifiably unloved, but still used for example in the Linux kernel. > So, these seem > similar to "relaxed", but only they are "hardware-relaxed", not > "compiler-relaxed". They basically say: even if other JMM rules would allow the read or write to be optimized away, don't. They feel like compiler-only directives only because hardware never elides unbounded numbers of reads/writes. > > It is also seems that (get|set)Opaque behavioral difference can only be > observed under race, Yes but use of VarHandle methods implicitly indicates "might be racy". (So we don't need a separate annotation for this). > > So, if that is only to help the publication/consume cases, should we > just rely on already implemented guarantees... There is no sign that the series of special-cases will ever end. We should try to understand and support explicit transformation constraints when possible. But so far the record is not all that great. So the only good pragmatic response I know is to include opaque. >> T compareAndExchangeVolatile(Object owner, T expected, T val); >> T compareAndExchangeAcquire(Object owner, T expected, T val); >> T compareAndExchangeRelease(Object owner, T expected, T val); > > Um, I have a stupid question. What's the story with non-x86 processors > here? Do we have any machines that do not have an explicit CAS, but only > have LL/SC? Returning the failure witness does not seem doable with > LL/SC-emulated CAS. With LL/SC, you loop until either compare-failure with the LL'ed value or success of the SC. So either way there is a unique answer. 
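The loop described above can be sketched in plain Java, with a weak (spuriously failing) boolean CAS standing in for the store-conditional and an ordinary read standing in for the load-linked. This is only an analogy: real LL/SC is a hardware primitive, and the class name WitnessCas is hypothetical. The sketch returns the witnessed value, which equals expected on success. weakCompareAndSetVolatile is the JDK 9+ method on AtomicInteger.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WitnessCas {
    /**
     * Value-returning compare-and-exchange built from a weak boolean CAS.
     * Returns the value that was observed: equal to {@code expected} when
     * the update took effect, otherwise the value that caused the failure.
     */
    static int compareAndExchange(AtomicInteger cell, int expected, int val) {
        for (;;) {
            int current = cell.get();                 // plays the role of the LL
            if (current != expected) {
                return current;                       // genuine compare failure
            }
            if (cell.weakCompareAndSetVolatile(expected, val)) {
                return expected;                      // plays the role of a successful SC
            }
            // Spurious failure or interference from another thread: retry.
        }
    }

    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger(5);
        System.out.println(compareAndExchange(cell, 5, 7));   // update succeeds
        System.out.println(compareAndExchange(cell, 5, 9));   // update fails, cell unchanged
    }
}
```

Either exit from the loop yields exactly one answer, which is the point being made: the retry hides spurious failures, so a strong value-returning CAS is still well defined.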
-Doug From aph at redhat.com Sat Aug 22 08:43:32 2015 From: aph at redhat.com (Andrew Haley) Date: Sat, 22 Aug 2015 09:43:32 +0100 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55D72D83.5020407@oracle.com> References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> Message-ID: <55D83634.6070303@redhat.com> On 08/21/2015 02:54 PM, Aleksey Shipilev wrote: > Um, I have a stupid question. What's the story with non-x86 processors > here? Do we have any machines that do not have an explicit CAS, but only > have LL/SC? Yes, we do. > Returning the failure witness does not seem doable with > LL/SC-emulated CAS. In fact it's the other way around: it's difficult (or, for all practical purposes, impossible) to emulate LL/SC with CAS, but easy to emulate CAS with LL/SC. Andrew. From aleksey.shipilev at oracle.com Mon Aug 24 08:42:00 2015 From: aleksey.shipilev at oracle.com (Aleksey Shipilev) Date: Mon, 24 Aug 2015 11:42:00 +0300 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55D765E3.4070704@cs.oswego.edu> References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> Message-ID: <55DAD8D8.8070508@oracle.com> On 08/21/2015 08:54 PM, Doug Lea wrote: > On 08/21/2015 09:54 AM, Aleksey Shipilev wrote: >>> T compareAndExchangeVolatile(Object owner, T expected, T val); >>> T compareAndExchangeAcquire(Object owner, T expected, T val); >>> T compareAndExchangeRelease(Object owner, T expected, T val); >> >> Um, I have a stupid question. What's the story with non-x86 processors >> here? Do we have any machines that do not have an explicit CAS, but only >> have LL/SC? Returning the failure witness does not seem doable with >> LL/SC-emulated CAS. > > With LL/SC, you loop until either compare-failure with the LL'ed value > or success of the SC. So either way there is a unique answer. Oh, right. 
I was under (false) impression that you do LL/SC without looping, but I guess that is not really viable for strong CAS emulation since SC may "spuriously" fail. Thanks, -Aleksey From boehm at acm.org Tue Aug 25 00:01:25 2015 From: boehm at acm.org (Hans Boehm) Date: Mon, 24 Aug 2015 17:01:25 -0700 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55DAD8D8.8070508@oracle.com> References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com> Message-ID: Some more comments: I think the getAnd() operations also need to specify that the load part is volatile, so that an earlier volatile store synchronizes with the call. The "reordering"-based specifications are extremely problematic. They give no hint that, for example, IRIW can produce non-SC results with LoadLoad or Acquire fences. But I think that has to be allowed for anything other than fullFence() to be useful. Note that the use of "Relaxed" here is seriously weaker than in C & C++. It doesn't promise cache-coherence. That's a sufficiently serious difference that I'm not sure I would use the same word. For example, a field that is only ever atomically incremented can appear to decrease, where it can't in C or C++. Of course that term already has a long history of not-very-consistent use in the hardware community, so maybe it's OK. Is it intended to be possible to apply these methods to built-in types like long? If so, does Relaxed provide word-tearing guarantees? As you've probably gathered, I'm not volunteering to define semantics of loadLoadFence() or storeStoreFence(). reachabilityFence: "regardless of any prior actions of the program that might otherwise cause the object to become unreachable" doesn't sound right to me. 
It really revokes the compilers license to "misoptimize" the program normally given by 12.6.1: "Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable." The real guarantee seems to be that reachabilityFence(ref) happens before ref is finalized or added to a java.lang.ref queue. Can we include that in its definition? This would normally mean that (1) either reachabilityFence(ref) or the next safe point (or something in between) acts as a release operation for compiler purposes, and (2) the next safe point synchronizes with any GC that might reclaim ref. I can't think of a way to write a correct GC that doesn't make (2) or some analogous condition true automatically. But I think we need to be clear that (1) is expected. Presumably it's too late to add an annotation? The simplest would probably be an @FinalizationSensitive class annotation that causes the compiler to insert a reachabilityFence at the end of the scope of every reference with that static type or a subclass type. (Or after the current statement for temporaries.) ReachabilityFence() would help us for library code, but I doubt it's going to get much use in user-written code. Especially the temporary case is likely to be missed. You need to transform nativeCall(foo.getFinalizableObj().nativePtr) to T tmp = foo.getFinalizableObj(); nativeCall(tmp.nativePtr); reachabilityFence(tmp); Hans On Mon, Aug 24, 2015 at 1:42 AM, Aleksey Shipilev < aleksey.shipilev at oracle.com> wrote: > On 08/21/2015 08:54 PM, Doug Lea wrote: > > On 08/21/2015 09:54 AM, Aleksey Shipilev wrote: > >>> T compareAndExchangeVolatile(Object owner, T expected, T val); > >>> T compareAndExchangeAcquire(Object owner, T expected, T val); > >>> T compareAndExchangeRelease(Object owner, T expected, T val); > >> > >> Um, I have a stupid question. What's the story with non-x86 processors > >> here? 
Do we have any machines that do not have an explicit CAS, but only > >> have LL/SC? Returning the failure witness does not seem doable with > >> LL/SC-emulated CAS. > > > > With LL/SC, you loop until either compare-failure with the LL'ed value > > or success of the SC. So either way there is a unique answer. > > Oh, right. I was under (false) impression that you do LL/SC without > looping, but I guess that is not really viable for strong CAS emulation > since SC may "spuriously" fail. > > Thanks, > -Aleksey > > > From aph at redhat.com Tue Aug 25 08:45:05 2015 From: aph at redhat.com (Andrew Haley) Date: Tue, 25 Aug 2015 09:45:05 +0100 Subject: [jmm-dev] jdk9 APIs In-Reply-To: References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com> Message-ID: <55DC2B11.7020509@redhat.com> On 08/25/2015 01:01 AM, Hans Boehm wrote: > Note that the use of "Relaxed" here is seriously weaker than in C & > C++. It doesn't promise cache-coherence. That's a sufficiently > serious difference that I'm not sure I would use the same word. For > example, a field that is only ever atomically incremented can appear > to decrease, where it can't in C or C++. Of course that term > already has a long history of not-very-consistent use in the > hardware community, so maybe it's OK. I don't think it's OK. This is so madly counter-intuitive that we should forbid non-coherent updates. And I can think of no reason we should be weaker than C and C++; the last time I heard we were trying to be closer. Are there any non-coherent machines we care about? > Presumably it's too late to add an annotation? The simplest would > probably be an @FinalizationSensitive class annotation that causes > the compiler to insert a reachabilityFence at the end of the scope > of every reference with that static type or a subclass type. (Or > after the current statement for temporaries.) 
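For reference, the manual form of the transformation that such an annotation would automate might look like the sketch below. It uses java.lang.ref.Reference.reachabilityFence as it later shipped in JDK 9. Resource, nativeCall and getFinalizableObj are hypothetical stand-ins (nativeCall is an ordinary Java method here so that the sketch runs), and the try/finally placement is one common idiom rather than anything prescribed in this thread.

```java
import java.lang.ref.Reference;

public class ReachabilitySketch {
    // Hypothetical finalization-sensitive class wrapping a native handle.
    static final class Resource {
        final long nativePtr;
        Resource(long nativePtr) { this.nativePtr = nativePtr; }
    }

    // Stand-in for a native method that uses the raw handle.
    static long nativeCall(long ptr) {
        return ptr + 1;
    }

    // Stand-in for a call that returns a temporary Resource.
    static Resource getFinalizableObj() {
        return new Resource(41);
    }

    /** The temporary is named so that it can be kept reachable across the call. */
    static long call() {
        Resource tmp = getFinalizableObj();
        try {
            return nativeCall(tmp.nativePtr);
        } finally {
            // Keeps tmp strongly reachable until nativeCall has returned,
            // even though only its nativePtr field was passed along.
            Reference.reachabilityFence(tmp);
        }
    }

    public static void main(String[] args) {
        System.out.println(call());
    }
}
```

The need to name the temporary by hand is the maintenance burden the annotation proposal is meant to remove.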
I think that's a great idea, although I recall we discussed it before and rejected it. I don't know why. I'm wary of "annotations" with such significant semantics, though. As a language feature it's a bit, er, yucky; surely it should be a part of the declaration. But an annotation is reliable and robust, and much more so than a method you'd have to remember to invoke everywhere; such a thing would be a maintenance problem. Having said that, there is no reason we should not add reachabilityFence today and plan for the annotation to be a later addition. Andrew. From stephan.diestelhorst at arm.com Tue Aug 25 09:53:56 2015 From: stephan.diestelhorst at arm.com (Stephan Diestelhorst) Date: Tue, 25 Aug 2015 10:53:56 +0100 Subject: [jmm-dev] jdk9 APIs In-Reply-To: <55DAD8D8.8070508@oracle.com> References: <55C8EBCD.7010006@cs.oswego.edu> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com> Message-ID: <1939744.JyTAqgXLQL@mymac-ubuntu> On Monday 24 August 2015 09:42:00 Aleksey Shipilev wrote: > On 08/21/2015 08:54 PM, Doug Lea wrote: > > > On 08/21/2015 09:54 AM, Aleksey Shipilev wrote: > > > >>> T compareAndExchangeVolatile(Object owner, T expected, T val); > >>> T compareAndExchangeAcquire(Object owner, T expected, T val); > >>> T compareAndExchangeRelease(Object owner, T expected, T val); > >> > >> > >> > >> Um, I have a stupid question. What's the story with non-x86 processors > >> here? Do we have any machines that do not have an explicit CAS, but only > >> have LL/SC? Returning the failure witness does not seem doable with > >> LL/SC-emulated CAS. > > > > > > With LL/SC, you loop until either compare-failure with the LL'ed value > > or success of the SC. So either way there is a unique answer. > > > Oh, right. I was under (false) impression that you do LL/SC without > looping, but I guess that is not really viable for strong CAS emulation > since SC may "spuriously" fail. 
Note that, despite the loop, we guarantee lock-freedom:

"In the event of repeatedly-contending Load-Exclusive/Store-Exclusive
instruction sequences from multiple PEs, an implementation must ensure
that forward progress is made by at least one PE." ARM ARM v8

--
Sincerely,
Stephan

Stephan Diestelhorst
Staff Engineer, ARM Research - Systems
+44 (0)1223 405662

From dl at cs.oswego.edu Tue Aug 25 13:22:51 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Tue, 25 Aug 2015 09:22:51 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To:
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com>
Message-ID: <55DC6C2B.3030807@cs.oswego.edu>

On 08/24/2015 08:01 PM, Hans Boehm wrote:
> Some more comments:

Thanks!

>
> I think the getAnd() operations also need to specify that the load part is
> volatile, so that an earlier volatile store synchronizes with the call.

Thanks; fixed.

>
> The "reordering"-based specifications are extremely problematic. They give no
> hint that, for example, IRIW can produce non-SC results with LoadLoad or Acquire
> fences. But I think that has to be allowed for anything other than fullFence()
> to be useful.

Yes. It is hard to be clear without accidentally revising the JMM.
We once (in previous efforts) came up with some wording to
help explain. I'll look for it and try to adapt.

> Note that the use of "Relaxed" here is seriously weaker than in C & C++. It
> doesn't promise cache-coherence. That's a sufficiently serious difference that
> I'm not sure I would use the same word.

And plain (non-volatile, non-final) Java vars are also not exactly the same as in C/C++. I and others have gone back and forth on using "getPlain" (etc) vs "getRelaxed" here. Using getPlain sounds odder but might be a better choice.

I suppose we could even have both getPlain and getRelaxed, but it seems better just to add an explanation somewhere about how to obtain coherence if you need it.

> Is it intended to be possible to apply these methods to built-in types like
> long? If so, does Relaxed provide word-tearing guarantees?

Yes. This should be indicated in class-level docs that don't currently exist because NotReallyVarHandle doesn't really exist :-) We had discussed and seemed headed to removing this disclaimer for all Java primitive vars, but for now we can at least do so for those accessed with VarHandles.

> reachabilityFence:
>
> "regardless of any prior actions of the program that might otherwise cause the
> object to become unreachable" doesn't sound right to me. It really revokes the
> compiler's license to "misoptimize" the program normally given by 12.6.1:
>
> "Optimizing transformations of a program can be designed that reduce the number
> of objects that are reachable to be less than those which would naively be
> considered reachable."
>
> The real guarantee seems to be that reachabilityFence(ref) happens before ref is
> finalized or added to a java.lang.ref queue. Can we include that in its definition?

Will do; thanks.

> Presumably it's too late to add an annotation?

Not too late, but a separable effort that I'd like to encourage people to undertake. We laid the groundwork on this list last August. Carrying it out seems to need involvement of compiler and IDE developers.
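
A minimal sketch, not taken from the thread, of the usage pattern the reachabilityFence discussion above has in mind. It assumes the method landed as `java.lang.ref.Reference.reachabilityFence(Object)`, which is where it ended up in JDK 9. The class `ExternalCounter` and its static table are hypothetical stand-ins for an object that guards an external resource through a handle.

```java
import java.lang.ref.Reference;

// Sketch only: an object whose cleanup (not shown) would release the slot
// identified by `handle`. Without the fence, the object could be treated as
// unreachable once `handle` has been read, even though the method body is
// still using the slot. Not thread-safe; the point is reachability only.
final class ExternalCounter {
    // Hypothetical stand-in for a table of external resources.
    private static final long[] TABLE = new long[16];

    private final int handle;

    ExternalCounter(int handle) {
        this.handle = handle;
    }

    long incrementAndGet() {
        try {
            int h = handle;          // last use of `this` in the body proper
            return ++TABLE[h];       // work that relies on the slot staying valid
        } finally {
            // Keeps `this` strongly reachable until the work above is done.
            Reference.reachabilityFence(this);
        }
    }
}
```

The try/finally shape keeps the fence as the last action on every exit path, which is what makes it less error-prone than remembering to call it by hand at each return.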

-Doug

From boehm at acm.org Tue Aug 25 17:09:43 2015
From: boehm at acm.org (Hans Boehm)
Date: Tue, 25 Aug 2015 10:09:43 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55DC6C2B.3030807@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com> <55DC6C2B.3030807@cs.oswego.edu>
Message-ID:

Thanks! A couple of further responses:

On Tue, Aug 25, 2015 at 6:22 AM, Doug Lea wrote:
> On 08/24/2015 08:01 PM, Hans Boehm wrote:
>> ...
>> Note that the use of "Relaxed" here is seriously weaker than in C & C++. It
>> doesn't promise cache-coherence. That's a sufficiently serious difference that
>> I'm not sure I would use the same word.
>
> And plain (non-volatile, non-final) Java vars are also not exactly the same
> as in C/C++. I and others have gone back and forth on using "getPlain" (etc)
> vs "getRelaxed" here. Using getPlain sounds odder but might be a better
> choice.
>
> I suppose we could even have both getPlain and getRelaxed, but it seems
> better just to add an explanation somewhere about how to obtain coherence
> if you need it.

I think that's hard to do portably, right? Acquire/release should work since they are defined analogously to C++, which requires coherence. But they're clearly substantial overkill, e.g. on Power or ARMv8 (or v7 without the erratum).

>> Is it intended to be possible to apply these methods to built-in types like
>> long? If so, does Relaxed provide word-tearing guarantees?
>
> Yes. This should be indicated in class-level docs that don't
> currently exist because NotReallyVarHandle doesn't really exist :-)
> We had discussed and seemed headed to removing this disclaimer
> for all Java primitive vars, but for now we can at least do so for
> those accessed with VarHandles.

I now think that earlier discussion was premature, but it makes sense to remove the disclaimer here. Removing the disclaimer in general is expensive and probably not feasible on 32-bit MIPS and some older ARM processors.

Hans

From dl at cs.oswego.edu Wed Aug 26 12:01:34 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Wed, 26 Aug 2015 08:01:34 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To:
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com> <55DC6C2B.3030807@cs.oswego.edu>
Message-ID: <55DDAA9E.3080407@cs.oswego.edu>

On 08/25/2015 01:09 PM, Hans Boehm wrote:
> On Tue, Aug 25, 2015 at 6:22 AM, Doug Lea wrote:
>> And plain (non-volatile, non-final) Java vars are also not exactly the same
>> as in C/C++. I and others have gone back and forth on using "getPlain" (etc)
>> vs "getRelaxed" here. Using getPlain sounds odder but might be a better choice.

I'm still torn between odd-soundingness vs reducing chances of confusion. Does anyone else have an opinion?

>> I suppose we could even have both getPlain and getRelaxed, but it seems
>> better just to add an explanation somewhere about how to obtain coherence
>> if you need it.
>
> I think that's hard to do portably, right? Acquire/release should work since they
> are defined analogously to C++, which requires coherence. But they're clearly
> substantial overkill, e.g. on Power or ARMv8 (or v7 without the erratum).

These seem to be variants of the issues in our write-combining discussion last week. And maybe not even variants: Of the cases I can recall being discussed wrt per-variable coherence, I'm not sure which of the choices of setOpaque, setRelease, releaseFence, or storeStoreFence (with symmetric choices on the load side) people had in mind.

>> Yes. This should be indicated in class-level docs that don't
>> currently exist because NotReallyVarHandle doesn't really exist :-)
>> We had discussed and seemed headed to removing this disclaimer
>> for all Java primitive vars, but for now we can at least do so for
>> those accessed with VarHandles.
>
> I now think that earlier discussion was premature, but it makes sense
> to remove the disclaimer here. Removing the disclaimer in general is expensive
> and probably not feasible on 32-bit MIPS and some older ARM processors.

OK. In fact, this (banning word-tearing only via VH) might not be a bad move even when we do revise JMM/JLS.
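
A minimal sketch, not taken from the thread, of what "no tearing when accessed through a VarHandle" can look like for a `long`. It assumes the `java.lang.invoke.VarHandle` API as it eventually shipped in JDK 9, where opaque (and stronger) access modes are documented as bitwise atomic and coherent per variable; the naming in the thread was still in flux at this point. The class `Timestamp` and its field are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch only: a non-volatile long field that is always accessed in opaque
// mode, so readers never observe a half-written 64-bit value.
final class Timestamp {
    private static final VarHandle MILLIS;

    static {
        try {
            MILLIS = MethodHandles.lookup()
                    .findVarHandle(Timestamp.class, "millis", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private long millis; // deliberately not declared volatile

    void update(long value) {
        MILLIS.setOpaque(this, value);        // bitwise-atomic write, no ordering promised
    }

    long read() {
        return (long) MILLIS.getOpaque(this); // bitwise-atomic read, coherent per variable
    }
}
```

Opaque mode gives atomicity and per-variable coherence without the fences that acquire/release would imply, which is roughly the middle ground the coherence discussion above is looking for.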

-Doug

From dl at cs.oswego.edu Thu Aug 27 11:30:52 2015
From: dl at cs.oswego.edu (Doug Lea)
Date: Thu, 27 Aug 2015 07:30:52 -0400
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55DDAA9E.3080407@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com> <55DC6C2B.3030807@cs.oswego.edu> <55DDAA9E.3080407@cs.oswego.edu>
Message-ID: <55DEF4EC.7090608@cs.oswego.edu>

On 08/26/2015 08:01 AM, Doug Lea wrote:
> On 08/25/2015 01:09 PM, Hans Boehm wrote:
>> On Tue, Aug 25, 2015 at 6:22 AM, Doug Lea wrote:
>>> And plain (non-volatile, non-final) Java vars are also not exactly the same
>>> as in C/C++. I and others have gone back and forth on using "getPlain" (etc)
>>> vs "getRelaxed" here. Using getPlain sounds odder but might be a better
>>> choice.
>
> I'm still torn between odd-soundingness vs reducing chances of confusion.
> Does anyone else have an opinion?

Or, if you can't say anything nice, don't say anything at all. Just calling them "get" and "set" works out OK in terms of specs (a few other method names/specs need to be reworked a bit), and doesn't seem to be overly abuse-prone. So this is probably the best approach. See updated http://gee.cs.oswego.edu/dl/wwwtmp/Fodder.java

This would not accommodate a suggestion on valhalla list to automatically equate get/set to relaxed vs volatile versions depending on the declaration of the field. Which is arguable anyway because it would require tracking down the declaration site to determine memory semantics.

We'd definitely want to put out a (separate) document showing, for each C/C++11 construct, how to obtain compatible effects in Java, and vice versa. The common cases are obvious, but the fact that Java plain variables are similar to but not exactly the same as either C++ plain or C++ relaxed leads to a few minor usage differences.

-Doug

From boehm at acm.org Thu Aug 27 17:06:40 2015
From: boehm at acm.org (Hans Boehm)
Date: Thu, 27 Aug 2015 10:06:40 -0700
Subject: [jmm-dev] jdk9 APIs
In-Reply-To: <55DEF4EC.7090608@cs.oswego.edu>
References: <55C8EBCD.7010006@cs.oswego.edu> <55D5D9A7.4050704@cs.oswego.edu> <55D72D83.5020407@oracle.com> <55D765E3.4070704@cs.oswego.edu> <55DAD8D8.8070508@oracle.com> <55DC6C2B.3030807@cs.oswego.edu> <55DDAA9E.3080407@cs.oswego.edu> <55DEF4EC.7090608@cs.oswego.edu>
Message-ID:

On Thu, Aug 27, 2015 at 4:30 AM, Doug Lea wrote:
> This would not accommodate a suggestion on valhalla list to automatically
> equate get/set to relaxed vs volatile versions depending on the declaration
> of the field. Which is arguable anyway because it would require tracking down
> the declaration site to determine memory semantics.

Purely from a programming perspective, I don't think this is what you want anyway. I think one of the more common, and unusually safe, uses of memory_order_relaxed in C++ is for non-racing accesses to atomic variables, e.g. the second "check" in double-checked locking, or for reads of atomic variables that are, in a sense, only hints and not trusted, e.g. the load usually preceding a CAS. You really sometimes want plain gets on volatile data.
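
A minimal sketch, not taken from the thread, of the double-checked locking case mentioned above: a plain get on a field that is declared volatile. It assumes the `java.lang.invoke.VarHandle` API as it eventually shipped in JDK 9 (`get`, `getAcquire`, `setRelease`); the class `LazyHolder` and its field are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch only: lazy initialization where the second check is a plain read.
final class LazyHolder {
    private static final VarHandle VALUE;

    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(LazyHolder.class, "value", Object.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private volatile Object value; // declared volatile, but accessed by mode below

    Object get() {
        Object v = VALUE.getAcquire(this);  // first check: may race with the writer
        if (v == null) {
            synchronized (this) {
                // Second check: every write happens under this lock, so the
                // read cannot race and a plain get is sufficient.
                v = VALUE.get(this);
                if (v == null) {
                    v = new Object();
                    VALUE.setRelease(this, v); // publish to racing first checks
                }
            }
        }
        return v;
    }
}
```

If get/set were silently mapped to volatile semantics whenever the field is declared volatile, the second check here could not be expressed as a plain access, which is the objection raised in the message above.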