RFR (S): 8165859: gcTaskThread is missing volatile qualifier and barriers for _time_stamps
kim.barrett at oracle.com
Tue Sep 20 19:47:03 UTC 2016
> On Sep 20, 2016, at 10:36 AM, Carsten Varming <varming at gmail.com> wrote:
> Dear Erik,
> See inline.
> On Tue, Sep 20, 2016 at 4:49 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
>> I believe Kim is referring precisely to the release in the fence before the compare-and-exchange in that comment you refer to, rather than the membar Store* after the fence.
> In this case there is no data, written prior to a successful exchange on _time_stamps, being passed to the reader of _time_stamps, so the fence before the exchange is not needed. In fact, it seems to me that a relaxed atomic cmpxchg (using memory_order_relaxed) is sufficient for the update and a plain read is sufficient to read _time_stamps.
There is the initialization of the contents of the new time_stamps
array. Presently that seems to be nothing (so nothing that I suspect
undefined behavior), but that could/should change.
On the other hand, now that I'm looking at this again, why is this
code using cmpxchg_ptr at all? [Carsten - Thanks for making me
The _time_stamps is per-GCTaskThread. The only potential concurrent
access is in print_task_time_stamps; indeed, there is a comment in
GCTaskThread::run about that, right before the (plain) increment of
_time_stamp_index, indicating the order there is important. If that's
actually true then the plain increment is also wrong, as is the
But it doesn't look to me as if print_task_time_stamps is actually
called concurrently with a running task. The only caller I found is
GCTaskManager::print_task_time_stamps, which seems to only be called
at the end of a scavange or parallel compaction to report statistics,
presumably with all the tasks complete.
The more I look at this, the more confused I become. At this point I
think I have to retract my "looks good" and send this back to Erik for
If there isn't any concurrent access, then cmpxchg_ptr isn't needed.
If there is concurrent access, using locks would substantially
simplify the analysis and implementation, and since its a one-time
(per thread) lock, there's no real benefit to being lock-free.
I also wonder why print_task_time_stamps calls time_stamp_at, given
that it's already got the _time_stamps and could just iterate over it.
Unfortunately, I'm coming to believe there's some questionable code
here, and throwing a volatile and maybe a barrier or two at it doesn't
look like an improvement to me.
>> I would prefer to have explicit memory ordering between the linked producer and consumer pair, as Kim suggests. This is how the JMM implements sequential consistency - cooperatively with release_store_fence and load_acquire pairs on the same data.
> Hotspot is a C++ program. Mimicking the JMM in Hospot code is misguided (IMHO). In April, migrating the barriers towards the java model was suggested (http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html), but the conclusion was a small move towards C++ 11 barriers (https://bugs.openjdk.java.net/browse/JDK-8155949). It would be great if there could be general consensus about the direction of changes to the memory barriers and their use in Hotspot (cc hotspot-runtime-dev as this was discussed there recently).
I’m hoping C++11 atomics is the direction we take.
More information about the hotspot-gc-dev