RFR: JDK-8220671: Initialization race for non-JavaThread PtrQueues
rkennke at redhat.com
Wed Mar 20 20:06:25 UTC 2019
>>>> On Mar 19, 2019, at 5:51 PM, Roman Kennke <rkennke at redhat.com> wrote:
>>>> I believe G1 is also affected by this problem, as it seems to flush SATB queues in pretty much the same way as Shenandoah:
>>>> Please check it, and update as appropriate. I'm working on a fix.
>>> I did some looking at the claiming mechanism, and I agree there's an
>>> inconsistency of usage that can mess things up.
>>> We have possibly_parallel_threads_do, which only looks at Java threads
>>> and the VM thread. Other threads don't get examined, and so don't
>>> have their claim state updated.
>>> We also have callers of Threads::threads_do where the closure's
>>> do_thread conditionalizes its operation on a call to claim_oops_do.
>>> So all threads are examined and their claim state is updated.
>>> It seems like the right (or should I say wrong) sequence of operations
>>> could cause an NJT being visited by Threads::threads_do to appear to
>>> have already been claimed from the get go, and so will be skipped by
>>> all of the parallel claiming threads.
>>> Sprinkling more assert_all_threads_claimed won't currently help detect
>>> this, since it too only examines Java threads and the VM thread.
>>> It looks like an even number of uses of possibly_parallel_threads_do
>>> between a pair of Threads::threads_do with explicit claims will
>>> confuse the second Threads::threads_do.
>>> And of course, there's the problem that Shenandoah is using
>>> possibly_parallel_threads_do in a place where skipping the
>>> StringDedupThread is bad.
>>> Yikes! How is this ever working at all?
>>> I'm not sure why possibly_parallel_threads_do applies to a subset of
>>> the threads. One solution would be to change that. But there are
>>> users of it that maybe don't expect other kinds of threads and might
>>> not want to pay extra to examine and ignore them. Another would be
>>> to have two different claim fields for the two different iterators.
>>> They could be packed into the same thread member at some small cost,
>>> though looking at where the current member is located, adding a second
>>> member there should not cost any space in 64bit platforms.
>> My current idea goes roughly like this (includes some Shenandoah mess
>> that will not be there in final webrev):
>> However, this *still* doesn't solve my crashing testcase. Digging even
> I have a different approach here:
> The idea is that there's no fundamental reason for the claim token to
> be limited to a single bit counter + "uninitialized" value, as is done
> in the current code. Instead, we can expand the counter to full uint
> (or uintx even, because of alignment of surrounding members), with
> zero reserved for "uninitialized".
> On counter overflow, as with the current code, there is a risk of
> collision because of different thread sets being processed. But with
> a full width counter that's really only theoretical; it takes a long
> time to overflow, esp. with a 64bit counter, and the number of subset
> iterations since last superset iteration needs to be the max counter
> value. With the current 1bit counter we just need an even number of
> subset iterations between superset iterations to get into trouble.
> But that number increases with the counter size, and real code
> (probably) doesn't behave that way.
> And it's pretty easy to deal with even that remote possibility, by
> just resetting all the thread tokens to zero on global counter
> If you agree, reassign 8221102 to me and I'll finish this up.
This seems a workable solution too.
I am not a fan on mechanisms that depend on stuff not overflowing,
though. So, at the very least, we should protect against that.
When I see multithreaded iterators with claiming mechanisms like that
(we have a few of such all over Hotspot, e.g. CLD claiming works
similar), it makes me cringe a little. It means that all worker threads
are really looking at all items, battling over claim tokens. It would be
so much easier and more efficient to just have the array of items, and
update the iteration pointer using atomic-add. Right? In case of linked
list, it would have to be updated using CAS. I guess with list sizes of
few dozen to few 1000 (all threads) it doesn't matter much though.
Also, another simple way to fix it is to always scan all threads. That
would at least be consistent with what other iterators do (e.g.
Threads::threads_do() ). I don't think that performance is a very big
> Note that I haven't tried this against your crashing test yet. I'm
> just getting set up to try to reproduce that failure locally. Based
> on your reports, it sounds like there might be yet another issue
Yeah, something's still up, and this makes this particular change hard
to verify ;-)
More information about the hotspot-gc-dev