RFR: JDK-8220671: Initialization race for non-JavaThread PtrQueues

Roman Kennke rkennke at redhat.com
Wed Mar 20 19:11:08 UTC 2019


>>>> The test started failing somewhere between jdk-13+9 and jdk-13+11, and I
>>>> bisected it down to NJT PtrQueues change. It also seemed like the most
>>>> likely candidate in that frame. It only ever seems to crash with
>>>> +UseStringDeduplication, and since the strdedup thread does SATB, it seems
>>>> plausible that the change affects this.
> 
> I agree that change is the likely culprit, in one way or another.

I just tried again, with and without that change, and it's definitely stable 
before it.

>>>> Any help would be greatly appreciated.
>>> I have added asserts that verify that, after final flushing of thread-local SATB queues, *all* threads' SATB queues are empty. They do not trigger, and yet I see crashes.
>>> This tells me that it is failing to enqueue some oops to begin with. Our ShBS::enqueue() not only checks the thread-local SATB-active flag, but also the global one. Do you think there might be a race accessing this? I.e. NJT possibly seeing a stale value because it does not synchronize on the same stuff as Java threads do when safepointing?
>>
>> E.g., PtrQueueSet::_all_active is not volatile and is not accessed using any OrderAccess either... ?
>>
>> Roman
> 
> PQS::_all_active isn’t volatile because there aren’t supposed to be concurrent readers when it’s written.
> The initialization race for NJTs is a counter-example that is a bug, which we’re discussing a fix for here.
> 
> Why does ShBS::enqueue look at the global SATB-active flag?  That seems like a mistake.  Though
> I wouldn’t expect there to be any threads calling enqueue() while the global SATB state is being changed.
> Doing so also seems like a mistake.

Dunno. It certainly doesn't have to. However, that is not the source of 
the issue. I'm still digging.
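
To make the flag discussion concrete, here is a minimal standalone sketch of
an enqueue that checks both a thread-local and a global active flag, roughly
the shape described for ShBS::enqueue() above. This is not HotSpot code:
QueueSet, ThreadQueue, init_from() and late_thread_enqueue_demo() are
hypothetical names, and the acquire/release std::atomic is just one assumed
way to make a late-starting thread observe the current global state.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a queue set; all_active plays the role of
// PtrQueueSet::_all_active, made atomic here as an assumption.
struct QueueSet {
  std::atomic<bool> all_active{false};
};

// Hypothetical stand-in for a per-thread SATB queue.
struct ThreadQueue {
  bool active = false;             // thread-local "SATB active" flag
  std::vector<const void*> buf;    // recorded pointers

  // A newly started thread copies the global state into its local flag.
  void init_from(const QueueSet& qs) {
    active = qs.all_active.load(std::memory_order_acquire);
  }

  // Records p only if both the local and the global flag are set.
  // Returns false when the pointer is rejected.
  bool enqueue(const QueueSet& qs, const void* p) {
    if (!active || !qs.all_active.load(std::memory_order_acquire)) {
      return false;
    }
    buf.push_back(p);
    return true;
  }
};

// Returns the number of entries recorded by a thread that starts after
// marking was activated, or 0 if the pre-activation enqueue was not rejected.
std::size_t late_thread_enqueue_demo() {
  static int dummy = 0;
  QueueSet qs;

  ThreadQueue early;
  early.init_from(qs);
  bool rejected = !early.enqueue(qs, &dummy);  // marking not active yet

  // Activation; in the real VM this would happen at a safepoint.
  qs.all_active.store(true, std::memory_order_release);

  ThreadQueue late;
  std::thread t([&] {
    late.init_from(qs);          // sees the activated state
    late.enqueue(qs, &dummy);    // accepted: both flags are set
  });
  t.join();

  return rejected ? late.buf.size() : 0;
}
```

With a plain non-atomic bool and no other synchronization, a thread that
initializes its queue while the global flag is being written has no such
guarantee, which is the kind of initialization race this thread is about.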

I see that at some point there is a reference in StringDedupQueue that 
points to reclaimed memory. That indicates that this object has not been 
marked, likely because something went wrong in the oop_load/keepalive call 
from StrDedupThread. However, what baffles me: logs indicate that the 
reference has not gone through ShBS::enqueue(), not even attempted and 
then rejected (b/c not-active or such). That's all I found out today. :-(

Roman


More information about the hotspot-gc-dev mailing list