RFR: JDK-8221086: Shenandoah-specific workaround for JDK-8220671

Roman Kennke rkennke at redhat.com
Tue Mar 19 19:18:02 UTC 2019


>> JDK-8220671: Initialization race for non-JavaThread PtrQueue is making
>> troubles in Shenandoah's testing. While we're working out a generally
>> acceptable solution, we need a usable workaround in Shenandoah.
>>
>> Bug:
>> https://bugs.openjdk.java.net/browse/JDK-8221086
>> Webrev:
>> http://cr.openjdk.java.net/~rkennke/JDK-8221086/webrev.01/
>>
>> The proposed fix places a SuspendibleThreadJoiner around the
>> on_thread_attach() and on_thread_detach(), thereby preventing the race.
>>
>> As soon as we figured out a solution for JDK-8220671 (which may well be
>> this 'workaround') we shall revert/overwrite this change.
>>
>> Testing: many rounds of hotspot_gc_shenandoah and even more rounds of
>> the offending TestStringDedupStress.
>>
>> Ok?
>>
>> Roman
> 
> I was expecting there to be some shared code changes here, and not a pure shenandoah
> change.

That's why I called it 'workaround'. It's not an all-inclusive solution. 
This allows others to make progress with their own patches. It's not 
helpful to have tests failing randomly for several days. Otherwise I'd 
have to ask you to back out the original JDK-8219613 ;-) This seems 
overkill. Also, having this run through our CI would give us some more 
confidence that the change is good (or not).

>  Doesn’t this have the same problem as your “nope, needs coffee” proposal in the
> main email thread from yesterday?  The attach/list-add pair needs to be atomic wrto the
> SATB state change, and this doesn’t cover the list-add.

It does prevent the critical sections:
1. Global SATB flag change
2. New-thread SATB flag cache pickup

to prevent from racing with each other, because it causes attaching 
threads to line up at safepoints if anything like that happens. The 
actual list addition doesn't seem to be problematic. Or is it? If you 
think it is, pls explain why? Also, probably take it back to the main 
thread then ;-)

In any case, the testcase that used to fail once in ~5 runs has survived 
many dozens of runs now. Even if the fix is not complete and only covers 
up a deeper problem, it helps us to make progress with other stuff. Ok?

Roman


More information about the hotspot-gc-dev mailing list