RFR(s): 8204166: TLH: Semaphore may not be destroy until signal have returned.

Robbin Ehn robbin.ehn at oracle.com
Thu Jun 14 10:53:28 UTC 2018

Hi Erik,

I should have given you credit in the RFR; thanks for all the help during this bug hunt!
Also thanks to Stefan K!

On 2018-06-14 12:52, Erik Österlund wrote:
> Hi Robbin,
> Looks good.
> Thanks,
> /Erik
> On 2018-06-14 12:11, Robbin Ehn wrote:
>> Hi all, please review.
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8204166
>> Webrev: http://cr.openjdk.java.net/~rehn/8204166/v1/webrev/
>> The root cause of this failure is a bug in the POSIX semaphores in glibc:
>> https://sourceware.org/bugzilla/show_bug.cgi?id=12674
>> Thread a:
>> sem_post(my_sem);
>> Thread b:
>> sem_wait(my_sem);
>> sem_destroy(my_sem);
>> Thread b is waiting on my_sem (count 0) when Thread a posts (count 0->1).
>> If Thread b starts executing directly after the increment in sem_post, but
>> before Thread a has left the call to sem_post, and manages to destroy the
>> semaphore, Thread a _can_ get EINVAL from sem_post! This is fixed in newer
>> glibc (2.21).
>> Note that mutexes have had the same issue on some platforms:
>> https://sourceware.org/bugzilla/show_bug.cgi?id=13690
>> Fixed in glibc 2.23.
>> Since we only have one handshake operation running at any time (safepoints and
>> handshakes are also mutually exclusive, both run on the VM Thread) we can
>> actually always use the same semaphore. This patch changes the _done semaphore
>> to be static instead, thus avoiding the post<->destroy race.
>> The patch also contains some small changes: removal of dead code, removal of
>> unneeded state, handling of cases which we can't easily say will never happen,
>> and some additional error checks.
>> The handshake tests pass, but they don't trigger the original issue. More
>> interesting is that this issue does not happen when running ZGC, which
>> utilizes handshakes, with the static semaphore.
>> Thanks, Robbin
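
For anyone who wants to see the static-semaphore idea outside of HotSpot, here
is a minimal sketch. It is not the code in the webrev: DoneSemaphore, g_done
and run_handshakes are made-up names for this example, and plain std::thread
workers stand in for the JavaThreads that acknowledge a handshake. It assumes
a POSIX system and only one operation in flight at a time, which is what makes
sharing one semaphore safe.

```cpp
#include <semaphore.h>

#include <atomic>
#include <cerrno>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical wrapper for illustration; not HotSpot's Semaphore class.
class DoneSemaphore {
 public:
  DoneSemaphore() {
    if (sem_init(&_sem, 0, 0) != 0) std::abort();
  }
  // Deliberately no sem_destroy(): the semaphore lives for the whole process,
  // so a thread still inside sem_post() can never see it destroyed.
  DoneSemaphore(const DoneSemaphore&) = delete;
  DoneSemaphore& operator=(const DoneSemaphore&) = delete;

  void signal() {
    if (sem_post(&_sem) != 0) std::abort();
  }

  void wait() {
    // Retry if interrupted by a signal handler; any other error is fatal.
    while (sem_wait(&_sem) != 0) {
      if (errno != EINTR) std::abort();
    }
  }

 private:
  sem_t _sem;
};

// One shared instance, reused by every operation (assumption: operations
// never overlap, as with handshakes and safepoints on the VM Thread).
static DoneSemaphore g_done;

// Runs `rounds` operations one after another. In each round, `targets`
// threads acknowledge by signalling g_done and the caller waits for all of
// them. Returns the total number of acknowledgements seen.
int run_handshakes(int rounds, int targets) {
  std::atomic<int> acks(0);
  for (int r = 0; r < rounds; r++) {
    std::vector<std::thread> threads;
    for (int t = 0; t < targets; t++) {
      threads.emplace_back([&acks] {
        acks.fetch_add(1);
        g_done.signal();  // safe: g_done is never destroyed
      });
    }
    for (int t = 0; t < targets; t++) {
      g_done.wait();
    }
    // With a per-operation semaphore this is where sem_destroy() would run,
    // possibly while a target thread is still returning from sem_post().
    for (std::thread& th : threads) {
      th.join();
    }
  }
  return acks.load();
}
```

Calling run_handshakes(3, 4) runs three rounds with four acknowledging threads
each. The semaphore count is back at zero after every round, so the next round
can reuse it without re-initialization.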

More information about the hotspot-dev mailing list