RFR(s): 8204166: TLH: Semaphore may not be destroy until signal have returned.
robbin.ehn at oracle.com
Mon Jun 18 13:58:41 UTC 2018
On 06/18/2018 03:12 PM, Erik Österlund wrote:
> Hi Robbin,
> Looks good.
> On 2018-06-18 15:07, Robbin Ehn wrote:
>> Hi all,
>> After some internal discussions I changed the patch to:
>> Which handles thread off javathreads list better.
>> Passes handshake testing and ZGC testing seems okay.
>> Thanks, Robbin
>> On 06/14/2018 12:11 PM, Robbin Ehn wrote:
>>> Hi all, please review.
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8204166
>>> Webrev: http://cr.openjdk.java.net/~rehn/8204166/v1/webrev/
>>> The root cause of this failure is a bug in the posix semaphores:
>>> Thread a:
>>> Thread b:
>>> Thread b is waiting on my_sem (count 0), Thread a posts (count 0->1).
>>> If Thread b start executing directly after the increment in post but before
>>> Thread a leaves the call to post and manage to destroy the semaphore. Thread a
>>> _can_ get EINVAL from sem_post! This is fixed in newer glibc(2.21).
>>> Note that mutexes have had same issue on some platforms:
>>> Fixed in 2.23.
>>> Since we only have one handshake operation running at anytime (safepoints and
>>> handshakes are also mutual exclusive, both run on VM Thread) we can actually
>>> always use the same semaphore. This patch changes the _done semaphore to be
>>> static instead, thus avoiding the post<->destroy race.
>>> Patch also contains some small changes which remove of dead code, remove
>>> unneeded state, handling of cases which we can't easily say will never happen
>>> and some additional error checks.
>>> Handshakes test passes, but they don't trigger the original issue, so more
>>> interesting is that this issue do not happen when running ZGC which utilize
>>> handshakes with the static semaphore.
>>> Thanks, Robbin
More information about the hotspot-dev