RFR: 8152196: SuspendibleThreadSet::yield scales poorly

Kim Barrett kim.barrett at oracle.com
Mon Mar 21 13:26:16 UTC 2016

> On Mar 21, 2016, at 5:41 AM, Mikael Gerdin <mikael.gerdin at oracle.com> wrote:
> Hi Kim,

Thanks for reviewing.  Replies to your comments inline.

> On 2016-03-18 20:58, Kim Barrett wrote:
>> Please review this fix for a performance scaling problem in
>> SuspendibleThreadSet, leading to unnecessary additional safepoint
>> latency as the number of suspendible threads increases. See the CR for
>> details of the problem and some performance data.
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8152196
>> Webrev:
>> http://cr.openjdk.java.net/~kbarrett/8152196/webrev.00/
> I agree with your assessment in the bug that the limited set of available synchronization primitives is unfortunate.
>  36 static Semaphore* _synchronize_wakeup = NULL;
> Why not make it a static member in SuspendibleThreadSet?

That would require making the init function a friend, or making it a
static member function and #include the STS header in init.cpp,
neither of which seems like an improvement to me.  I also have a
preference for using file scope over cluttering a public class
definition with implementation details, when that's possible.

> Besides that, I think your change is good.
> Since you've already performance tested your fix, I think it should be pushed but here are some thoughts around another approach.
> The primary reason for using semaphores for activating gc worker threads was that Monitor::notify_all is inefficient for the purpose of simply releasing threads without them requiring any mutual exclusion access upon their activation.
> Following that pattern the STS could have the joining or yielding threads wait on a semaphore and use the Monitor to signal to the VM thread requesting the synchronization. This would cause a situation where the last STS thread would have to first notify the VM thread and then wait on the semaphore but as long as the VM thread knows the exact number to increment the semaphore by then that shouldn't be a problem.
> I haven't really thought this approach through so there may be flaws in it which makes it unfeasible.

I'd thought about that, but it seemed to get messy, especially because
of the need for additional accounting of threads blocked in join().
There *is* mutual exclusion needed upon activation right now, for the
woken up threads to update the accounting of how many threads are in
which state.  Maybe that could be changed, but use of a monitor for
wakeup here seems like the usual and natural solution, so I'm inclined
to stick with it unless there's actual evidence of a problem with
doing so.

I'm also doubtful that N semaphore signals will be faster than one
monitor notify_all in the VM thread, which I think is the more
critical path in this case.

More information about the hotspot-gc-dev mailing list