RFR: 8152196: SuspendibleThreadSet::yield scales poorly
mikael.gerdin at oracle.com
Mon Mar 21 15:41:20 UTC 2016
On 2016-03-21 14:26, Kim Barrett wrote:
>> On Mar 21, 2016, at 5:41 AM, Mikael Gerdin <mikael.gerdin at oracle.com> wrote:
>> Hi Kim,
> Thanks for reviewing. Replies to your comments inline.
>> On 2016-03-18 20:58, Kim Barrett wrote:
>>> Please review this fix for a performance scaling problem in
>>> SuspendibleThreadSet, leading to unnecessary additional safepoint
>>> latency as the number of suspendible threads increases. See the CR for
>>> details of the problem and some performance data.
>> I agree with your assessment in the bug that the limited set of available synchronization primitives is unfortunate.
>> 36 static Semaphore* _synchronize_wakeup = NULL;
>> Why not make it a static member in SuspendibleThreadSet?
> That would require making the init function a friend, or making it a
> static member function and #include the STS header in init.cpp,
> neither of which seems like an improvement to me. I also have a
> preference for using file scope over cluttering a public class
> definition with implementation details, when that's possible.
I see. Ok.
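For readers following along, a toy illustration of the file-scope pattern Kim describes (the `Semaphore` here is a single-threaded stand-in, not HotSpot's real class, and the init function name is hypothetical):

```cpp
#include <cassert>

// Toy stand-in for HotSpot's Semaphore class; the real one wraps an OS
// semaphore, so this single-threaded counter is only for illustration.
class Semaphore {
public:
  explicit Semaphore(unsigned count = 0) : _count(count) {}
  void signal(unsigned n = 1) { _count += n; }
  bool try_wait() {
    if (_count == 0) return false;
    --_count;
    return true;
  }
private:
  unsigned _count;
};

// File-scope static, as in the change under review: the semaphore stays an
// implementation detail of the .cpp file instead of appearing in the class
// definition, and a free init function can create it without needing to be
// a friend of SuspendibleThreadSet.
static Semaphore* _synchronize_wakeup = nullptr;

// Hypothetical init hook, standing in for the call made during VM init.
void suspendible_thread_set_init() {
  assert(_synchronize_wakeup == nullptr && "init called twice");
  _synchronize_wakeup = new Semaphore(0);
}
```

The point is purely one of scoping: nothing outside the translation unit can name `_synchronize_wakeup`, so the public class definition stays free of implementation detail.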
>> Besides that, I think your change is good.
>> Since you've already performance tested your fix, I think it should be pushed but here are some thoughts around another approach.
>> The primary reason for using semaphores for activating gc worker threads was that Monitor::notify_all is inefficient for the purpose of simply releasing threads without them requiring any mutual exclusion access upon their activation.
>> Following that pattern the STS could have the joining or yielding threads wait on a semaphore and use the Monitor to signal to the VM thread requesting the synchronization. This would cause a situation where the last STS thread would have to first notify the VM thread and then wait on the semaphore, but as long as the VM thread knows the exact number to increment the semaphore by, that shouldn't be a problem.
>> I haven't really thought this approach through, so there may be flaws in it which make it infeasible.
> I'd thought about that, but it seemed to get messy, especially because
> of the need for additional accounting of threads blocked in join().
> There *is* mutual exclusion needed upon activation right now, for the
> woken up threads to update the accounting of how many threads are in
> which state. Maybe that could be changed, but use of a monitor for
> wakeup here seems like the usual and natural solution, so I'm inclined
> to stick with it unless there's actual evidence of a problem with
> doing so.
Yep, I agree it makes sense. I was just thinking out loud a bit.
> I'm also doubtful that N semaphore signals will be faster than one
> monitor notify_all in the VM thread, which I think is the more
> critical path in this case.
I'm not a big fan of Monitor::notify_all, but in this case that may be
true. Signaling a semaphore will likely awaken the waiting threads, which
will then contend with the VM thread for CPU time during the critical
wakeup phase of the safepoint.
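By contrast, the wakeup path the fix keeps can be sketched like this: all waiters block on one monitor and the VM thread releases them with a single notify_all, rather than issuing N separate semaphore signals on its critical path (again using standard primitives as stand-ins for HotSpot's Monitor; names are hypothetical):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a monitor-based wakeup: one notify_all releases every waiter,
// and each woken thread updates its accounting under the same monitor,
// which is the mutual exclusion on activation mentioned above.
struct MonitorWakeup {
  std::mutex mu;
  std::condition_variable cv;
  bool released = false;
  int awake = 0;   // post-wakeup accounting, updated under the monitor

  void wait_for_release() {
    std::unique_lock<std::mutex> lock(mu);
    cv.wait(lock, [this] { return released; });
    ++awake;       // accounting update on wakeup, under the monitor
  }

  void release_all() {
    std::lock_guard<std::mutex> lock(mu);
    released = true;
    cv.notify_all();   // one call, regardless of how many threads wait
  }
};
```

Whether one notify_all beats N semaphore signals on the releasing thread is exactly the trade-off debated above; the sketch only shows the shape of each path, not their relative cost.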