RFR(s): 8220774: Add HandshakeALot diag option

Robbin Ehn robbin.ehn at oracle.com
Thu Mar 21 13:57:59 UTC 2019

Hi Dan,

On 2019-03-20 21:21, Daniel D. Daugherty wrote:
>> And here is v2 for you to consider:
>> http://rehn-ws.se.oracle.com/cr_mirror/8220774/v2/webrev/ 
> It's too difficult to craft my comments relative to the incremental
> webrev so I'm working with the full webrev still.
> src/hotspot/share/runtime/globals.hpp
>      No comments.
> src/hotspot/share/runtime/thread.hpp
>      No comments.
> src/hotspot/share/runtime/thread.cpp
>      L2953:   // This checks that the thread have a correct frame state during a handshake
>          typo: s/have/has/
>          Also please add a '.' to end of the sentence.


>      L2954:   assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>      L2955:          (has_last_Java_frame() && java_call_counter() > 0), "wrong java_sp info!");
>          Trying to make sure I understand what you are asserting:
>          1) if the thread does not have a last Java frame then it must have
>             never called Java
>          2) if the thread has a last Java frame, then it must have called
>             Java at least once
>          I'm good with the second expression. I have some vague doubts about
>          first expression. When a JavaThread is done executing Java code and
>          it is on its way toward a thread exit, is there ever a time when it
>          no longer has a last Java frame? I'm thinking a handshake late in
>          the JavaThread's life...

The assert is a copy-paste from

L2959 void JavaThread::nmethods_do(CodeBlobClosure* cf) {

I have seen this assert trigger twice on Windows; either it's wrong and should
be removed from nmethods_do, or it is correct and this will help us find out
when this happens.

> src/hotspot/share/runtime/vmThread.hpp
>      No comments.
> src/hotspot/share/runtime/vmThread.cpp
>      L503:           MutexUnlockerEx mul(VMOperationQueue_lock, Mutex::_no_safepoint_check_flag);
>          Please add something like this above L503:
>                      // Have to unlock VMOperationQueue_lock just in case
>                      // no_op_safepoint() has to do a handshake.
>      old L619:     // We want to make sure that we get to a safepoint regularly.
>      old L620:     //
>      old L621:     if ((_cur_vm_operation = VMThread::no_op_safepoint(false)) != NULL) {
>          This call to no_op_safepoint() is at the bottom of the "while(true)"
>          loop in VMThread::loop(). Before the fix for 8219436, this line used to be:
>              if (VMThread::no_op_safepoint_needed(true)) {
>          and it was the only call to no_op_safepoint_needed() that passed
>          'true'. It's the 'true' parameter that made the comment on L619
>          correct. Why do I say that? Well, the only way we get to the
>          bottom of the "while(true)" loop is if we had a vm_operation to
>          perform. Let's say we had a bunch of no-safepoint vm_operations
>          to perform and we just kept executing them, one after another.
>          While doing this work, if our time between safepoints exceeds
>          GuaranteedSafepointInterval, then this no_op_safepoint_needed(true)
>          call is what would detect that we've gone too long between safepoints
>          and would force one.
>          With the fix for 8219436, that call was changed to no_op_safepoint(false)
>          and with the v2 version of this fix that call is now gone. So we no
>          longer have the ability to have GuaranteedSafepointInterval work
>          right when we are doing lots of non-safepoint VM operations.
>          This no_op_safepoint_needed() probe point, either with 'true' or
>          'false', also gave us the opportunity to force a safepoint when
>          SafepointALot is true or when SafepointSynchronize::is_cleanup_needed()
>          returns true. In the v2 version of this fix, we lose the ability
>          to SafepointALot after each VM operation. We also lose the ability
>          to do cleanup() after each VM operation.

And I don't think that is an issue. The ICache and Monitor cleanups, which are
the cleanups we do, just use basic heuristics. Whether we clean up every second
or every other second makes little difference.

SafepointALot is just a diagnostic option, which currently causes a safepoint
directly after a safepoint. I'd rather have it do one extra safepoint per
GuaranteedSafepointInterval; that way you know how many safepoints you have per
second, which is really useful for benchmarking. In its current form it can't
be used for benchmarks.

>          I think you need to restore the "bool check_time" parameter to
>          no_op_safepoint() and you need to restore this call to as:
>          VMThread::no_op_safepoint_needed(true) along with the rest
>          of old L618-L626.

I don't need the parameter; we can always check the time.
When we time out on the queue wait we will always pass the time check.

So the difference is that I think the previous behavior was bad/buggy and you
are trying to restore it :)

Check out v3 (full webrev only):

- wait on queue GuaranteedSafepointInterval
- if timeout
	- do handshake a lot
	- we are guaranteed to pass the time check, so cleanup if needed
	- else safepoint alot
- if op
	- after op
		- do handshake a lot
		- check time, if expired, cleanup if needed
		- else safepoint alot

I think that is what you want :)
And that's what's in v3.

I would rather have:
- wait-time = GuaranteedSafepointInterval - 
- if wait-time > 0
	- wait on queue
- else or timeout
	- do handshake a lot
	- cleanup if needed
	- else safepoint alot

That way we do check for cleanup after non-safepoint VM operations in time, but
we do not cause extra safepoints after safepoints with SafepointALot. (I know
you think of this as a feature :) )

Thanks, Robbin
