RFR(s): 8220774: Add HandshakeALot diag option

Robbin Ehn robbin.ehn at oracle.com
Wed Mar 20 15:55:34 UTC 2019

Hi Dan, thanks for having a look!

> src/hotspot/share/runtime/thread.hpp
>      L1871:   // Debug method asserting thread states
>      L1872:   DEBUG_ONLY(void assert_states();)
>          The comment doesn't add anything beyond what you can see with
>          the line of code (except for the "thread" part). I think you
>          need a better function name and a better comment.

My thinking was that anyone can then add whatever invariants should hold for a 
JavaThread during a handshake.

> src/hotspot/share/runtime/thread.cpp
>      L2952: void JavaThread::assert_states() {
>      L2953:   assert((!has_last_Java_frame() && java_call_counter() == 0) ||
>      L2954:          (has_last_Java_frame() && java_call_counter() > 0), "wrong 
> java_sp info!");
>          The assert failure string doesn't really match the assertion
>          unless I missing something here...

The last Java frame is populated from the stack pointer.
The assert is copied from JavaThread::nmethods_do.

>          Also what's the reason for this assertion?

I saw an issue on Windows with this. I have not been able to reproduce it.

> src/hotspot/share/runtime/vmThread.hpp
>      No comments.
> src/hotspot/share/runtime/vmThread.cpp
>      L437: class HandshakeALotTC : public ThreadClosure {
>      <snip>
>      L439:   virtual void do_thread(Thread* thread) {
>      <snip>
>      L441:     JavaThread* jt = (JavaThread*)thread;
>      L442:     DEBUG_ONLY(jt->assert_states();)
>          Okay, but why are these thread states important here?

As I said above, we should add more here as we continue.

>          Also L441 could also be DEBUG_ONLY. In fact the whole function
>          body could be just #ifdef ASSERT ... #endif.


>      old L437: VM_Operation* VMThread::no_op_safepoint(bool check_time) {
>      <snip>
>      old L444:   if (check_time) {
>      old L445:     long interval_ms = 
> SafepointTracing::time_since_last_safepoint_ms();
>      old L446:     bool max_time_exceeded = GuaranteedSafepointInterval != 0 &&
>      old L447:                              (interval_ms > 
> GuaranteedSafepointInterval);
>      old L448:     if (!max_time_exceeded) {
>      old L449:       return NULL;
>      old L450:     }
>      old L451:   }
>          In the existing code base, all callers to no_op_safepoint()
>          passed 'false' so L444-L451 was dead code. It looks like
>          before this changeset, we had both 'true' and 'false' callers:
>          changeset:   53895:b22d8ae270a2
>          user:        rehn
>          date:        Fri Feb 22 14:20:06 2019 +0100
>          summary:     8219436: Safepoint logs correction and misc
>      new L446: VM_Operation* VMThread::no_op_safepoint() {
>      new L447:   long interval_ms = 
> SafepointTracing::time_since_last_safepoint_ms();
>      new L448:   bool max_time_exceeded = GuaranteedSafepointInterval != 0 &&
>      new L449:                            (interval_ms >= 
> GuaranteedSafepointInterval);
>      new L450:   if (!max_time_exceeded) {
>      new L451:     return NULL;
>      new L452:   }
>          In the new code, L447-L452 are no longer dead and in fact the
>          block is now first and is always checked. This means that if
>          max_time_exceeded == false, we return NULL and there's no
>          safepoint.

On the first call the timeout from the queue wait will always be greater, so 
there was no point in avoiding that check (I thought it made the code simpler).

>          This is a change in behavior for the SafepointALot flag since
>          this code block used to be first:
>          old L438:   if (SafepointALot) {
>          old L439:     return &safepointALot_op;
>          old L440:   }
>          So unless I'm missing something SafepointALot now appears to
>          be broken since it will only happen if max_time_exceeded == true
>          and that's just a normal safepoint interval.

Not really, because the block commented
"We want to make sure that we get to a safepoint regularly."
is only executed after a real VM operation.

And I needed to move it so you can run with both SafepointALot and HandshakeALot 
and get both.

>      old L494:         if (timedout && (_cur_vm_operation = 
> VMThread::no_op_safepoint(false)) != NULL) {
>      old L495:           MutexUnlockerEx mul(VMOperationQueue_lock,
>      old L496: Mutex::_no_safepoint_check_flag);
>      new L506:           MutexUnlockerEx mul(VMOperationQueue_lock, 
> Mutex::_no_safepoint_check_flag);
>      new L507:           if (timedout && (_cur_vm_operation = 
> VMThread::no_op_safepoint()) != NULL) {
>          The old code only unlocked (and relocked) VMOperationQueue_lock
>          when we knew that we needed to force a safepoint. Now it is always
>          unlocked (and relocked) on this code path.

Yes, since if we need a handshake it must be unlocked.
VMOperationQueue_lock is not a contended lock; unlocking and relocking it every 
GuaranteedSafepointInterval is not an issue.

>      old L621:     if ((_cur_vm_operation = VMThread::no_op_safepoint(false)) != 
> NULL) {
>      new L633:     if ((_cur_vm_operation = VMThread::no_op_safepoint()) != NULL) {
>          Before the fix for 8219436, this line used to be:
>              if (VMThread::no_op_safepoint_needed(true)) {
>          so the fix for 8219436 is what made old L444-L451 into
>          dead code.
> Okay, so I think that this fix breaks the SafepointALot flag.
> The fix for 8219436 broke the max_time_exceeded logic and
> this fix solves that, but it breaks SafepointALot at the same
> time so I think you need to rework things a bit.

The GuaranteedSafepointInterval timeout only calls VMThread::no_op_safepoint; if 
no cleanup is needed we don't safepoint, because it hurts performance.

If we have set SafepointALot we do safepoint on every GuaranteedSafepointInterval.
(Ignoring that we sometimes end up in the dummy code for safepointing 
'regularly'; I'll just remove that in v2.)

java -XX:+UnlockDiagnosticVMOptions -XX:GuaranteedSafepointInterval=100 
-XX:+SafepointALot -cp . Dummy

[1.157s][info][safepoint] Safepoint "None", Time since last: 100127512 ns, 
Reaching safepoint: 147044 ns, At safepoint: 32355 ns, Total: 179399 ns
[1.258s][info][safepoint] Safepoint "None", Time since last: 100141946 ns, 
Reaching safepoint: 190741 ns, At safepoint: 65858 ns, Total: 256599 ns
[1.358s][info][safepoint] Safepoint "None", Time since last: 100368088 ns, 
Reaching safepoint: 184854 ns, At safepoint: 51552 ns, Total: 236406 ns
[1.459s][info][safepoint] Safepoint "None", Time since last: 100186239 ns, 
Reaching safepoint: 196150 ns, At safepoint: 40827 ns, Total: 236977 ns

With my v1 patch:
[2.567s][info][safepoint] Safepoint "None", Time since last: 100112625 ns, 
Reaching safepoint: 6360001 ns, At safepoint: 104928 ns, Total: 6464929 ns
[2.668s][info][safepoint] Safepoint "None", Time since last: 100117756 ns, 
Reaching safepoint: 1671472 ns, At safepoint: 89107 ns, Total: 1760579 ns
[2.770s][info][safepoint] Safepoint "None", Time since last: 100111951 ns, 
Reaching safepoint: 1849522 ns, At safepoint: 96310 ns, Total: 1945832 ns
[2.873s][info][safepoint] Safepoint "None", Time since last: 100107481 ns, 
Reaching safepoint: 1834603 ns, At safepoint: 86498 ns, Total: 1921101 ns

Since we do a timed wait on the queue with GuaranteedSafepointInterval, this is 
what we get.

The "safepoint regularly" case just causes an extra safepoint directly after a 
safepoint with a real operation:
[0.313s][info][safepoint] Safepoint "RevokeBias", Time since last: 64643 ns, 
Reaching safepoint: 697102 ns, At safepoint: 76344 ns, Total: 773446 ns
[0.313s][info][safepoint] Safepoint "None", Time since last: 38694 ns, Reaching 
safepoint: 143387 ns, At safepoint: 29936 ns, Total: 173323 ns

This is the only case where this changes behavior for SafepointALot, and I'll 
remove it for v2. If you really like it, it should be in debug builds only, 
since the whole "safepoint regularly" idea is very much false today: we don't 
want to safepoint :)

And here is v2 for you to consider:

Sanity tested with tiers 1-2.

Thanks, Robbin

> Dan
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8220774
>> We need an option to generate a lot of handshakes for testing.
>> This adds -XX:+HandshakeALot, typical usage:
>> -XX:+UnlockDiagnosticVMOptions -XX:GuaranteedSafepointInterval=200 
>> -XX:+HandshakeALot
>> Will generate an extra handshake if there has been no safepoint/handshake within 200 ms.
>> Thanks, Robbin

More information about the hotspot-runtime-dev mailing list