RFR (S): 8218446: SuspendAtExit hangs

Daniel D. Daugherty daniel.daugherty at oracle.com
Mon Mar 18 16:31:27 UTC 2019

On 3/18/19 1:42 AM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8218446
> webrev: http://cr.openjdk.java.net/~dholmes/8218446/webrev/

     Setting the thread state to _thread_blocked is only part of the
     solution. You should also mark the thread as suspend equivalent:

                // We mark this thread_blocked state as a 
suspend-equivalent so
                // that a caller to is_ext_suspend_completed() won't be 
                // The suspend-equivalent state is cleared by 

     L2285:     set_thread_state(_thread_blocked);

     By doing that you'll hit this logic in is_ext_suspend_completed():

       if (save_state == _thread_blocked && is_suspend_equivalent()) {
         // If the thread's state is _thread_blocked and this blocking
         // condition is known to be equivalent to a suspend, then we can
         // consider the thread to be externally suspended. This means that
         // the code that sets _thread_blocked has been modified to do
         // self-suspension if the blocking condition releases. We also
         // used to check for CONDVAR_WAIT here, but that is now covered by
         // the _thread_blocked with self-suspension check.
         // Return true since we wouldn't be here unless there was still an
         // external suspend request.
         *bits |= 0x00001000;
         return true;

     This will allow is_ext_suspend_completed() to return true without
     waiting for java_suspend_self() to grab the SR_lock and complete
     the self-suspension. Your change to _thread_blocked allowed you
     to avoid the retry loop in is_ext_suspend_completed() which solved
     the long delay problem. However, you can make it a little bit
     faster by marking the thread as suspend equivalent also.

More (history related) comments below...

> This test, when run standalone, unintentionally exposed a long time 
> bug in the Suspend/Resume protocol. There are lots of details in the 
> bug report, but basically if you encountered a suspend request along 
> this path:
> JavaThread::check_safepoint_and_suspend_for_native_trans
>  -> SafepointMechanism::block_if_requested_slow
>     -> Safepoint::block()  // unblocks after safepoint
>       -> JavaThread::handle_special_runtime_exit_condition
> the thread is in _thread_in_native_trans state, and the code it 
> executes explicitly leaves it in that state when calling 
> java_suspend_self():
>     // Because thread is external suspended the safepoint code will count
>     // thread as at a safepoint. This can be odd because we can be here
>     // as _thread_in_Java which would normally transition to 
> _thread_blocked
>     // at a safepoint. We would like to mark the thread as 
> _thread_blocked
>     // before calling java_suspend_self like all other callers of it but
>     // we must then observe proper safepoint protocol. (We can't leave
>     // _thread_blocked with a safepoint in progress). However we can be
>     // here as _thread_in_native_trans so we can't use a normal 
> transition
>     // constructor/destructor pair because they assert on that type of
>     // transition. We could do something like:
>     //
>     // JavaThreadState state = thread_state();
>     // set_thread_state(_thread_in_vm);
>     // {
>     //   ThreadBlockInVM tbivm(this);
>     //   java_suspend_self()
>     // }
>     // set_thread_state(_thread_in_vm_trans);
>     // if (safepoint) block;
>     // set_thread_state(state);
>     //
>     // but that is pretty messy. Instead we just go with the way the
>     // code has worked before and note that this is the only path to
>     // java_suspend_self that doesn't put the thread in _thread_blocked
>     // mode.

The above comment block has been bothering me while I have been
tracking this bug... I finally went spelunking and found that
it was added back in 2003 by this delta:

$ sp -r1.672 src/share/vm/runtime/thread.cpp

D 1.672 03/05/28 12:27:17 sgoldman 1767 1764    00037/00027/03293
4800175 -  remove StateSaver as they are obsolete. Removed deopt suspend 
Have java_suspend_self restore the original thread state the thread had 
on entry.

Based on reading that bug report and my email archive for that bug,
that bug came up during my JVM/PI stress testing in my lab in
Colorado. I suspect that we didn't see this particular timeout
issue since we didn't have the SuspendAtExit test program until
I added it in the Thread-SMR project.

> unfortunately the thread that issues the suspend() is looping inside 
> is_ext_suspend_completed() waiting for it to move out of the trans 
> state (to _thread_blocked):
>       // We wait for the thread to transition to a more usable state.
>       for (int i = 1; i <= SuspendRetryCount; i++) {
> SR_lock()->wait(!Thread::current()->is_Java_thread(), i * delay);
>         // check the actual thread state instead of what we saved above
>         if (thread_state() != _thread_in_native_trans) {
>           // the thread has transitioned to another thread state so
>           // try all the checks (except this one) one more time.
>           do_trans_retry = true;
>           break;
>         }
>      }
> After ~6.375 seconds we will exit the loop regardless and then take 
> the VM to a safepoint to "force" suspension of the target thread 
> (which was actually suspended anyway). In the test we issue 
> back-to-back suspend()/resume() up to 10000 times which means we can 
> hit this 6+ second delay frequently (test augmentation showed delays 
> of ~7000 seconds).
> The fix is quite simple: we put the thread in the _thread_blocked 
> state exactly as we already do for the suspend path in 
> JavaThread::check_special_condition_for_native_trans.

Very nice job chasing this down. Thanks!


> With that fix SuspendAt Exit no longer appears to hang.
> Additional testing: (all tests that use suspend() just for good 
> measure) (in progress)
>   - hotspot
>     - vmtestbase/nsk/jdi
>     - runtime/Thread/SuspendAtExit.java
>     - runtime/handshake/HandshakeWalkSuspendExitTest.java
>     - runtime/jni/terminatedThread/TestTerminatedThread.java
>     - vmTestbase/nsk/jvmti/GetThreadState/thrstat002/
>   - jdk
>     - com/sun/jdi/PopAsynchronousTest.java
>     - java/nio/channels/SocketChannel/SendUrgentData.java
>     - java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java
>     - java/lang/ThreadGroup/Suspend.java
> Plus mach5 tiers 1-3.
> Thanks,
> David

More information about the hotspot-runtime-dev mailing list