VM deadlock between VM shutdown and G1

Krystal Mok rednaxelafx at gmail.com
Thu Sep 13 10:14:15 UTC 2018

Hi David,

Comments inline:

On Thu, Sep 13, 2018 at 2:51 AM, David Holmes <david.holmes at oracle.com> wrote:

> Hi Kris,
> I didn't quite follow the analysis (see below)
> On 13/09/2018 6:57 PM, Krystal Mok wrote:
>> 1. A Java application thread at an allocation site triggering a G1
>> incremental collection
>> 2. A thread that called System.exit(), initiating the VM shutdown
>> sequence.
>> It's in VM's native code so it doesn't block a safepoint.
> VM code is not "native" in the sense of being safepoint-safe. If it's
> still in the System.c code trying to call the VM then it is native but as
> soon as it tries to enter the VM it will block if a safepoint is in
> progress. In addition the exit requires that the VM go to a safepoint
> before terminating.
> The time window was extremely narrow but it did happen in practice.
The Java application thread called System.exit() -> JVM_Halt(). The
safepoint was probably not yet active at that point, so the thread went
past the safepoint check on entry and got into the _thread_in_vm state.
Then, when it tried to post the VM death event, it transitioned to the
_thread_in_native state (through JvmtiJavaThreadEventTransition), which is
safepoint-safe. Probably around this time the safepoint synchronization
started and then stopped the world.

There's nothing really special about what cbVMDeath does otherwise. This
process has the JDWP agent enabled, and that's part of the story of how this
thread got into the _thread_in_native state.
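
To make the sequence of state changes concrete, here is a rough sketch in C.
It is not HotSpot code: the enum, the struct and the helper functions are
made-up names that only mirror the thread states mentioned above, under the
simplifying assumption that "safepoint-safe" means the safepoint can proceed
without waiting for a thread in that state.

```c
#include <stdbool.h>

/* Simplified stand-ins for the HotSpot thread states named in this thread.
 * Hypothetical illustration only, not the VM's own definitions. */
enum thread_state {
    THREAD_IN_JAVA,
    THREAD_IN_VM,
    THREAD_IN_NATIVE,
    THREAD_BLOCKED
};

struct exit_thread {
    enum thread_state state;
};

/* A safepoint does not have to wait for threads in these states. */
bool is_safepoint_safe(enum thread_state s)
{
    return s == THREAD_IN_NATIVE || s == THREAD_BLOCKED;
}

/* Entering the VM from native code: if a safepoint is already in progress
 * the thread blocks (modelled here by returning false); otherwise it
 * proceeds in the in-VM state, which is not safepoint-safe. */
bool enter_vm(struct exit_thread *t, bool safepoint_active)
{
    if (safepoint_active) {
        t->state = THREAD_BLOCKED;
        return false;
    }
    t->state = THREAD_IN_VM;
    return true;
}

/* Posting an agent event callback moves the thread to the native state,
 * so a safepoint that starts now can stop the world around it. */
void post_event_callback(struct exit_thread *t)
{
    t->state = THREAD_IN_NATIVE;
}
```

In this model the exiting thread starts in THREAD_IN_NATIVE (the System.c
code), passes enter_vm() because no safepoint is active yet, and then
becomes safepoint-safe again the moment post_event_callback() runs. That is
the narrow window in which the safepoint can begin.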

>> 3. VM thread, already inside of a safepoint and started running G1's
>> incremental collection.
>> (4. "the world" is at a safepoint so all other Java threads are just
>> waiting)
>> The problem is, Thread 2 has already run half way into before_exit(), and
> The problem seems to be an event callback, cbVMDeath, which seems to have
> taken the thread from _thread_in_vm (which is not a safepoint-safe state) to
> presumably _thread_in_native, which is safepoint-safe. The callback then
> blocks on a RawMonitorWait for something and that would seem to be where
> the problem arises. What is the callback trying to do?
The cbVMDeath callback is just waiting for other active callbacks to check
in (line 1273):


1267     debugMonitorEnter(callbackBlock); {
1268         debugMonitorEnter(callbackLock); {
1269             vm_death_callback_active = JNI_TRUE;
1270             (void)threadControl_resumeAll();
1271             while (active_callbacks > 0) {
1272                 /* wait for active CALLBACKs to check in (and block) */
1273                 debugMonitorWait(callbackLock);
1274             }
1275         } debugMonitorExit(callbackLock);

The real deadlock in this case should still be the fact that the VM has
reached a safepoint at this point in time, and G1 is waiting for the
concurrent marker to check in, but the concurrent marker is already gone.
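
The shape of that hang can be sketched with a small model in C. This is an
illustration only, not G1 code: checkin_model and wait_for_checkin are
hypothetical names, and the wait is bounded by max_polls purely so the
sketch terminates. The real wait has no such bound, which is why the VM
hangs.

```c
#include <stdbool.h>

/* Hypothetical model of a waiter that needs another thread to check in. */
struct checkin_model {
    bool worker_alive;      /* the thread expected to check in still exists */
    bool worker_checked_in; /* it has already reported in */
};

/* Poll for the check-in at most max_polls times.
 * Returns the number of polls it took, or -1 if the check-in never came. */
int wait_for_checkin(struct checkin_model *m, int max_polls)
{
    for (int polls = 1; polls <= max_polls; polls++) {
        /* A live worker checks in on its next step; a worker that is
         * already gone never will. */
        if (m->worker_alive) {
            m->worker_checked_in = true;
        }
        if (m->worker_checked_in) {
            return polls;
        }
    }
    return -1;
}
```

With worker_alive set, the wait completes on the first poll. With it
cleared, which corresponds to the concurrent marker having gone away before
the collection started, no number of polls is enough.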

> Cheers,
> David

