JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

John Cuthbertson john.cuthbertson at oracle.com
Wed Mar 6 10:04:16 PST 2013

Hi Everyone,

I've looked at the bug report (haven't tried to reproduce it yet) and 
Bengt's analysis is correct. The concurrent mark thread is entering the 
synchronization protocol in a marking step call. That code is waiting 
for some non-existent workers to terminate before proceeding. Normally 
we shouldn't be entering that code but I think we overflowed the global 
marking stack (I updated the CR at ~1am my time with that conjecture). I 
think I missed a set_phase() call to tell the parallel terminator that 
we only have one thread, so it's still picking up the number of workers 
that executed the remark parallel task.
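
To illustrate the kind of hang described above, here is a toy, 
non-blocking model of a termination protocol. It is only a sketch: the 
class and method names (ToyTerminator, reset_for_phase, 
offer_termination) are made up for illustration and are not HotSpot's 
actual terminator code. It assumes the protocol completes only once 
every expected worker has offered termination.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a termination protocol. All names here are hypothetical;
// this is not HotSpot's parallel terminator.
class ToyTerminator {
public:
  explicit ToyTerminator(std::size_t expected)
      : _expected(expected), _offered(0) {
    assert(expected > 0);
  }

  // Re-arm the protocol for a new phase with a new number of active workers.
  void reset_for_phase(std::size_t expected) {
    assert(expected > 0);
    _expected = expected;
    _offered = 0;
  }

  // A worker reports that it has run out of work. Returns true once every
  // expected worker has done so. A real implementation would spin or block
  // here until that is the case, which is where a lone thread would hang.
  bool offer_termination() {
    if (_offered < _expected) {
      ++_offered;
    }
    return _offered == _expected;
  }

private:
  std::size_t _expected;  // number of workers the protocol waits for
  std::size_t _offered;   // number of workers that have offered so far
};
```

In this model, a terminator still armed for four workers never reports 
completion to a single caller, while one re-armed for a single worker 
completes on the first offer.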

Thomas: you were on the right track with your comment about the marking 
stack size.

Thanks for helping out here. The stack trace you mentioned was for one 
of the refinement threads - a concurrent GC thread. When a concurrent GC 
thread "joins" the suspendible thread set, it means that it will observe 
and participate in safepoint operations, i.e. the thread will notice 
that it should reach a safepoint and the safepoint synchronizer code 
will wait for it to block. When we wish a concurrent GC thread to not 
observe safepoints, that thread leaves the suspendible thread set (STS). 
I think the name could be a bit better. Tony, before he left, had a 
change that used a scoped object to join and leave the STS, but it 
hasn't been integrated yet. IIRC Tony wasn't happy with the name he 
chose for that either.
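
As a rough sketch of what a scoped join/leave object could look like 
(this is not Tony's change, which I'm not reproducing here): the 
constructor joins the set and the destructor leaves it. The names 
ToySuspendibleThreadSet and ScopedStsJoiner are hypothetical, and the 
toy set only counts joined threads; the actual safepoint 
synchronization is left out.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the suspendible thread set. It only tracks how
// many threads are currently joined; safepoint handling is omitted.
class ToySuspendibleThreadSet {
public:
  // The calling thread agrees to observe safepoints from now on.
  void join() {
    std::lock_guard<std::mutex> lock(_mutex);
    ++_joined;
  }

  // The calling thread stops observing safepoints.
  void leave() {
    std::lock_guard<std::mutex> lock(_mutex);
    assert(_joined > 0);
    --_joined;
  }

  int joined() {
    std::lock_guard<std::mutex> lock(_mutex);
    return _joined;
  }

private:
  std::mutex _mutex;
  int _joined = 0;
};

// Joins the set on construction and leaves it on destruction, so a thread
// cannot forget to leave on an early return.
class ScopedStsJoiner {
public:
  explicit ScopedStsJoiner(ToySuspendibleThreadSet& sts) : _sts(sts) {
    _sts.join();
  }
  ~ScopedStsJoiner() { _sts.leave(); }

  ScopedStsJoiner(const ScopedStsJoiner&) = delete;
  ScopedStsJoiner& operator=(const ScopedStsJoiner&) = delete;

private:
  ToySuspendibleThreadSet& _sts;
};
```

A thread would declare a ScopedStsJoiner at the top of the block in 
which it should observe safepoints, and leave the set automatically 
when the block exits.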

Thanks for bringing this up and my apologies for not replying sooner. I 
will have a fix fairly soon. If I'm correct about it being caused by 
overflowing the marking stack, you can work around the issue by 
increasing the MarkStackSize. You could try increasing it to 2M or 4M 
entries (the latter being the current max size).



On 3/6/2013 5:43 AM, Thomas Schatzl wrote:
> Hi,
> On Wed, 2013-03-06 at 13:49 +0100, Uwe Schindler wrote:
>> Hi Bengt,
>> That was fast! We are happy that you were able to analyze the bug and will fix it soon. To not make our Jenkins server get stuck in the tests, I will disable G1GC until a new update is installed. We will then only test the other garbage collectors with Lucene.
>> Do you have an idea why this bug is not appearing on 64 bit? It might be caused by other GC behavior as the word size is different (the Lucene tests use -Xmx512M, so it's fixed in 32 and 64 bit at the moment). I just want to understand this! I can run the test suite with 64 bit JDK over and over, it never hangs. But when running with 32 bit it hangs in all cases.
>    One possible reason is that the default mark stack size is much larger
> on 64 bit, so no mark stack overflow occurs.
> E.g. in globals.hpp:
>    product(uintx, MarkStackSizeMax, NOT_LP64(4*M) LP64_ONLY(512*M),    \
> You may want to try to set MarkStackSizeMax to 4M on 64 bit too to test
> this.
> This is just a hunch though.
> Thomas