JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
uschindler at apache.org
Wed Mar 6 10:50:35 PST 2013
Thanks for the response and the analysis, very informative!
> Hi Everyone,
> I've looked at the bug report (haven't tried to reproduce it yet) and Bengt's
> analysis is correct. The concurrent mark thread is entering the
> synchronization protocol in a marking step call. That code is waiting for some
> non-existent workers to terminate before proceeding. Normally we
> shouldn't be entering that code but I think we overflowed the global marking
> stack (I updated the CR at ~1am my time with that conjecture). I think I
> missed a set_phase() call to tell the parallel terminator that we only have one
> thread and it's picking up the number of workers that executed the remark
> parallel task.
> Thomas: you were on the right track with your comment about the marking
> stack size.
> Thanks for helping out here. The stack trace you mentioned was for one of the
> refinement threads - a concurrent GC thread. When a concurrent GC thread
> "joins" the suspendible thread set, it means that it will observe and
> participate in safepoint operations, i.e. the thread will notice that it should
> reach a safepoint and the safepoint synchronizer code will wait for it to block.
> When we wish a concurrent GC thread to not observe safepoints, that
> thread leaves the suspendible thread set. I think the name could be a bit
> better and Tony, before he left, had a change that used a scoped object to
> join and leave the STS that hasn't been integrated yet. IIRC Tony wasn't
> happy with the name he chose for that either.
> Thanks for bringing this up and my apologies for not replying sooner. I will
> have a fix fairly soon. If I'm correct about it being caused by overflowing the
> marking stack you can work around the issue by increasing the
> MarkStackSize. You could try increasing it to 2M or 4M entries (which is the
> current max size).
Is there a command-line setting to raise this size? It would also be interesting to check whether the opposite works, i.e. whether lowering the size on the 64 bit JVM makes the 64 bit one hang as well. Unfortunately, as a Java programmer I am not very familiar with building the JVM on Ubuntu machines (including the required IcedTea), so it's hard for me to try this out. I would not even know how to start, or how to end up with something like a standard JDK directory that could be used as JAVA_HOME.
If you need verification that your patch works, it would be good to get an i586 Linux tgz file with a binary, so I can do a quick check on the Jenkins server that found the bug. Otherwise we would need to wait until a new build appears on jdk8.java.net (including the fix plus the other fixes to the javadoc/javac tools and the class library that we reported earlier).
I could also help set up a Lucene build directory (as described in the first email) to reproduce the problem with the Lucene source code, which is very easy. As said before, I have no isolated test case :(
Thanks in any case,
> On 3/6/2013 5:43 AM, Thomas Schatzl wrote:
> > Hi,
> > On Wed, 2013-03-06 at 13:49 +0100, Uwe Schindler wrote:
> >> Hi Bengt,
> >> That was fast! We are happy that you were able to analyze the bug and
> >> will fix it soon. To keep our Jenkins server from getting stuck in the
> >> tests, I will disable G1GC until a new update is installed. We will then
> >> only test the other garbage collectors with Lucene.
> >> Do you have an idea why this bug does not appear on 64 bit? It might be
> >> caused by different GC behavior, as the word size is different (the Lucene
> >> tests use -Xmx512M, so the heap size is the same for 32 and 64 bit at the
> >> moment). I just want to understand this! I can run the test suite with the
> >> 64 bit JDK over and over and it never hangs, but with 32 bit it hangs every time.
> > One possible reason is that the default mark stack size is much
> > larger on 64 bit, so no mark stack overflow occurs.
> > E.g. in globals.hpp:
> > product(uintx, MarkStackSizeMax, NOT_LP64(4*M) LP64_ONLY(512*M),
> > You may want to try to set MarkStackSizeMax to 4M on 64 bit too to
> > test this.
> > This is just a hunch though.
> > Thomas
More information about the hotspot-gc-dev