JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Bengt Rutisson bengt.rutisson at oracle.com
Wed Mar 6 00:04:24 PST 2013


I think this is a VM bug and the thread dumps that Uwe produced are 
enough to start tracking down the root cause.

On 3/6/13 8:52 AM, David Holmes wrote:
> If the VM is completely unresponsive then it suggests we are at a 
> safepoint.
Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.

> The GC threads are not "hung" in os::park, they are parked - waiting 
> to be notified of something.

It looks like the reference processing thread is stuck in a loop where 
it calls wait(). So the VM is hanging, even though that stack trace also 
ends up in os::park().
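
To illustrate the general pattern (this is a Java-level analogue, not 
HotSpot code): a thread that loops on wait() for a condition nobody ever 
sets looks like an ordinary parked thread in a dump, yet it never makes 
progress. The class, field, and thread names below are hypothetical and 
only serve the sketch.

```java
public class WaitLoopDemo {
    private static final Object lock = new Object();
    // Hypothetical condition: in this sketch nothing ever sets it to true.
    private static boolean done = false;

    // Starts a thread that waits in a loop and reports the state it settles in.
    static Thread.State observeStuckWaiter() throws InterruptedException {
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                while (!done) {          // condition never becomes true
                    try {
                        lock.wait();     // no notify() is ever issued
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }, "stuck-waiter");
        waiter.setDaemon(true);          // let the JVM exit despite the stuck thread
        waiter.start();

        // Poll for up to five seconds until the thread is seen waiting.
        long deadline = System.nanoTime() + 5_000_000_000L;
        Thread.State state = waiter.getState();
        while (state != Thread.State.WAITING && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = waiter.getState();
        }
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter state: " + observeStuckWaiter());
    }
}
```

The waiter shows up as waiting rather than as busy or crashed, which is 
why "parked" and "hung" are not mutually exclusive in a thread dump.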

> The thing is to find out why they are not being woken up.

Actually, in this case we should probably not even be calling wait...

> Can the gdb log be posted somewhere? I don't know if the attachment 
> made it to the original posting on hotspot-gc but it's no longer 
> available on hotspot-dev.

I received the attachment with the original email. I've attached it to 
the bug report that I created: 8009536. You can find it there if you 
want to. But I think we have a fairly good idea of what change caused 
the hang.


> Thanks,
> David
> On 6/03/2013 4:07 PM, Krystal Mok wrote:
>> Hi Uwe,
>> If you can attach gdb to it, then jstack -m and jstack -F should also
>> work; that'll get you the Java stack trace.
>> (But it probably doesn't matter in this case, because the hang is
>> probably a bug in the VM).
>> - Kris
>> On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler <uschindler at apache.org> 
>> wrote:
>>> Hi,
>>> For a few months we have been extensively testing various preview builds 
>>> of JDK 8 for compatibility with Apache Lucene and Solr, so we can 
>>> find any bugs early and prevent the problems we had with the release 
>>> of Java 7 two years ago. Currently we have a Linux (Ubuntu 64bit) 
>>> Jenkins machine that has various JDKs (JDK 6, JDK 7, JDK 8 snapshot, 
>>> IBM J9, older JRockit) installed, choosing a different one with 
>>> different hotspot and garbage collector settings on every run of the 
>>> test suite (which takes approx. 30-45 minutes).
>>> JDK 8 b79 has worked very well on Linux so far; we found some strange 
>>> behavior in early versions (maybe compiler errors), but none at 
>>> the moment. There is one configuration that consistently and 
>>> reproducibly hangs in one of the tested modules: it uses JDK 8 b79 
>>> (same for b78), 32 bit, and G1GC (server or client does not 
>>> matter). The JVM running the tests hangs and becomes unresponsive 
>>> (jstack and kill -3 have no effect or cannot connect, a standard 
>>> kill does not stop it, only kill -9 actually kills it). It can be 
>>> reproduced in this Lucene module 100% of the time (it always hangs).
>>> I was able to connect with GDB to the JVM and get a stack trace on 
>>> all threads (see attachment, dump.txt). As you can see, all threads of 
>>> G1GC seem to hang in a syscall (os::park(), a condition wait in 
>>> the pthread library). Unfortunately that's all I can give you. A Java 
>>> stack trace is not possible because the JVM reacts to neither kill -3 
>>> nor jstack. With all other garbage collectors it passes the test 
>>> without hangs in a few seconds; with 32 bit G1GC it can stand still 
>>> for hours. The 64 bit JVM passes with G1GC, so only the 32 bit 
>>> variant is affected. Client or Server VM makes no difference.
>>> To reproduce:
>>> - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this 
>>> should not matter)
>>> - Download the Lucene source code (e.g. the snapshot version we were 
>>> testing with: 
>>> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
>>> - Change to the directory lucene/analysis/uima and run:
>>>          ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 
>>> -Dtests.jvms=1 test
>>> After a while the test framework prints "stalled" messages (because 
>>> the child VM actually running the test no longer responds). The PID 
>>> is also printed. Trying to get a stack trace or kill it gets no 
>>> response. Only kill -9 helps. Choosing another garbage collector in 
>>> the above command line makes the test finish after a few seconds, e.g. 
>>> -Dargs="-server -XX:+UseConcMarkSweepGC"
>>> I posted this bug report directly to the mailing list because, with 
>>> earlier bug reports, there seems to be a problem with bugs.sun.com - 
>>> there is no response from any reviewer after several weeks and we 
>>> were able to help find and fix javadoc and javac compiler bugs 
>>> early. So I hope you can help with this bug, too.
>>> Uwe
>>> -----
>>> Uwe Schindler
>>> uschindler at apache.org
>>> Apache Lucene PMC Member / Committer
>>> Bremen, Germany
>>> http://lucene.apache.org/
