JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

David Holmes david.holmes at oracle.com
Wed Mar 6 02:23:10 PST 2013

On 6/03/2013 5:55 PM, Dawid Weiss wrote:
> Here you go:
> http://pastebin.com/raw.php?i=b2PHLm1e

Thanks. I would have to say this seems to be the suspicious part:

Thread 22 (Thread 0xf20ffb40 (LWP 22939)):
#0  0xf7743430 in __kernel_vsyscall ()
#1  0xf771e96b in pthread_cond_wait@@GLIBC_2.3.2 () from 
#2  0xf6ec849c in os::PlatformEvent::park() ()
#3  0xf6e98b82 in Monitor::IWait(Thread*, long long) ()
#4  0xf6e99370 in Monitor::wait(bool, long, bool) ()
#5  0xf6b5fb16 in SuspendibleThreadSet::join() ()
#6  0xf6b5ea41 in ConcurrentG1RefineThread::run_young_rs_sampling() ()
#7  0xf6b5ef91 in ConcurrentG1RefineThread::run() ()

The suspendible thread set logic looks "tricky". Time for the G1 experts 
to take over. :)
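
To illustrate the difference between a thread that is "hung" and one that is
merely parked waiting for a notification, here is a minimal Java sketch. It is
not HotSpot code: the class ParkDemo and its methods are hypothetical names
made up for this example, and a timeout is added only so that the demo
terminates. A waiter with no timeout would stay parked forever if the
notifying thread never ran, which is the pattern the backtrace above suggests.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not taken from the HotSpot sources.
public class ParkDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    private boolean signalled = false;

    // Parks the caller until signal() has been called or the timeout expires.
    // Returns true if the signal arrived, false if the wait timed out.
    boolean awaitSignal(long timeoutMs) throws InterruptedException {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (!signalled) {          // loop guards against spurious wakeups
                if (nanos <= 0L) {
                    return false;         // nobody notified us in time
                }
                nanos = cond.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Sets the flag and wakes every thread parked on the condition.
    void signal() {
        lock.lock();
        try {
            signalled = true;
            cond.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Starts one waiter thread; optionally signals it. Returns the waiter's result.
    static boolean run(boolean sendSignal, long timeoutMs) {
        ParkDemo demo = new ParkDemo();
        boolean[] result = new boolean[1];
        Thread waiter = new Thread(() -> {
            try {
                result[0] = demo.awaitSignal(timeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "waiter");
        waiter.start();
        if (sendSignal) {
            demo.signal();
        }
        try {
            waiter.join();                // join makes result[0] visible here
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("with signal: " + run(true, 2000));
        System.out.println("without signal: " + run(false, 200));
    }
}
```

In the second call nobody ever signals, so the waiter only returns because of
the timeout. Without it, a native backtrace of that thread would show it
sitting in a condition wait indefinitely.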


> Dawid
> On Wed, Mar 6, 2013 at 8:52 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>     If the VM is completely unresponsive then it suggests we are at a
>     safepoint.
>     The GC threads are not "hung" in os::park, they are parked - waiting
>     to be notified of something.
>     The thing is to find out why they are not being woken up.
>     Can the gdb log be posted somewhere? I don't know if the attachment
>     made it to the original posting on hotspot-gc but it's no longer
>     available on hotspot-dev.
>     Thanks,
>     David
>     On 6/03/2013 4:07 PM, Krystal Mok wrote:
>         Hi Uwe,
>         If you can attach gdb to it, then jstack -m and jstack -F
>         should also work; that'll get you the Java stack trace.
>         (But it probably doesn't matter in this case, because the hang is
>         probably a bug in the VM).
>         - Kris
>         On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler
>         <uschindler at apache.org <mailto:uschindler at apache.org>> wrote:
>             Hi,
>             for a few months we have been extensively testing various preview
>             builds of JDK 8 for compatibility with Apache Lucene and
>             Solr, so we can find any bugs early and prevent the problems
>             we had with the release of Java 7 two years ago. Currently
>             we have a Linux (Ubuntu 64bit) Jenkins machine that has
>             various JDKs (JDK 6, JDK 7, JDK 8 snapshot, IBM J9, older
>             JRockit) installed, choosing a different one with different
>             hotspot and garbage collector settings on every run of the
>             test suite (which takes approx. 30-45 minutes).
>             JDK 8 b79 works so far very well on Linux, we found some
>             strange behavior in early versions (maybe compiler errors),
>             but no longer at the moment. There is one configuration that
>             constantly and reproducibly hangs in one module that is
>             tested: The configuration uses JDK 8 b79 (same for b78), 32
>             bit, and G1GC (server or client does not matter). The JVM
>             running the tests hangs and becomes unresponsive (jstack or kill -3
>             have no effect/cannot connect, standard kill does not stop
>             it, only kill -9 actually kills it). It can be reproduced in
>             this Lucene module 100% of the time (it always hangs).
>             I was able to connect with GDB to the JVM and get a stack
>             trace on all threads (see attachment, dump.txt). As you see
>             all threads of G1GC seem to hang in a syscall (os::park(), a
>             condition wait in the pthread library). Unfortunately that’s
>             all I can give you. A Java stacktrace is not possible
>             because the JVM reacts to neither kill -3 nor jstack. With
>             all other garbage collectors it passes the test without
>             hangs in a few seconds, with 32 bit G1GC it can stand still
>             for hours. The 64 bit JVM passes with G1GC, so only the 32
>             bit variant is affected. Client or Server VM makes no
>             difference.
>             To reproduce:
>             - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but
>             this should not matter)
>             - Download Lucene Source code (e.g. the snapshot version we
>             were testing with:
>             https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
>             - change to directory lucene/analysis/uima and run:
>                       ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
>             After a while the test framework prints "stalled" messages
>             (because the child VM actually running the test no longer
>             responds). The PID is also printed. Try to get a stack trace
>             or kill it, no response. Only kill -9 helps. Choosing
>             another garbage collector in the above command line makes
>             the test finish after a few seconds, e.g. -Dargs="-server
>             -XX:+UseConcMarkSweepGC"
>             I posted this bug report directly to the mailing list,
>             because with earlier bug reports, there seems to be a problem
>             with bugs.sun.com <http://bugs.sun.com> - there is no
>             response from any reviewer after several weeks and we were
>             able to help to find and fix javadoc and javac-compiler bugs
>             early. So I hope you can help with this bug, too.
>             Uwe
>             -----
>             Uwe Schindler
>             uschindler at apache.org <mailto:uschindler at apache.org>
>             Apache Lucene PMC Member / Committer
>             Bremen, Germany
>             http://lucene.apache.org/
