JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Uwe Schindler uschindler at apache.org
Wed Mar 6 12:16:56 PST 2013


Hi again,

If you don't get it running, I can do the following:

I could set it up locally in /tmp, then build and test the whole Lucene library including the tests. I could then TAR it up (it might be large, approx. 100 MB) and send it to you via Dropbox or any other HTTP download. The JVM command line uses absolute paths for all JARs and other settings, but if you unpack the whole thing to /tmp, you can reuse the cmd line unchanged.

Just tell me whether you were able to set it up; otherwise I can quickly TAR up the whole compiled directory for you and give you the command line from the debugging output. If you unpack to another directory, you may need to edit the absolute paths in the command line (which are generated by ANT).
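
If you do unpack somewhere other than /tmp, something along these lines might save you from editing the paths by hand. This is only a rough sketch: the directory names, the sample command line, and the rewrite_cmdline helper are made up for illustration and are not taken from the actual build output. It also assumes the paths contain no characters that are special to sed (such as '|', '&' or '.').

```shell
#!/bin/sh
# Hypothetical helper: reads a command line on stdin and replaces every
# occurrence of the old root directory ($1) with the new one ($2).
# '|' is used as the sed delimiter because the paths contain '/'.
rewrite_cmdline() {
    sed "s|$1|$2|g"
}

# Illustrative use with invented paths (not the real test command line):
printf '%s\n' 'java -cp /tmp/lucene/build/core.jar Main' \
    | rewrite_cmdline /tmp/lucene /home/john/lucene
# prints: java -cp /home/john/lucene/build/core.jar Main
```

In practice you would feed it the saved command line from a file and redirect the output to a new file before running it.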

Uwe

-----
Uwe Schindler
uschindler at apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/


> -----Original Message-----
> From: John Cuthbertson [mailto:john.cuthbertson at oracle.com]
> Sent: Wednesday, March 06, 2013 8:21 PM
> To: Uwe Schindler
> Cc: 'Bengt Rutisson'; hotspot-gc-dev at openjdk.java.net;
> dev at lucene.apache.org
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
> 
> Hi Uwe,
> 
> Let me try with your detailed instructions below before you go to all of that
> trouble. I will let you know how I get on.
> 
> Thanks,
> 
> JohnC
> 
> On 3/6/2013 11:15 AM, Uwe Schindler wrote:
> > Hi,
> >
> > That's unfortunately not so easy, because of project dependencies. To run
> > the test you have to compile Lucene Core, then the specific module plus the
> > test framework (which is special to Lucene), and download some JARs from
> > Maven Central (JAR hell, as usual).
> > If you give me some time, I will collect all needed JAR files from my local
> > checkout and provide you with the correct cmd line + a ZIP file, maybe with a
> > shell script to start it up. It should be doable, but needs some work to
> > collect all dependencies for the classpath.
> >
> > If you want to do it quicker (should be quite fast to do):
> > - Download the ANT 1.8.2 binary distribution (unfortunately ANT 1.8.4 has a
> >   bug that keeps it from working out of the box with Java 8):
> >   http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz
> >   (I just wonder: isn't ANT needed to build the JDK classlib itself? I
> >   remember that the FreeBSD OpenJDK build downloads ANT and does a large
> >   part of the compilation using ANT...)
> > - Put the ANT bin/ dir into your PATH.
> > - Download the Apache Lucene source code from Jenkins:
> >   https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
> > - Go to the extracted Lucene source dir and call "ant ivy-bootstrap" (this
> >   will download Apache IVY, so all dependencies can be downloaded from
> >   Maven Central).
> > - Change to the module that fails: # cd analysis/uima
> > - Execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
> > - In a parallel console you might be able to attach to the process. The
> >   build in the main console runs inside ANT, and the test framework spawns
> >   separate worker instances of the JVM to execute the tests. This makes it
> >   hard to reproduce standalone (the command line passed to the child JVM
> >   is veeeeery long).
> >
> > I will work on putting together a precompiled ZIP file with all needed JARs
> > + the command line. Just tell me if you got it working with the above
> > howto; then I don’t need to do this.
> > Uwe
> >
> > -----
> > Uwe Schindler
> > uschindler at apache.org
> > Apache Lucene PMC Member / Committer
> > Bremen, Germany
> > http://lucene.apache.org/
> >
> >
> >> -----Original Message-----
> >> From: John Cuthbertson [mailto:john.cuthbertson at oracle.com]
> >> Sent: Wednesday, March 06, 2013 7:51 PM
> >> To: Uwe Schindler
> >> Cc: 'Bengt Rutisson'; hotspot-gc-dev at openjdk.java.net;
> >> dev at lucene.apache.org
> >> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
> >> bit)
> >>
> >> Hi Uwe,
> >>
> >> I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from
> >> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/
> >>
> >> I don't have ant on my workstation so do you have a java command line
> >> to run the test(s) that generate the error?
> >>
> >> Thanks,
> >>
> >> JohnC
> >>
> >> On 3/6/2013 3:16 AM, Uwe Schindler wrote:
> >>> Hi,
> >>>
> >>>> I think this is a VM bug and the thread dumps that Uwe produced are
> >>>> enough to start tracking down the root cause.
> >>> I hope it is enough! If I can help with more details, tell me what I
> >>> should do to track this down. Unfortunately, we have no isolated test
> >>> case (like a small Java class that triggers this bug) - you have to run
> >>> the test cases of this Lucene module. It only happens there, not in any
> >>> other Lucene test suite. It may be caused by a lot of GC activity in this
> >>> "UIMA" module or by a specific test.
> >>>> On 3/6/13 8:52 AM, David Holmes wrote:
> >>>>> If the VM is completely unresponsive then it suggests we are at a
> >>>>> safepoint.
> >>>> Yes, we are hanging during a stop-the-world GC, so we are at a
> >>>> safepoint.
> >>>>
> >>>>> The GC threads are not "hung" in os::park(), they are parked -
> >>>>> waiting to be notified of something.
> >>>> It looks like the reference processing thread is stuck in a loop
> >>>> where it does wait(). So, the VM is hanging even if that stack
> >>>> trace also ends up in os::park().
> >>>>
> >>>>> The thing is to find out why they are not being woken up.
> >>>> Actually, in this case we should probably not even be calling wait...
> >>>>
> >>>>> Can the gdb log be posted somewhere? I don't know if the
> >>>>> attachment made it to the original posting on hotspot-gc but it's
> >>>>> no longer available on hotspot-dev.
> >>>> I received the attachment with the original email. I've attached it
> >>>> to the bug report that I created: 8009536. You can find it there if
> >>>> you want to. But I think we have a fairly good idea of what change
> >>>> caused the hang.
> >>> If it helps: Unfortunately, we had some problems with recent JDK builds,
> >>> because the javac and javadoc tools were not working correctly and failed
> >>> to build our source code. This was fixed in b78. Until then, we used build
> >>> b65 (which was the last one working), and the G1GC hangs did not appear
> >>> on that version. So the hang must have been introduced by a change
> >>> between b65 and b78.
> >>> Uwe
> >>>
> >>>> Bengt
> >>>>
> >>>>> Thanks,
> >>>>> David
> >>>>>
> >>>>> On 6/03/2013 4:07 PM, Krystal Mok wrote:
> >>>>>> Hi Uwe,
> >>>>>>
> >>>>>> If you can attach gdb to it, then jstack -m and jstack -F should
> >>>>>> also work; that'll get you the Java stack trace.
> >>>>>> (But it probably doesn't matter in this case, because the hang is
> >>>>>> probably a bug in the VM.)
> >>>>>>
> >>>>>> - Kris
> >>>>>>
> >>>>>> On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler <uschindler at apache.org> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> For a few months we have been extensively testing various preview
> >>>>>>> builds of JDK 8 for compatibility with Apache Lucene and Solr,
> >>>>>>> so we can find any bugs early and prevent the problems we had
> >>>>>>> with the release of Java 7 two years ago. Currently we have a
> >>>>>>> Linux (Ubuntu 64bit) Jenkins machine that has various JDKs (JDK
> >>>>>>> 6, JDK 7, JDK 8 snapshot, IBM J9, older JRockit) installed,
> >>>>>>> choosing a different one with different hotspot and garbage
> >>>>>>> collector settings on every run of the test suite (which takes
> >>>>>>> approx. 30-45 minutes).
> >>>>>>> JDK 8 b79 works very well on Linux so far; we found some strange
> >>>>>>> behavior in early versions (maybe compiler errors), but no
> >>>>>>> longer at the moment. There is one configuration that constantly
> >>>>>>> and reproducibly hangs in one module that is tested: The
> >>>>>>> configuration uses JDK 8 b79 (same for b78), 32 bit, and G1GC
> >>>>>>> (server or client does not matter). The JVM running the tests
> >>>>>>> hangs and becomes unresponsive (jstack or kill -3 have no
> >>>>>>> effect/cannot connect, standard kill does not stop it, only
> >>>>>>> kill -9 actually kills it). It can be reproduced in this Lucene
> >>>>>>> module 100% of the time (it always hangs).
> >>>>>>>
> >>>>>>> I was able to connect with GDB to the JVM and get a stack trace
> >>>>>>> on all threads (see attachment, dump.txt). As you see, all
> >>>>>>> threads of G1GC seem to hang in a syscall (os::park(), a
> >>>>>>> condition variable wait in the pthread library). Unfortunately
> >>>>>>> that’s all I can give you. A Java stack trace is not possible
> >>>>>>> because the JVM responds to neither kill -3 nor jstack. With all
> >>>>>>> other garbage collectors it passes the test without hangs in a
> >>>>>>> few seconds; with 32 bit G1GC it can stand still for hours. The
> >>>>>>> 64 bit JVM passes with G1GC, so only the 32 bit variant is
> >>>>>>> affected. Client or Server VM makes no difference.
> >>>>>>>
> >>>>>>> To reproduce:
> >>>>>>> - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but
> >>>>>>>   this should not matter)
> >>>>>>> - Download the Lucene source code (e.g. the snapshot version we
> >>>>>>>   were testing with:
> >>>>>>>   https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
> >>>>>>> - Change to directory lucene/analysis/uima and run:
> >>>>>>>       ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
> >>>>>>> After a while the test framework prints "stalled" messages
> >>>>>>> (because the child VM actually running the test no longer
> >>>>>>> responds). The PID is also printed. Try to get a stack trace or
> >>>>>>> kill it: no response. Only kill -9 helps. Choosing another
> >>>>>>> garbage collector in the above command line makes the test
> >>>>>>> finish after a few seconds, e.g.
> >>>>>>> -Dargs="-server -XX:+UseConcMarkSweepGC"
> >>>>>>>
> >>>>>>> I posted this bug report directly to the mailing list, because
> >>>>>>> with earlier bug reports, there seems to be a problem with
> >>>>>>> bugs.sun.com - there is no response from any reviewer after
> >>>>>>> several weeks, and we were able to help find and fix javadoc
> >>>>>>> and javac compiler bugs early. So I hope you can help with this
> >>>>>>> bug, too.
> >>>>>>>
> >>>>>>> Uwe
> >>>>>>>
> >>>>>>> -----
> >>>>>>> Uwe Schindler
> >>>>>>> uschindler at apache.org
> >>>>>>> Apache Lucene PMC Member / Committer
> >>>>>>> Bremen, Germany
> >>>>>>> http://lucene.apache.org/
> >>>>>>>
> >>>>>>>
> 
> 