Fwd: JVM crash HS machine

Ramki Ramakrishna y.s.ramakrishna at oracle.com
Wed Sep 7 10:00:42 PDT 2011


Kris, thanks for the reminder. As you can tell, my memory is short (and
fading) :-)

Anyway, the crashes below (in your emails and in Yogesh's) all seem to be
with ParNew (the young gen scavenger that typically goes with CMS when you
run on an MP platform), not with ParallelScavenge.
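
For anyone who wants to check which young gen collector a given JVM is actually running, a minimal sketch using the standard java.lang.management API follows. The class name WhichCollector and the classify helper are hypothetical, made up for illustration; the bean names "ParNew" and "PS Scavenge" are what HotSpot typically reports for ParNew and ParallelScavenge respectively, and other VMs or collectors may report different names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class WhichCollector {

    // Returns the names of the collectors registered in the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Maps a collector bean name to the young gen scavenger it denotes.
    // Assumption: HotSpot reports "ParNew" and "PS Scavenge" for these two.
    static String classify(String name) {
        if ("ParNew".equals(name)) {
            return "ParNew (young gen, usually paired with CMS)";
        }
        if ("PS Scavenge".equals(name)) {
            return "ParallelScavenge (young gen)";
        }
        return "other";
    }

    public static void main(String[] args) {
        for (String name : collectorNames()) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```

Running this inside the affected application (or with the same command-line flags) should show whether the crashing process uses ParNew or ParallelScavenge, which is the distinction that matters for the stack traces in this thread.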

In case it resurfaces with more recent JVMs, we should follow up to see
if something can be done ...
If it's with older JVMs, please follow up with the appropriate support org.

thanks!
-- ramki

On 9/7/2011 9:15 AM, Krystal Mok wrote:
> My bad. I hit "reply" instead of "reply all" on that older thread so 
> my follow-ups didn't show up in the list. I'm including the original 
> mail below. Anyway, it wasn't fixed here, but we can't reproduce it
> any more (on both 6u23 and 6u25, 64-bit Server VM), so we're just
> letting it slip through. One possibility is that we're switching more
> and more to CMS, and the problem occurred in ParallelScavenge.
>
> The original mail:
>
>
>
> ---------- Forwarded message ----------
> From: Y. Srinivas Ramakrishna <y.s.ramakrishna at oracle.com>
> Date: Mon, Apr 18, 2011 at 11:31 PM
> Subject: Re: Crash log when do GC...
> To: Krystal Mok <rednaxelafx at gmail.com>
>
>
> i wonder if it's an issue with array copy stubs which leave random
> junk in some locations of the array, or if there's a race that causes
> some locations to transiently have bad data. Seems unlikely, but the
> involvement of object arrays raises some suspicions. I'll see if any
> array copying bugs have surfaced or been fixed recently although none
> comes readily to mind...
>
> PS: if it's production runs, you won't be able to use heap verification,
> but if you have a test load that reproduces the problem, maybe
> heap verification might give us some clues (although given the nature of
> the problem, I am not hopeful). If you have a support contract,
> I'd suggest filing an official ticket and sending in a couple of core
> files, if you have any sitting around. That may be the only way to
> make progress on this kind of issue.
>
> -- ramki
>
>
> On 4/18/2011 8:16 AM, Krystal Mok wrote:
>
>     Hi,
>
>     I wasn't able to make a minimal repro for this problem, because it
>     seems to happen pretty randomly, running fine for 9 to 15 hours before
>     suddenly crashing with a segfault.
>     It's already running JDK6u23, and there don't seem to be a lot of
>     changes to HotSpot that got into JDK6u24, so I doubt there would be
>     any progress from upgrading to that version. Might try JDK6u25b03 and
>     see if there's any luck.
>
>     Attached to this email is another crash log for the same issue. The
>     program had a lot of threads, and crashes with this stack trace:
>
>     Stack: [0x0000000000000000,0x0000000000000000],  sp=0x0000000041f8a810,  free space=1080874k
>     Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
>     V  [libjvm.so+0x3e62c3]<void ParScanClosure::do_oop_work<unsigned int>(unsigned int*, bool, bool)+0x63>
>     V  [libjvm.so+0x60bc83]<objArrayKlass::oop_oop_iterate_nv(oopDesc*, ParScanWithoutBarrierClosure*)+0xf3>
>     V  [libjvm.so+0x6318d4]<ParScanThreadState::trim_queues(int)+0x124>
>     V  [libjvm.so+0x3e61c5]<void ParScanClosure::do_oop_work<oopDesc*>(oopDesc**, bool, bool)+0x105>
>     V  [libjvm.so+0x632260]<ParRootScanWithoutBarrierClosure::do_oop(oopDesc**)+0x10>
>     V  [libjvm.so+0x3702b1]<InterpreterFrameClosure::offset_do(int)+0x31>
>     V  [libjvm.so+0x619776]<InterpreterOopMap::iterate_oop(OffsetClosure*)+0x86>
>     V  [libjvm.so+0x36efd8]<frame::oops_interpreted_do(OopClosure*, RegisterMap const*, bool)+0x188>
>     V  [libjvm.so+0x36fd71]<frame::oops_do_internal(OopClosure*, CodeBlobClosure*, RegisterMap*, bool)+0xb1>
>     V  [libjvm.so+0x728fb3]<JavaThread::oops_do(OopClosure*, CodeBlobClosure*)+0x1d3>
>     V  [libjvm.so+0x72bc9e]<Threads::possibly_parallel_oops_do(OopClosure*, CodeBlobClosure*)+0xbe>
>     V  [libjvm.so+0x69572e]<SharedHeap::process_strong_roots(bool, bool, SharedHeap::ScanningOption, OopClosure*, CodeBlobClosure*, OopsInGenClosure*)+0x8e>
>     V  [libjvm.so+0x39d75d]<GenCollectedHeap::gen_process_strong_roots(int, bool, bool, bool, SharedHeap::ScanningOption, OopsInGenClosure*, bool, OopsInGenClosure*)+0x7d>
>     V  [libjvm.so+0x6325f6]<ParNewGenTask::work(int)+0xd6>
>     V  [libjvm.so+0x78018d]<GangWorker::loop()+0xaa>
>     V  [libjvm.so+0x7800a4]<GangWorker::run()+0x24>
>     V  [libjvm.so+0x623e1f]<java_start(Thread*)+0x13f>
>
>     JavaThread 0x00002aaab7692800 (nid = 8559) was being processed
>     Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
>     j  java.lang.reflect.Array.set(Ljava/lang/Object;ILjava/lang/Object;)V+0
>     J  com.taobao.top.core.DefaultBlackBoxEngine.callHsf(Ljava/lang/String;Ljava/lang/String;Ljava/lang/Long;Lcom/taobao/hsf/app/spring/util/SuperHSFSpringConsumerBeanTop;[Ljava/lang/String;[Ljava/lang/Object;Lcom/taobao/top/core/framework/TopPipeResult;)Ljava/lang/Object;
>     J  com.taobao.top.core.DefaultApiExecutor.execute(Lcom/taobao/top/core/framework/TopPipeInput;Lcom/taobao/top/core/framework/TopPipeResult;)V
>     J  com.taobao.top.core.framework.TopPipeTask.run()V
>     J  java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;
>     J  java.util.concurrent.FutureTask.run()V
>     J  java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V
>     J  java.util.concurrent.ThreadPoolExecutor$Worker.run()V
>     j  java.lang.Thread.run()V+11
>     v  ~StubRoutines::call_stub
>
>     What's weird about it is that this program would repeatedly crash in
>     the same function in ParNew GC, and that the JavaThread it's working on
>     was in an invocation of java.lang.reflect.Array.set(). In this case
>     it's trying to dereference a bad pointer decompressed from a narrowOop,
>     but it's hard to trace just where things first went wrong.
>
>     We'll see if it's affordable to turn on heap verification to track
>     it down.
>
>     Sincerely,
>     Kris Mok
>
>     On Mon, Apr 18, 2011 at 10:58 PM, Y. Srinivas Ramakrishna
>     <y.s.ramakrishna at oracle.com> wrote:
>
>         Hi, i have heard a couple of other reports of this sort recently.
>         But i don't think we have found or fixed any issue recently that
>         might address this. You might want to try a more recent
>         JVM/JDK to confirm if the crash still occurs (which i think
>         it probably will, going by other such reports). Do you have
>         a test case? If so, please file a bug through support or send
>         us your test case off-line. You can also enable heap verification
>         at some considerable GC performance cost and see if that gets us
>         closer to the root cause. (From looking at the stack trace, it
>         appears as though GC finds a bad reference from an object array
>         while copying live objects from the young generation during a
>         scavenge.)
>
>         -- ramki
>
>
>
>         On 4/18/2011 6:48 AM, BlueDavy Lin wrote:
>
>             hi!
>
>             Recently two of our apps have often been crashing during GC.
>             The crash log is attached. Can someone give me some advice?
>             Thanks.
>
>             P.S.: I tried setting -XX:-UseCompressedOops, but it still
>             crashes, and the log is the same.
>
>
>
>
>
>
> On Thu, Sep 8, 2011 at 12:06 AM, Ramki Ramakrishna
> <y.s.ramakrishna at oracle.com> wrote:
>
>     I didn't see any follow-up on the issue reported at:
>
>     http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2011-April/002537.html
>
>     so I do not know if that issue ever got satisfactorily resolved. I
>     don't think there are any open bugs in our database for that issue.
>     If there's a test case, we can take a look.
>
>     thanks.
>     -- ramki
>
>
>     On 9/7/2011 4:36 AM, Krystal Mok wrote:
>>     CC'ing hotspot-gc-dev for the first stack trace
>>
>>     ---------- Forwarded message ----------
>>     From: Krystal Mok <rednaxelafx at gmail.com>
>>     Date: Wed, Sep 7, 2011 at 7:35 PM
>>     Subject: Re: JVM crash HS machine
>>     To: yogesh <ydhaked at amdocs.com>
>>
>>
>>     Hi,
>>
>>     I don't think the two stack traces shown here are of the same
>>     issue. The first one (the one in quotes) seems to be the same as
>>     the one mentioned before:
>>     http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2011-April/002537.html ,
>>     but there is no solution yet (to my knowledge).
>>
>>     The second stack trace is missing some very important stuff. It's
>>     important to know the caller of the operator new, which means a
>>     deeper stack trace log would help; without that it's quite hard
>>     to infer any context out of the stack trace. It'd also be helpful
>>     to know what signal it was.
>>
>>     Regards,
>>     Kris Mok
>>
>>
>>     On Wed, Sep 7, 2011 at 7:06 PM, yogesh <ydhaked at amdocs.com> wrote:
>>
>>         Igor Shprukh <igor.shprukh at ...> <mailto:igor.shprukh at ...> writes:
>>
>>         >
>>         > I have attached the hs log file.
>>         > The JVM continuously crashes every two hours.
>>         > Thank You!
>>         > -----Original Message-----
>>         > From: Dmitry Samersoff [mailto:Dmitry.Samersoff <at> oracle.com]
>>         > Sent: Sunday, April 17, 2011 4:53 PM
>>         > To: Igor Shprukh
>>         > Cc: hotspot-runtime-dev <at> openjdk.java.net
>>         > Subject: Re: JVM crash HS machine
>>         >
>>         > Igor,
>>         >
>>         > Please, send across full hs_err_*.log
>>         >
>>         > -Dmitry
>>         >
>>         > On 2011-04-17 17:23, Igor Shprukh wrote:
>>         > > Hi all, I have the following error after running the JVM for
>>         > > about 5 hrs.
>>         > >
>>         > > This is a Linux AMD 64-bit machine with 16 processors.
>>         > >
>>         > > The crash is at the GC, do you have any ideas on the cause?
>>         > >
>>         > > Thank you!
>>         > >
>>         > > Program terminated with signal 6, Aborted.
>>         > >
>>         > > #0 0x00000035b2430265 in raise () from /lib64/libc.so.6
>>         > >
>>         > > (gdb) bt
>>         > >
>>         > > #0 0x00000035b2430265 in raise () from /lib64/libc.so.6
>>         > >
>>         > > #1 0x00000035b2431d10 in abort () from /lib64/libc.so.6
>>         > >
>>         > > #2 0x00002aed9f0a8fd7 in os::abort(bool) ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #3 0x00002aed9f1fc05d in VMError::report_and_die() ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #4 0x00002aed9f0af655 in JVM_handle_linux_signal ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #5 0x00002aed9f0abbae in signalHandler(int, siginfo*, void*) ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #6 <signal handler called>
>>         > >
>>         > > #7 0x00002aed9ee64703 in void ParScanClosure::do_oop_work<unsigned int>(unsigned int*, bool, bool) ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #8 0x00002aed9f095d43 in objArrayKlass::oop_oop_iterate_nv(oopDesc*, ParScanWithoutBarrierClosure*) ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #9 0x00002aed9f0bc0e4 in ParScanThreadState::trim_queues(int) ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #10 0x00002aed9f0bcbde in ParEvacuateFollowersClosure::do_void() ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #11 0x00002aed9f0bce36 in ParNewGenTask::work(int) ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #12 0x00002aed9f21245d in GangWorker::loop() ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #13 0x00002aed9f212374 in GangWorker::run() ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #14 0x00002aed9f0ae14f in java_start(Thread*) ()
>>         > >
>>         > > from /usr/java/jdk1.6.0_24/jre/lib/amd64/server/libjvm.so
>>         > >
>>         > > #15 0x00000035b2c0673d in start_thread () from /lib64/libpthread.so.0
>>         > >
>>         > > #16 0x00000035b24d3d1d in clone () from /lib64/libc.so.6
>>         > >
>>         > > (gdb)
>>         > >
>>         >
>>         >
>>
>>
>>
>>
>>         I have the same problem with Linux and jdk1.6.0_24.
>>
>>         If anybody has a solution, please let me know.
>>         Below is part of the gdb stack trace:
>>
>>         Thread 1 (Thread 1996):
>>         #0  0xffffe410 in __kernel_vsyscall ()
>>         No symbol table info available.
>>         #1  0x00b0ddf0 in raise () from /lib/libc.so.6
>>         No symbol table info available.
>>         #2  0x00b0f701 in abort () from /lib/libc.so.6
>>         No symbol table info available.
>>         #3  0xf78d823f in os::abort(bool) ()
>>         from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>>         No symbol table info available.
>>         #4  0xf7a1f431 in VMError::report_and_die() ()
>>         from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>>         No symbol table info available.
>>         #5  0xf78df1dc in JVM_handle_linux_signal ()
>>         from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>>         No symbol table info available.
>>         #6  0xf78db124 in signalHandler(int, siginfo*, void*) ()
>>         from /usr/java/jdk1.6.0_24/jre/lib/i386/server/libjvm.so
>>         No symbol table info available.
>>         #7 <signal handler called>
>>         No symbol table info available.
>>         #8  0x00b4ef5f in _int_malloc () from /lib/libc.so.6
>>         No symbol table info available.
>>         #9  0x00b50fb7 in malloc () from /lib/libc.so.6
>>         No symbol table info available.
>>         #10 0x4c242af7 in operator new(unsigned int) () from /usr/lib/libstdc++.so.6
>>
>>         Thanks
>>         /Y
>>
>>
>>
>>
>>
>