bad narrowOop in objArrayKlass::oop_oop_iterate_nv
rednaxelafx at gmail.com
Thu May 3 00:24:17 PDT 2012
A couple of our production servers have hit a segfault during minor GC in
a ParNew/CMS configuration, with compressed oops on. The symptom looks
very much like some crashes we hit last year, and the same problem may
have already shown up on this list and on the hotspot-gc-use list. But I
couldn't find a bug ID that matches this symptom.
Does anybody know if this is a known bug, if so, is it fixed yet?
Right now I've upgraded one of the crashing servers to JDK6u32/HS20, hoping
that the problem doesn't reproduce in this version.
JRE version: 6.0_23-b05
Java VM: Java HotSpot(TM) 64-Bit Server VM (19.0-b09 mixed mode linux-amd64)
SIGSEGV at 0x0000000000000030 in a GCTaskThread.
Relevant VM args:
-Xms4g -Xmx4g -XX:PermSize=96m -XX:MaxPermSize=256m -Xmn2000m
-XX:MaxTenuringThreshold=5 -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
Relevant stack trace:
V [libjvm.so+0x3e62c3] <void ParScanClosure::do_oop_work<unsigned int>(unsigned int*, bool, bool)+0x63>
V [libjvm.so+0x60bc83] <objArrayKlass::oop_oop_iterate_nv(oopDesc*,
V [libjvm.so+0x6318d4] <ParScanThreadState::trim_queues(int)+0x124>
V [libjvm.so+0x6323ce] <ParEvacuateFollowersClosure::do_void()+0x1e>
V [libjvm.so+0x632626] <ParNewGenTask::work(int)+0x106>
V [libjvm.so+0x78018d] <GangWorker::loop()+0xad>
V [libjvm.so+0x7800a4] <GangWorker::run()+0x24>
V [libjvm.so+0x623e1f] <java_start(Thread*)+0x13f>
Debugging the core dump, I was able to track down the object in question:
(gdb) x/3g 0x760cba7a8
# String[] of length 1
0x760cba7a8: 0x0000000000000009 0x00000001fe01c4c7
Which is a String array of length 1. The sole element in this array is a
narrowOop 0x00000005, which is definitely a bad pointer.
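For reference, here is a minimal Python sketch of how the two words in the
dump split into objArray header fields. It assumes the x86-64
compressed-oops layout (8-byte mark word, 4-byte narrow klass at offset 8,
array length stored in the 4-byte klass gap at offset 12, elements starting
at offset 16) and little-endian byte order. parse_objarray_header is a
hypothetical helper, not a HotSpot or gdb API. The third word, which would
hold the element itself, isn't shown in the dump above, so it isn't decoded.

```python
# Minimal sketch: split the first two 64-bit words printed by
# `x/3g 0x760cba7a8` into objArray header fields.
# ASSUMPTIONS: x86-64, little-endian, compressed oops on, so the narrow
# klass is the low 32 bits of word 1 (bytes 8..11) and the array length
# sits in the klass gap, i.e. the high 32 bits (bytes 12..15).
# parse_objarray_header is a hypothetical helper name.

MASK32 = 0xFFFFFFFF


def parse_objarray_header(word0: int, word1: int) -> dict:
    """Decode the mark word, narrow klass and length of an objArray header."""
    return {
        "mark": word0,                      # mark word at offset 0
        "narrow_klass": word1 & MASK32,     # low half of word 1 (offset 8)
        "length": (word1 >> 32) & MASK32,   # high half of word 1 (offset 12)
    }


if __name__ == "__main__":
    header = parse_objarray_header(0x0000000000000009, 0x00000001FE01C4C7)
    for name, value in header.items():
        print(f"{name:>12}: {value:#x}")
```

With these inputs the length comes out as 1, matching the String array of
length 1 described above.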
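The segfault site was dereferencing the decoded value of this pointer. With
zero-based compressed oops, 0x05 << 3 == 0x28, and the faulting address 0x30
is 0x28 + 8, which matches a load of the klass field at offset 8 of the
object header.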
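A minimal sketch of that decoding step, assuming zero-based compressed oops
(narrow oop base 0, shift 3, which a 4 GB heap mapped below 32 GB would
normally get) and assuming the faulting access is the load of the klass
field at offset 8 of the decoded object. Under those assumptions the narrow
value 0x05 decodes to 0x28, and 0x28 + 8 gives the faulting address 0x30.
The constant and function names are illustrative, not HotSpot identifiers.

```python
# Minimal sketch of compressed-oop decoding.
# ASSUMPTIONS: zero-based mode (base 0), shift 3 (8-byte object alignment),
# and the klass field at offset 8 (compressed-oops header layout).
# All names here are illustrative, not HotSpot identifiers.

NARROW_OOP_BASE = 0   # assumed zero-based compressed oops
NARROW_OOP_SHIFT = 3  # log2 of 8-byte object alignment
KLASS_OFFSET = 8      # assumed offset of the narrow klass in the header


def decode_narrow_oop(narrow: int,
                      base: int = NARROW_OOP_BASE,
                      shift: int = NARROW_OOP_SHIFT) -> int:
    """Return the full 64-bit address encoded by a 32-bit narrow oop."""
    return base + (narrow << shift)


if __name__ == "__main__":
    obj = decode_narrow_oop(0x05)
    print(hex(obj))                 # 0x28
    print(hex(obj + KLASS_OFFSET))  # 0x30, same as the SIGSEGV address
```

Note that with a non-zero heap base the faulting address would not be this
small, so the 0x30 fault itself points to zero-based mode.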
I don't have any idea yet how this bad pointer came into being; it's hard to
tell from just a core dump. This is a production server, so I can't turn on
heap verification on it either. Worse, no repros in our
Ramki mentioned the following in a reply to :
On Mon, Apr 18, 2011 at 11:31 PM, Y. Srinivas Ramakrishna <
y.s.ramakrishna at oracle.com> wrote:
> i wonder if it's an issue with array copy stubs which leave random
> junk in some locations of the array, or if there's a race that causes
> some locations to transiently have bad data. Seems unlikely, but the
> involvement of object arrays raises some suspicions. I'll see if any
> array copying bugs have surfaced or been fixed recently although none
> comes readily to mind...
> PS: if it's production runs, you won't be able to use heap verification,
> but if you have a test load that reproduces the problem, may be
> heap verification might give us some clues (although given the nature of
> the problem, I am not hopeful). If you have a support contract,
> I'd suggest filing an official ticket and sending in a couple of core
> files, if you have any sitting around. That may be the only way to
> make progress on this kind of issue.
> -- ramki
I can investigate further with this core dump at hand. Any
instructions/suggestions would be appreciated :-)
I can provide other relevant information as needed.
(I'm sorry I can't attach the full crash log in this mail because there's
confidential information in it.)
More information about the hotspot-gc-dev