SMP JNI issue, UseMembar workaround resolves it
David.Holmes at oracle.com
Sat Jun 11 05:07:35 PDT 2011
David Holmes said the following on 06/11/11 17:04:
> I'll try and take a deeper look at this but note that if a safepoint is
> pending the thread is supposed to "crash" in
> write_memory_serialize_page. The SEGV so generated should be handled by
> the VM and take the thread to the safepoint.
The SEGV handler doesn't take the thread to the safepoint; it just delays
the thread until the serialization page is unprotected. The subsequent
state transition logic will take the thread to the safepoint if needed.
The thing is to see exactly what is reported when the real crash
occurs. If we get a simple OS-level abort message then the VM signal
handler did not get invoked which may indicate that native code has
changed the installed signal handlers.
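For readers following along, the transition described above can be modelled roughly as follows. This is a toy sketch in Java, not HotSpot code: the class, the field names, and the lock standing in for the serialization page are all hypothetical. It only illustrates the order of events: the thread publishes its new state (through a fence when UseMembar is on, otherwise through a write to the serialization page, which may stall it), and only afterwards does the transition logic check for a pending safepoint.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a native-to-Java state transition. Not HotSpot code. */
public class TransitionSketch {
    enum State { IN_NATIVE, IN_NATIVE_TRANS, IN_JAVA }

    // Hypothetical flag set by the VM thread while a safepoint is pending.
    static volatile boolean safepointPending = false;
    // Counts how many transitions were diverted to the safepoint.
    static final AtomicInteger safepointVisits = new AtomicInteger();
    // Stand-in for the serialization page: in this model the VM thread
    // would hold this lock for as long as the page is protected.
    static final Object serializePage = new Object();

    volatile State state = State.IN_NATIVE;
    String publishedVia = "none";

    void transitionFromNative(boolean useMembar) {
        // Step 1: announce that the thread is leaving native code.
        state = State.IN_NATIVE_TRANS;

        // Step 2: make that state visible to the VM thread.
        if (useMembar) {
            // Real VM: a memory fence. Here the volatile write above
            // already gives the ordering, so we only record the path.
            publishedVia = "membar";
        } else {
            // Real VM: a store to the serialization page. If the page is
            // protected the store faults and the thread is delayed until
            // the page is unprotected; waiting for the lock models that.
            synchronized (serializePage) {
                publishedVia = "serialize-page";
            }
        }

        // Step 3: the transition logic, not the fault handler, decides
        // whether the thread must stop for the safepoint.
        if (safepointPending) {
            safepointVisits.incrementAndGet(); // a real thread would block here
        }

        state = State.IN_JAVA;
    }

    public static void main(String[] args) {
        TransitionSketch t = new TransitionSketch();
        safepointPending = true;
        t.transitionFromNative(false);
        System.out.println(t.publishedVia + " -> " + t.state
                + ", safepoint visits: " + safepointVisits.get());
        safepointPending = false;
    }
}
```

In this model both paths end in the same safepoint check; the only difference is how the state change is published, which is why switching to UseMembar sidesteps anything that goes wrong around the serialization page.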
> It seems the signal is not
> being handled correctly. UseMembar will work around this by not using the
> serialization page.
> If you observed this SEGV under gdb then it may be a red herring as gdb
> is stopping the VM from handling the SEGV when it is actually an
> expected signal.
> When the real crash occurs what exactly gets reported?
> David Holmes
> Scott Valentine said the following on 06/11/11 15:58:
>> We ran into an issue where our application would consistently crash with a
>> segmentation violation after roughly 15 minutes to 90 minutes of runtime.
>> It's not exactly a bug, but I thought it would be helpful to post the
>> information here for other folks, and to hopefully support the great work
>> of OpenJDK developers down the road.
>> The quick details are that we consistently die without much error detail
>> (just a simple segmentation violation printout) when our code enters JNI,
>> does some stuff, and then calls back into the VM. The JNI_ENTRY fails
>> calling transition_from_native.
>> The client application is running on an Asus Aspire-One netbook (Atom
>> N270, dual core @800MHz) with OpenJDK-1.6.0-20-1.9.7. A gdb stack trace
>> and jstack dump are attached for details on what is happening. More details
>> on the system structure are included below for those interested, but
>> basically it is a moderately threaded, JNI-intensive application running
>> under the Equinox OSGi runtime.
>> It was a little tough to debug, as the clients are remote and I have to go
>> through multiple ssh back-doors. We initially suspected our JNI
>> middleware, but after getting the necessary debugging symbols, tools, and
>> builds in place, we found that it was always crashing on the
>> write_memory_serialize_page call when attempting JNI_ENTRY after spending
>> some time in the native code. It never even got to the point of referencing
>> values like the VM env, jobject, etc. Anyhow, the source for the
>> transition_from_native call led us to try the -XX:+UseMembar option, which
>> seems to have resolved the issue.
>> Anyhow, I hope the trace info is helpful, and please let me know if I can
>> provide more info. I can't spare a ton of cycles, but I would be happy to
>> contribute as time permits.
>> Here are the application details:
>> As mentioned previously, the application is running in the Equinox OSGi
>> framework, and it relies heavily on two JNI libraries: the RXTX library
>> (2.1-7r2), and a middleware called opensplice DDS (5.4.1). Opensplice is a
>> shared memory model runtime that runs as three separate processes, plus
>> a JNI interface into the framework. The application has two serial ports
>> (two RXTX threads), and we have a thread for each (two more threads) that
>> does blocking reads on those ports. These threads put data into a
>> BlockingQueue, which has another thread that takes data from the queue and
>> processes it (two more threads). These threads process the data, make JNI
>> calls into the DDS middleware (this is where the failures have, at least
>> so far, always occurred), and put some information into another
>> BlockingQueue. There are two other application threads (total of eight
>> now). The first periodically writes to one of the serial ports. The other
>> thread handles the second BlockingQueue and also makes JNI calls into the
>> DDS middleware. Overall, there are three threads calling into that
>> middleware.
>> I think there are something like 20 threads total, but three are the JVM
>> threads, and 7 or so are related to Equinox and our launcher that don't
>> really do anything unless the system is starting or stopping or doing
>> something in the OSGi world.
>> Thanks, and again, I hope this info can be helpful to others.
>> Scott Valentine
>> Concentris Systems LLC
>> Manoa Innovation Center, Suite #238
>> 2800 Woodlawn Drive
>> Honolulu, HI 96822
>> (808) 988-6100
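The thread layout Scott describes (a blocking serial reader feeding a BlockingQueue, a processing thread that calls into the middleware and feeds a second BlockingQueue, and a final thread draining that queue) can be sketched roughly as below. This is a simplified illustration, not the application's code: it models one serial port instead of two, `publishToMiddleware` is a hypothetical stand-in for the JNI call into the DDS middleware, and the `POISON` sentinel used for shutdown is an assumption made so the example terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Simplified reader -> queue -> processor -> queue -> handler pipeline. */
public class PipelineSketch {
    // Sentinel that tells the next stage to stop (illustration only).
    static final String POISON = "<end>";

    // Hypothetical stand-in for the JNI call into the DDS middleware.
    static String publishToMiddleware(String record) {
        return "published:" + record;
    }

    static List<String> run(List<String> serialInput) {
        BlockingQueue<String> raw = new LinkedBlockingQueue<>();
        BlockingQueue<String> processed = new LinkedBlockingQueue<>();
        List<String> results = new ArrayList<>();

        // Stage 1: stands in for the thread doing blocking reads on a port.
        Thread reader = new Thread(() -> {
            try {
                for (String s : serialInput) raw.put(s);
                raw.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "serial-reader");

        // Stage 2: takes from the first queue, calls the "middleware",
        // and puts the outcome on the second queue.
        Thread processor = new Thread(() -> {
            try {
                for (String s = raw.take(); !s.equals(POISON); s = raw.take()) {
                    processed.put(publishToMiddleware(s));
                }
                processed.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "processor");

        // Stage 3: handles the second queue.
        Thread handler = new Thread(() -> {
            try {
                for (String s = processed.take(); !s.equals(POISON); s = processed.take()) {
                    results.add(s);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "second-queue-handler");

        reader.start();
        processor.start();
        handler.start();
        try {
            // join() makes the handler's writes to results visible here.
            reader.join();
            processor.join();
            handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("frame-1");
        input.add("frame-2");
        System.out.println(run(input));
    }
}
```

In the real application every call in the second and third stages crosses from Java into native code and back, so each one passes through the native-to-Java transition where the crash was observed.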
More information about the hotspot-dev