(Preliminary) RFC 7038914: VM could throw uncaught OOME in ReferenceHandler thread
thomas.schatzl at oracle.com
Tue Apr 30 14:57:20 UTC 2013
The webrev at http://cr.openjdk.java.net/~tschatzl/7038914/webrev/
presents a first stab at the CR "7038914: VM could throw uncaught OOME
in ReferenceHandler thread".
The problem is that under very heavy memory pressure, the reference
handler throws an exception with the message "Exception:
java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in
thread "Reference Handler"".
The change improves handling of out-of-memory conditions in the
ReferenceHandler thread. Instead of letting the thread crash, which
disables reference processing, it catches this exception and continues.
I'd like to discuss the change, as I'm not really familiar with JDK
coding style or the handling of such situations, and I have some
questions about it:
- first, I could not reliably reproduce the issue using the information
in the CR. Only through code review (and an idea from Bengt Rutisson -
thanks!) did I find a nice way to reproduce an OOME in the reference
handler. This involves implementing a custom
java.lang.ref.ReferenceQueue and overriding the enqueue() method, and
doing some allocation that causes an OOME within that method.
My current theory is that synchronization/locking allocates some very
small objects on the Java heap, so that an OOME can be caused in that
thread. I walked the locking code, but could not find a Java heap
allocation there (ObjectMonitor seems to be a C heap object) - maybe I
overlooked it. Perhaps somebody else knows?
It cannot be the invocation of the Cleaner.clean() methods above the
enqueuing since it has its own try-catch block already.
Anyway, since the reproducer I wrote shows the same symptoms as reported
in the CR, I hope that this test case is sufficient to be regarded as a
reproducer and the change as a fix.
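The failure mode described in this item can be illustrated without touching the boot class path. What follows is only a simulation under stated assumptions: SketchQueue, FailingQueue and UnguardedHandlerDemo are hypothetical stand-ins, not the real java.lang.ref classes, and the OOME is thrown directly instead of being provoked by filling the heap. It shows that a handler loop without a try-catch stops at the first failing enqueue().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for java.lang.ref.ReferenceQueue (not the real class).
class SketchQueue {
    final List<String> items = new ArrayList<>();

    boolean enqueue(String ref) {
        items.add(ref);
        return true;
    }
}

// Stand-in for the custom queue of the reproducer: the overriding
// enqueue() fails before delegating to the original implementation.
class FailingQueue extends SketchQueue {
    @Override
    boolean enqueue(String ref) {
        if (ref.startsWith("bad")) {
            // Simulated; the reproducer in the webrev allocates instead.
            throw new OutOfMemoryError("simulated OOME in enqueue()");
        }
        return super.enqueue(ref);
    }
}

class UnguardedHandlerDemo {
    // Runs an unguarded handler loop in its own thread and returns how
    // many references were enqueued before that thread ended.
    static int run(String... pending) {
        FailingQueue queue = new FailingQueue();
        Thread handler = new Thread(() -> {
            for (String ref : pending) {
                queue.enqueue(ref); // no try-catch: an OOME ends the thread
            }
        }, "Sketch Reference Handler");
        // Keep the demo quiet; by default the stack trace goes to stderr.
        handler.setUncaughtExceptionHandler((t, e) -> { });
        handler.start();
        try {
            handler.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return queue.items.size();
    }

    public static void main(String[] args) {
        System.out.println(run("a", "bad", "b", "c")); // prints 1
    }
}
```

Only "a" is enqueued; "b" and "c" are never processed because the simulated handler thread is gone after the error.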
- the actual change in java/lang/ref/Reference as mentioned involves
putting the entire main enqueuing procedure within a try-catch block.
It only catches OOME to reduce the chance of catching anything that
should not be caught.
The problem is that this fix does not (and cannot) really fix bad
programming in anyone overriding java.lang.ref.ReferenceQueue.enqueue():
if the OOME occurs before the original enqueue() method is executed,
corruption of the queue may still be possible.
On the other hand, since overriding ReferenceQueue.enqueue() requires
putting the custom ReferenceQueue into the boot class path, I assume
that people doing that are aware of possible issues.
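A minimal sketch of the guarded loop and of its limitation, again under assumptions: GuardedQueue and GuardedHandlerDemo are hypothetical names, the queue is a plain ArrayDeque rather than the real ReferenceQueue, and the OOME is simulated. It is not the code from the webrev; it only shows that catching OutOfMemoryError keeps the loop alive while the reference whose enqueue() failed is still lost.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a reference queue; not the real JDK class.
class GuardedQueue {
    final Deque<String> items = new ArrayDeque<>();

    boolean enqueue(String ref) {
        if (ref.startsWith("bad")) {
            // Simulated failure of an overriding enqueue() before the
            // original enqueue logic has run.
            throw new OutOfMemoryError("simulated OOME in enqueue()");
        }
        items.addLast(ref);
        return true;
    }
}

class GuardedHandlerDemo {
    // Returns {enqueued, skipped} for the given pending references.
    static int[] process(String... pending) {
        GuardedQueue queue = new GuardedQueue();
        int skipped = 0;
        for (String ref : pending) {
            try {
                queue.enqueue(ref);
            } catch (OutOfMemoryError e) {
                // Only OOME is caught, so other errors still propagate.
                // The loop keeps running, but this reference was never
                // enqueued: the catch cannot undo what enqueue() skipped.
                skipped++;
            }
        }
        return new int[] { queue.items.size(), skipped };
    }

    public static void main(String[] args) {
        int[] r = process("a", "bad", "b", "c");
        // prints "3 enqueued, 1 skipped"
        System.out.println(r[0] + " enqueued, " + r[1] + " skipped");
    }
}
```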
- handling the OOME: in the catch block I put the lines
    // avoid crashing the reference handler thread,
    // but provide for some diagnosability
    assert false : e.toString();
to provide some diagnosability in the case of an exception (when
running with assertions). I copied that from other code that tries to
catch similar problems in the clean() method of the Cleaners. There are
other variants of managing this in the JDK, some involving calling
System.exit(). I thought that was too drastic, so I didn't do that, but
what is the appropriate way to handle this situation?
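To make the trade-off of the assert-based variant concrete, here is a small sketch. AssertDiagnosticsDemo and its methods are hypothetical names chosen for illustration. It relies only on standard Java semantics: an assert statement is evaluated only when assertions are enabled (for example with -ea), so the catch block is silent otherwise.

```java
class AssertDiagnosticsDemo {
    // True when this class runs with assertions enabled (-ea).
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // the assignment only happens under -ea
        return enabled;
    }

    // Mimics the catch block from the change. Returns the message of the
    // AssertionError raised under -ea, or null when assertions are
    // disabled and the OOME is swallowed silently.
    static String handle(OutOfMemoryError e) {
        try {
            // avoid crashing the reference handler thread,
            // but provide for some diagnosability
            assert false : e.toString();
            return null;
        } catch (AssertionError ae) {
            return ae.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        System.out.println("diagnostic: "
                + handle(new OutOfMemoryError("simulated")));
    }
}
```

Without -ea nothing is reported at all. With -ea the AssertionError carries the OOME's toString(), but unless something catches it (as this sketch does) it would itself propagate out of the catch block.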
- if the use of locks or the synchronization keyword is indeed the
problem, I think it is possible to use nonblocking synchronization that
is known to not allocate any memory for managing the reference queues
instead. However, I think that to guard against misbehaving ReferenceQueue
implementations you'd still want to have a try-catch block here.
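One possible shape of such nonblocking synchronization is a compare-and-set loop on the list head, with the link stored in the enqueued object itself so that pushing needs no wrapper node. The sketch below is an assumption-laden illustration, not a proposal for the actual ReferenceQueue: SketchRef, LockFreeRefStack and LockFreeRefStackDemo are hypothetical names, the structure is a LIFO stack, and it assumes each object is pushed at most once (which sidesteps the ABA problem).

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical node type that carries its own link, so pushing it does
// not require allocating a separate wrapper object.
class SketchRef {
    final String name;
    SketchRef next; // written before the CAS publishes the node

    SketchRef(String name) { this.name = name; }
}

class LockFreeRefStack {
    private final AtomicReference<SketchRef> head = new AtomicReference<>();

    // Assumes r is pushed at most once during its lifetime.
    void push(SketchRef r) {
        SketchRef h;
        do {
            h = head.get();
            r.next = h;
        } while (!head.compareAndSet(h, r));
    }

    SketchRef pop() {
        SketchRef h;
        do {
            h = head.get();
            if (h == null) {
                return null;
            }
        } while (!head.compareAndSet(h, h.next));
        h.next = null;
        return h;
    }
}

class LockFreeRefStackDemo {
    // Pushes from several threads, then counts what can be popped.
    static int pushFromThreads(int threads, int perThread) {
        LockFreeRefStack stack = new LockFreeRefStack();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    stack.push(new SketchRef(id + ":" + i));
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        int count = 0;
        while (stack.pop() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(pushFromThreads(4, 1000)); // prints 4000
    }
}
```

The demo allocates SketchRef objects only to have something to push; push() and pop() themselves contain no explicit allocation.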
- is the location of the test correct? I.e. in the jdk
test/java/lang/ref directory? Or is the correct place for that the
hotspot test directories?
Since this is (seems to be) a JDK only change, and this is my first time
changing the JDK, I hope core-libs-dev is the right mailing list.
Otherwise please direct me to the appropriate one.