RFC 7038914: VM could throw uncaught OOME in ReferenceHandler thread

Thomas Schatzl thomas.schatzl at oracle.com
Tue May 7 13:26:36 UTC 2013


On Tue, 2013-05-07 at 15:12 +0200, Peter Levart wrote:
> On 05/07/2013 09:51 AM, Thomas Schatzl wrote:
> > Hi all,
> >
> > On Tue, 2013-05-07 at 12:31 +1000, David Holmes wrote:
> >> Catching ThreadDeath is futile. If someone is invoking stop() then you
> >> can encounter the ThreadDeath anywhere and it is impossible to write
> >> completely robust code in the face of such an async exception. So please
> >> let's not even go there. stop() is long deprecated and should never be used.
> >>
> >> Backing up I think the try/catch(IE|OOME) around wait() is the most
> >> reasonable solution here. Anyone messing with instrumentation or
> >> overriding etc can break things - so be it - don't do that.
> >> StackOverflowError can also completely break many things - again it is
> >> effectively an async exception and writing async-exception-safe Java
> >> code is impractical if not impossible.
> > I can understand this reasoning.
> >
> > I provided a new patch (this time for review)
> > http://cr.openjdk.java.net/~tschatzl/7038914/webrev.1/ which implements
> > this change as suggested.
> >
> > Regarding regression testing, I marked this bug as "noreg-other" with
> > the explanation that it is too hard to write a proper regression test,
> > and the note that any test would involve using methods that we don't
> > give any guarantees for (overriding package private jdk methods,
> > instrumentation).
> Hi Thomas,
> Does the bug reproducer I sent to the list not work for you? The test 
> can check the return value of refQueue.poll() and decide if it passes or 
> not (null return means the ReferenceHandler thread has died and the bug 
> is here, non-null return means thread still works and there is no bug).

I will check the code again, but unfortunately I think it does not help
a lot.
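
For readers following along, the pass/fail check Peter describes could
be sketched roughly as below. This is only an illustration and not the
reproducer that was sent to the list: the class and method names are
made up, and it uses ReferenceQueue.remove(timeout) in a loop instead of
a single poll() so that the handler thread gets some time to enqueue the
reference.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical helper, not part of the JDK or of the reproducer on the list.
final class ReferenceHandlerAliveCheck {

    // Returns true if a cleared weak reference shows up in its queue within
    // the timeout, i.e. something is still enqueuing pending references.
    static boolean referenceHandlerAlive(long timeoutMillis) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // The referent is only weakly reachable right after construction.
        WeakReference<Object> ref = new WeakReference<>(new Object(), queue);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (System.currentTimeMillis() < deadline) {
                System.gc(); // only a hint, hence the retry loop
                Reference<?> enqueued = queue.remove(100);
                if (enqueued != null) {
                    return enqueued == ref;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Nothing was enqueued in time: the handler thread may have died.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(referenceHandlerAlive(5000)
                ? "reference was enqueued"
                : "reference was NOT enqueued");
    }
}
```

Such a check only tells a live handler thread from a dead one; it does
nothing to provoke the OOME in that thread in the first place.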

The difficulty in reproducing this issue is getting the ReferenceHandler
thread to die, i.e. having the OOME occur in the reference handler
thread itself.

The InterruptedException is such a small allocation that in almost all
OOME cases it either still succeeds or is not the actual cause of the
OOME. So the probability that the Java application threads get the OOME
instead is much higher, especially in the stress tests.
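
To make the shape of the suggested fix concrete, here is a minimal,
self-contained sketch of a loop with a try/catch(IE|OOME) around wait(),
as proposed earlier in the thread. It is not the code from the webrev;
the class, the Step interface and runLoop are hypothetical names, and
the injected Step exists only so that a failure inside the wait can be
simulated without actually exhausting the heap.

```java
// Hypothetical illustration only; not the ReferenceHandler implementation.
final class ResilientHandlerLoop {

    // Stand-in for the blocking step, so a test can inject a failure.
    interface Step {
        void await(Object lock) throws InterruptedException;
    }

    // Runs the loop body the given number of times and returns how many
    // iterations completed without the loop being terminated.
    static int runLoop(Step step, int iterations) {
        Object lock = new Object();
        int survived = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                synchronized (lock) {
                    step.await(lock); // normally: lock.wait(...)
                }
            } catch (InterruptedException | OutOfMemoryError e) {
                // Ignore and retry: neither exception should end the loop.
            }
            survived++;
        }
        return survived;
    }

    public static void main(String[] args) {
        // Normal case: a short timed wait on the lock.
        System.out.println(runLoop(l -> l.wait(1), 3));
        // Simulated failure: every wait throws OutOfMemoryError.
        System.out.println(runLoop(l -> {
            throw new OutOfMemoryError("simulated");
        }, 3));
    }
}
```

In both cases the loop completes all three iterations; without the catch
block the simulated OutOfMemoryError would propagate out of the loop and
end the thread running it.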

There is a message emitted by the VM reading "java.lang.OutOfMemoryError
thrown from the UncaughtExceptionHandler in thread "Reference Handler""
that is sufficient to detect the problem itself (at least if you enable
some flags).

I will look at it again and report back if it can be used in some way.

