RFR: 8055232 (ref) Exceptions while processing Reference pending list

David Holmes david.holmes at oracle.com
Fri Sep 19 07:45:22 UTC 2014

On 19/09/2014 5:35 PM, Peter Levart wrote:
> On 09/19/2014 09:06 AM, David Holmes wrote:
>> Hi Peter,
>> Two comments:
>> 1. Doing the Object.wait() violates the thou-shalt-not-allocate rule,
>> due to the potential for creating the InterruptedException. But there
>> is no way to avoid that.
> It depends on when the InterruptedException is allocated. While
> Object.wait()-ing, the lock is not held. If, while waiting, the
> interruption is detected, it could be that InterruptedException is
> allocated before the lock is attempted to be re-gained. This can easily
> be verified - stay tuned...

Allocation happens after the monitor is re-acquired.

>> 2. I don't see how the instanceof can still result in OOME if we have
>> pre-loaded and initialized the relevant classes.
> I don't see either. So you're more for a variant that we move instanceof
> check outside the synchronized block to where it was before and wait for
> stress tests to show if we're right? This might be a cleaner way to
> final solution...

I think I see what you are saying now ...

>> So I'm not sure what it is that needs fixing now ?? Note that if the
>> OOME is thrown while holding the lock it means we haven't deadlocked
>> with the GC.
> This might be an indication that there's not any allocation taking place
> in instanceof check any more (wait is different, since it's releasing
> lock temporarily). If it was taking place, the stress tests that
> detected OOME before, would detect deadlock now, right?

There's only a potential for deadlock, it isn't inevitable. You'd need 
to see when the GC needs to acquire this lock.

> I understand that this is more of a clean-up attempt than fixing any
> real issue.

I'm inclined to take the "if it ain't broke ..." position on this. But I 
do see your point.


> Regards, Peter
>> David
>> On 19/09/2014 4:43 PM, Peter Levart wrote:
>>> Hi,
>>> This story has a long tail. It started with:
>>>      https://bugs.openjdk.java.net/browse/JDK-7038914
>>> Some stress tests triggered OOME in ReferenceHandler thread which would
>>> die. The first attempt at fixing this was the following discussion:
>>> https://www.mail-archive.com/core-libs-dev%40openjdk.java.net/msg16250.html
>>> Which resulted in patch:
>>>      http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/0b8dab7fec54
>>> This assumed that ReferenceHandler thread doing Object.wait() could be
>>> interrupted (by stress test - normal application don't do that) and
>>> failed to allocate InterruptedException object. A jtreg test was
>>> designed which triggered that situation and a fix would catch OOME and
>>> ignore it.
>>> But the stress tests (the same or some other, I don't know) apparently
>>> were not entirely happy with this fix. The following discussion
>>> describes this:
>>> https://www.mail-archive.com/core-libs-dev%40openjdk.java.net/msg23596.html
>>> The other "unprotected" point at which OOME could be thrown and was
>>> later confirmed by debugging is (r instanceof Cleaner) test. The
>>> assumption was that it could trigger Cleaner class loading at 1st
>>> execution which would cause OOME to be thrown. The fix that finally
>>> silenced stress tests was the following:
>>>      http://hg.openjdk.java.net/jdk9/dev/jdk/rev/d04102f69d46
>>> This part of code (the j.l.r.Reference class and its members) has
>>> undergone further changes afterwards for fixing another bug:
>>>      https://bugs.openjdk.java.net/browse/JDK-6857566
>>> With following patch:
>>>      http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/9934d34ed3c0
>>> But this did not change the code paths - just factored-out the content
>>> of the loop into a separate method that could be used from outside.
>>> All well until kim.barrett at oracle.com noticed that the 2nd fix
>>> introduced a potentially illegal situation. There is a
>>> j.l.r.Reference.Lock inner class and a singleton object assigned to
>>> static field in j.l.Reference class with the following notice:
>>>      /* Object used to synchronize with the garbage collector. The
>>> collector
>>>       * must acquire this lock at the beginning of each collection
>>> cycle.  It is
>>>       * therefore critical that any code holding this lock complete as
>>> quickly
>>>       * as possible, allocate no new objects, and avoid calling user
>>> code.
>>>       */
>>>      static private class Lock { }
>>>      private static Lock lock = new Lock();
>>> The conflicting part is "allocate no new objects". Catching OOME inside
>>> a synchronized block holding this lock implies that new objects could be
>>> allocated. I have a feeling that the 2nd fix prevented that by
>>> pre-loading the Cleaner class at Reference class initialization time.
>>> But because it was hard to reproduce the situation where OOME was thrown
>>> from (r instanceof Cleaner) check, we nevertheless handled this
>>> hypothetical situation. Perhaps it would be better that we didn't and
>>> just see if OOME returned after just adding the pre-loading of Cleaner
>>> class...
>>> So here we are, at an attempt to clean this up:
>>>      https://bugs.openjdk.java.net/browse/JDK-8055232
>>> We can move (r instanceof Cleaner) check outside of synchronized block
>>> to where it was before the 2nd fix and wait what stress tests will show.
>>> Another possibility is to move the instanceof check outside of
>>> synchronized block, but handle the hypothetical OOME by re-linking the
>>> unlinked reference back into the pending chain:
>>> http://cr.openjdk.java.net/~plevart/jdk9-dev/ReferenceHandlerExceptions/webrev.01/
>>> What would you suggest?
>>> Regards, Peter

More information about the core-libs-dev mailing list