RFR (XS) 8161280 - assert failed: reference count underflow for symbol

Ioi Lam ioi.lam at oracle.com
Wed Aug 24 01:32:13 UTC 2016

On 8/23/16 6:08 PM, David Holmes wrote:
> On 24/08/2016 11:03 AM, Ioi Lam wrote:
>> Hi Coleen, thanks for suggestion the simplification:
>> void Symbol::decrement_refcount() {
>> #ifdef ASSERT
>>   if (_refcount == 0) {
>>       print();
>>       assert(false, "reference count underflow for symbol");
>>     }
>>   }
>> #endif
>>   Atomic::dec(&_refcount);
>> }
>> There's a race condition that won't detect the underflow. E.g., refcount
>> is 1. Two threads comes in and decrement at the same time. We will end
>> up with -1.
> So if we're going this path then you can get rid of the race by using 
> Atomic::add(&_refcount, -1), which returns the updated value. If you 
> get back -1 then assert.

The problem is we allow -1 to mean "permanent". Symbols in the CDS 
archive are mapped read-only and have refcount==-1. If we decrement them 
we get a SEGV.

Also, if we blindly decrement the refcount, we could end up rolling back 
all the way from 0xffff to 0x0, at which point we may free the Symbol 
which may still be in use.

The fundamental problem is we need a two-step operation and we avoid 
proper synchronization (for performance/avoiding deadlock/etc). So we 
will always have a race condition somewhere, and we just need to allow 
only the benign ones.

- Ioi

> Cheers,
> David
>> However, it's not worse than before. The old version also has a race
>> condition:
>>     refcount is 0
>>     thread A decrements
>>     thread B increments
>>     thread A checks for underflow
>> the decrementing thread will read _refcount==0 at the end so it won't
>> detect the (transient) underflow.
>> I think the failure to detect underflow is fine, since this happens only
>> with concurrent access. The kinds of underflow that we are interested
>> usually can be caught in single-threaded situations.
>> Thanks
>> - Ioi
>> On 8/23/16 4:24 PM, Coleen Phillimore wrote:
>>> This doesn't make sense for me and I have to go in gdb to print out
>>> what -16384 is.   It appears that this is trying to detect that we
>>> went below zero from zero, which is an error, but this isn't clear at
>>> all.
>>> It seems that
>>>    if (_refcount >= 0) {
>>> Should be > 0 and we should assert if this is ever zero instead, and
>>> allow anything negative to mean that this count has gone immortal.
>>> Kim thought it should use CAS rather than atomic increment and
>>> decrement, but maybe that isn't necessary, especially since there
>>> isn't a short version of cmpxchg.
>>> thanks,
>>> Coleen
>>> On 8/23/16 6:01 AM, Ioi Lam wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8161280
>>>> http://cr.openjdk.java.net/~iklam/jdk9/8161280-symbol-refcount-underflow.v01/ 
>>>> Summary:
>>>> The test was loading a lot of JCK classes into the same VM. Many of
>>>> the JCK classes refer to "javasoft/sqe/javatest/Status", so the
>>>> refcount (a signed short integer) of this Symbol would run up and
>>>> past 0x7fff.
>>>> The assert was caused by a race condition: the refcount started with
>>>> a large (16-bit) positive value such as 0x7fff, one thread is
>>>> decrementing and several other threads are incrementing. The refcount
>>>> will end up being 0x8000 or slightly higher (limited to the number of
>>>> concurrent threads that are running within a small window of several
>>>> instructions in the decrementing thread, so most likely it will be
>>>> 0x800?).
>>>> As a result, the decrementing thread found that the refecount is
>>>> negative after the operation, and thought that an underflow had
>>>> happened.
>>>> The fix is to ignore any value that may appear in the [0x8000 -
>>>> 0xbfff] range and do not flag these as underflows (since they are
>>>> most likely overflows -- overflows are already handled by making the
>>>> Symbol permanent).
>>>> Thanks
>>>> - Ioi

More information about the hotspot-runtime-dev mailing list