RFR (xs) 8145940: TempNewSymbol should have correct copy and assignment functions

Markus Gronlund markus.gronlund at oracle.com
Wed Jan 13 10:35:39 UTC 2016

Hi Coleen,

The Test_TempNewSymbol() look fine - thank you for adding it.

Maybe you could also complement with a simple self-assignment test as well?  

  s1 = s1; // self assignment
  assert(s1->refcount() == 2, "should still be two abc (s1 and ss)");



-----Original Message-----
From: Coleen Phillimore 
Sent: den 12 januari 2016 22:59
To: Markus Gronlund; Kim Barrett
Cc: hotspot-runtime-dev
Subject: Re: RFR (xs) 8145940: TempNewSymbol should have correct copy and assignment functions

On 1/12/16 7:34 AM, Markus Gronlund wrote:
> Hi Kim,
> Thanks a lot for the (detailed) reminder about the copy-and-swap idiom. I agree with your reasoning.

Two for (strongly?) and two against (myself, not so strongly and we can always add a comment to save someone else's time).  I can change it (hoping my test still pass).  Did you look at the test?
> About the return value from operator=, I would suggest retaining the void as-is as well.

> Maybe we should investigate if it is possible to have the functions in SymbolTable that increment Symbol*'s return a TempNewSymbol( or some other "smartptr" concept) instead of a direct Symbol*?

When changing symbolOops symbolHandles, to Symbol* with refcounting I had an implementation using SymbolHandle as a smart pointer.  I think I had the assignment operator transfer ownership of the pointer, iirc.  I had added reference counting statistics at that time (which I have 
output in my RFR).   My memory was that the number of Atomic:: add and 
subtract operations were an order of magnitude larger.  I saved a lot of email on this except the numbers.  Symbols mostly come into the JVM through the constant pool, with these exceptions that use TempNewSymbol.

> In that case we could make the type conversion constructor private and make SymbolTable a friend of TempNewSymbol - this would avoid accidental wrapping of arbitrary Symbol*'s. I will look into the possibility of this.

This sounds like a good idea.

> Thanks again for your clarification
> Markus
> -----Original Message-----
> From: Kim Barrett
> Sent: den 12 januari 2016 05:22
> To: Coleen Phillimore
> Cc: hotspot-runtime-dev
> Subject: Re: RFR (xs) 8145940: TempNewSymbol should have correct copy 
> and assignment functions
> On Jan 11, 2016, at 8:34 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>> Kim and Lois,
>> Thanks for looking at this.   This is my latest webrev with the ExecuteInternalVMTest in it:
>> http://cr.openjdk.java.net/~coleenp/8145940.02/webrev/index.html
>> See comments below.
>> On 1/11/16 6:45 PM, Kim Barrett wrote:
>>> On Jan 7, 2016, at 3:02 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>>>> Summary: Add clear() to the assignment operator and add copy constructor.
>>>> Ran all jtreg, colocated and non-colocated tests.   RunThese -jck with PrintSymbolTableSizeHistogram statistics:
>>>>   Percent removed          1.35
>>>>   Reference counts          194583
>>>> clean:
>>>>   Percent removed          1.53
>>>>   Reference counts          194245
>>>> Without a reference counting copy constructor, we could remove a TempNewSymbol's Symbol if a GC happens.  Consider:
>>>> TempNewSymbol ts = SymbolTable::new_symbol("abc"); // Hit GC
>>>> The ref count for "abc" is 1 when created by new_symbol, and the destructor could run after the copy into ts, decrementing the reference count to 0 again.  GC could unlink that symbol from the symbol table.  Fortunately, we haven't seen this bug.
>>>> open webrev at http://cr.openjdk.java.net/~coleenp/8145940/
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8145940
>>>> Thanks,
>>>> Coleen
>>> For the assignment operator, I think better is:
>>> void operator=(TempNewSymbol s) {
>>>    Symbol* tmp = s._temp;
>>>    s._temp = _temp;
>>>    _temp = tmp;
>>> }
>>> This is the well-known copy and swap idiom, with a by-hand "swap".
>>> http://stackoverflow.com/questions/3279543/what-is-the-copy-and-swap
>>> -
>>> idiom
>> Very tricky.  Lois and I figured out the trick.  It doesn't elide any Atomic::* reference counting though, which is what is worrysome for performance, just hides it in the destructor of 's'.  I don't want to change this.
> Copy-swap saves several refcount manipulations in some (common) cases, by taking advantage of rvalue copy elision.
> Coleen's:
> void operator=(const TempNewSymbol& s) {
>    clear();
>    _temp = s._temp;
>    if (_temp != NULL) _temp->increment_refcount(); }
> Markus's:
> const TempNewSymbol& operator=(const TempNewSymbol& rhs) {
>    if (this != &rhs && _temp != rhs._temp) {
>      clear();
>      _temp = rhs._temp;
>      if (_temp != NULL) {
>        _temp->increment_refcount();
>      }
>    }
>    return *this;
> }
> Kim's:
> void operator=(TempNewSymbol rhs) {
>    Symbol* tmp = rhs._temp;
>    rhs._temp = _temp;
>    _temp = tmp;
> }
> [I'm ignoring differences in the return value in the discussion below.
> I think it ought to be either "TempNewSymbol&" (not const) or void.]
> Markus's is mostly the same as Coleen's, except it handles a couple of special cases differently.
> Markus's handles self-assignment, where Coleen's doesn't. For correctness that's important; in actual practice self-assignment is generally rare to non-existent, so always paying overhead for it is unpleasant.
> Markus's also specially handles re-assignment to the same symbol, avoiding any refcount changes in that case.  I suggest this is another rare case, and adding overhead in the normal case to get that rare benefit is not the right tradeoff.
> Assuming we're not in one of those special cases and neither Symbol* is NULL, Coleen/Markus decrements the old and increments the new. And in an important special case where rhs is an rvalue, there will be another decrement as the temporary goes out of scope. So two refcount changes when rhs is an lvalue, three when it's an rvalue.
> In contrast, Kim's does an increment and a decrement when rhs is an lvalue, and *only* a decrement when rhs is an rvalue and rvalue copy elision is performed by the compiler, which any decent optimizing compiler will do.
> The rhs is an rvalue in the following situation:
>    TempNewSymbol tns = ... ;
>    ...
>    tns = expr_returning_Symbol* ;
> The rhs gets implicitly converted to a temporary (rvalue) TempNewSymbol referring to the symbol (no refcount change), that temporary is directly bound to the assignment argument (copy elided), swap the symbols, and the rvalue converted temp goes out of scope and decrements the refcount of the old Symbol*.
> The copy-swap version also handles the (rare) self-assignment and re-assignment to the same symbol cases. In these cases Kim's still does the same refcounting changes as otherwise.  But it's better to do extra work in the rare cases and keep the normal cases fast.
> This is why copy-swap has become the "usual idiom"for assignment operators. There might be cases where it's not the best answer, but usually it is. And this is why (to answer Lois's issue) one should use it even though it's not obvious if one is not previously familiar with the idiom; one should instead learn the idiom, and then one spends time looking hard at implementations that do something else.
> Of course, this may all be entirely in the noise; I hope that TempNewSymbol assignment is not a high frequency operation.
>>> I think the updated description of TempNewSymbol isn't really right.
>>> Because the conversion constructor doesn't increment the reference 
>>> count, it must *not* be used to capture some arbitrary reference to 
>>> a Symbol*. Only a new symbol or one obtained from one of the lookup 
>>> functions really should be managed by this class.
>> Yes, this is a subtlety of the design and one that I can't really fix right now.  I thought of changing TempNewSymbol to convert from const char * but there are cases where we don't want to add a symbol.
>> There is a comment above TempNewSymbol.  I don't know what to add to it.   And Markus's comments are really good.
> I was confused about which webrev I was looking at.  Coleen's doesn't seem to change those comments.
> It was (parts of) Markus's comments that worried me.  And the whole design is tricky and risky.

More information about the hotspot-runtime-dev mailing list