RFR 8209138: Symbol constructor uses u1 as the element type of its name argument

David Holmes david.holmes at oracle.com
Tue Oct 2 01:45:52 UTC 2018

Hi Harold,

Thanks for persevering with this!

On 2/10/2018 6:23 AM, Harold David Seigel wrote:
> Hi Coleen,
> Please review this updated webrev:
>     http://cr.openjdk.java.net/~hseigel/bug_8209138.3/webrev/index.html
> It uses u1 for Symbol instead of char.  I also added a "char char_at(int 
> index)" function to class Symbol.  It's for callers that want a char 
> from a Symbol instead of a u1.

I was in two minds about this as we really are comparing the i'th byte 
with a given char value. But I suppose if I read char_at(i) as 
byte_as_char_at(i) then it's just a type-conversion operation. Though I 
am curious what byte_at usages remain after this change?


Can you humour me and please change this comment:

  // Return the i-th char, where i < utf8_length


  // Return the i-th utf byte as a char, where i < utf8_length



    char char_at(int index) const {
      assert(index >=0 && index < length(), "symbol index overflow");
      return base()[index];

I'm surprised this doesn't need a (char) cast to keep the compiler 
happy. Some compilers may need it I think and it doesn't hurt to add it.



There's a comment in:


that needs updating:

  FIXME: among other things, this interface is not sufficient to
  describe fields which are themselves arrays (like Symbol's
  jbyte _body[1]).




accesses the array as jbytes - that may need adjusting.


Otherwise all seems fine.


> Thanks! Harold
> On 9/27/2018 10:03 AM, coleen.phillimore at oracle.com wrote:
>> Hi Harold,  I think we've decided to make u1 the type we carry around 
>> in Symbol instead of char, since that's more accurate to represent 
>> utf8.  Since the goal is to make the types consistent, can you try to 
>> change it all to u1 instead.  Keeping the name get_byte() and see how 
>> bad the casting situation is?
>> thank you!
>> Coleen
>> On 9/27/18 9:17 AM, Harold David Seigel wrote:
>>> Hi David,
>>> Please review this updated webrev:
>>> http://cr.openjdk.java.net/~hseigel/bug_8209138.2/webrev/index.html
>>> It contains additional changes to use char for class Symbol.
>>> Hopefully, your concerns w.r.t. signed and unsigned types were 
>>> resolved by our offline discussions.
>>> Thanks, Harold
>>> On 8/21/2018 2:24 AM, David Holmes wrote:
>>>> On 21/08/2018 3:19 PM, Kim Barrett wrote:
>>>>>> On Aug 20, 2018, at 8:53 PM, David Holmes 
>>>>>> <david.holmes at oracle.com> wrote:
>>>>>> Hi Harold,
>>>>>> On 21/08/2018 12:43 AM, Harold David Seigel wrote:
>>>>>>> Hi,
>>>>>>> Please review this change for bug JDK-8209138.  The fix changes 
>>>>>>> class Symbol in symbol.hpp to use type char instead of types u1 
>>>>>>> and jbyte and renames relevant functions by replacing 'byte' with 
>>>>>>> 'char'.  For example, 'Symbol::byte_at_put()' is now 
>>>>>>> 'Symbol::char_at_put()'.
>>>>>> It's not clear to me exactly what all these things should be. In a 
>>>>>> lot of places we seem to be dealing with UTF8 character 
>>>>>> representations, which to me should be u1 (afterall that is how 
>>>>>> they are specified in the classfile format!) which is just an 
>>>>>> unsigned "byte". But char is signed (in our build).
>>>>> Not sure what’s meant by “in our build” but char is unsigned when 
>>>>> building HotSpot for aarch64 / arm.  See, for example, JDK-8209586.
>>>> Ah! We have always set -fsigned-char for the Oracle ARM and ARM64 
>>>> builds (and previously PPC32), but now I see that, as you say, 
>>>> Aarch64 does not set this (nor PPC64). Interesting and unfortunate. 
>>>> I'm somewhat surprised that we haven't encountered more issues due 
>>>> to this.
>>>> In any case we are still converting from an unsigned type to a 
>>>> signed type in many cases with this change.
>>>> David
>>>> -----
>>>>>> The current change just seems to push out the boundary where we 
>>>>>> convert between things. For example ciSymbol now has a char 
>>>>>> base()** but still has a byte_at() method - should that not know 
>>>>>> be char_at() for consistency? Especially given byte_at() returns 
>>>>>> get_symbol()->char-at()
>>>>>> ** Granted the existing jbyte seems the wrong choice too.
>>>>>> I think we have a lot of type confusion in our code, but it isn't 
>>>>>> clear to me that this particular change is necessarily changing 
>>>>>> things in the right direction.
>>>>>> Cheers,
>>>>>> David
>>>>>>> Open Webrev: 
>>>>>>> http://cr.openjdk.java.net/~hseigel/bug_8209138/webrev/index.html
>>>>>>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8209138
>>>>>>> The change was tested by running Mach5 tiers 1 and 2 tests and 
>>>>>>> builds on Linux-x64, Windows, and Mac OS X, running tiers 3-5 
>>>>>>> tests on Linux-x64, and by running JCK-11 API, Lang and VM tests 
>>>>>>> on Linux-x64.
>>>>>>> Thanks, Harold

More information about the hotspot-runtime-dev mailing list