RFR(XXS) 8249087 Symbol constructor unnecessarily initializes _body[0]

Ioi Lam ioi.lam at oracle.com
Wed Jul 22 17:42:16 UTC 2020

On 7/22/20 10:12 AM, Lois Foltan wrote:
> On 7/22/2020 12:17 AM, David Holmes wrote:
>> Hi Lois,
>> On 22/07/2020 1:06 am, Lois Foltan wrote:
>>> On 7/21/2020 2:24 AM, Ioi Lam wrote:
>>>> On 7/20/20 11:12 PM, Florian Weimer wrote:
>>>>> * Ioi Lam:
>>>>>> Hi please review this very simple fix:
>>>>>> diff -r 19f26d72a8d0 src/hotspot/share/oops/symbol.cpp
>>>>>> --- a/src/hotspot/share/oops/symbol.cpp    Mon Jul 20 14:24:19 
>>>>>> 2020 -0700
>>>>>> +++ b/src/hotspot/share/oops/symbol.cpp    Mon Jul 20 17:11:57 
>>>>>> 2020 -0700
>>>>>> @@ -51,8 +51,11 @@
>>>>>>   Symbol::Symbol(const u1* name, int length, int refcount) {
>>>>>>     _hash_and_refcount = pack_hash_and_refcount((short)os::random(),
>>>>>> refcount);
>>>>>>     _length = length;
>>>>>> -  _body[0] = 0;  // in case length == 0
>>>>>>     memcpy(_body, name, length);
>>>>>> +  // For symbols of length 0 and 1: _body[0] (and _body[1]) are
>>>>>> uninitialized and may
>>>>>> +  // contain random values, which will only be read by
>>>>>> Symbol::identity_hash(),
>>>>>> +  // which would tolerate such randomness. These values never 
>>>>>> change
>>>>>> during the lifetime
>>>>>> +  // of the Symbol.
>>>>>>   }
>>>>> Won't this still trip memory debuggers? Symbol::identity_hash() 
>>>>> implies
>>>>> that the result is eventually used in a conditional operation (a hash
>>>>> comparison perhaps).  If it's possible one day to run Hotspot under
>>>>> valgrind, this would result in false positives.
>>>> Are you saying that valgrind will modify uninitialized memory 
>>>> periodically after the constructor has returned, and thus will 
>>>> cause Symbol::identity_hash() to return a different value?
>>>> Without my patch, _body[1] is uninitialized for Symbols whose 
>>>> length is 0 or 1. We have not heard of any issues related to 
>>>> valgrind and Symbol::identity_hash().
>>>> In fact, looking at the code history, the setting of "_body[0] = 0" 
>>>> in Symbol::Symbol was introduced only recently (Feb 2020):
>>>> http://hg.openjdk.java.net/jdk/jdk/annotate/4a4d185098e2/src/hotspot/share/oops/symbol.cpp#l55 
>>>> I'll check with Lois who added the code to see the reason for doing 
>>>> it.
>>> Hi Ioi,
>>> Reviewing this JBS issue, I have concerns over leaving both _body[0] 
>>> and now even _body[1] uninitialized.  The signature processing 
>>> frequently checks the first character of a Symbol via 
>>> Symbol::char_at(0) to determine what type it is dealing with.  Is 
>>> there a danger that the uninitialized memory actually has a valid 
>>> type indicator in it like an '[' character for example?  The 
>>> signature processing could potentially make wrong assumptions about 
>>> the type it is trying to process.
>> Aren't all the signature related symbols already guaranteed to not 
>> have zero-length, or else the length is being pre-tested for zero?
> Hi David,
> I believe you are correct in that signature related symbols are 
> already guaranteed to not have zero-length.  I tried several different 
> jasm files to introduce an empty string either for a class name or for 
> a type of a field reference and did not get past class file parsing 
> without a ClassFormatError.
> Error: LinkageError occurred while loading main class HelloEmptyString
>         java.lang.ClassFormatError: Class name is empty or contains 
> illegal character in descriptor in class file HelloEmptyString
> However, after reading the original issue in JDK-8249087, the 
> objection wasn't that there was a needless initialization of _body[0] 
> but instead it was around the inconsistency of initializing _body[0] 
> and not _body[1].  I think it should be the responsibility of the 
> Symbol class API to ensure initialization so that as a consumer I 
> don't have to worry about _body[0] or _body[1]'s validity.  So I 
> prefer the change to actually add the initialization of _body[1].  
> Something like, "_body[0] = _body[1] = 0".
Hi Lois,

The fact that _body[0..1] is in the Symbol header is just by coincidence 
-- we need only 6 bytes of meta-info about the Symbol, so we have 2 
bytes left over. We use these two left-over bytes for 
Symbol::identity_hash(). However, no one else should unconditionally 
read these 2 bytes --  if we change the Symbol header in the future, 
these two bytes may not be allocated anymore

It looks like leaving _body[0..1] uninitialized is confusing, and will 
possibly lead to problems with valgrind as pointed out by Florian. How 
about this:

Symbol::Symbol(const u1* name, int length, int refcount) {
   _hash_and_refcount =  pack_hash_and_refcount((short)os::random(), 
   _length = length;
   // _body[0..1] are allocated in the header just by coincidence in the 
   // implementation of Symbol. They are read by identity_hash(), so 
make sure they
   // are initialized.
   // No other code should assume that _body[0..1] are always allocated. 
E.g., do
   // not unconditionally read base()[0] as that will be invalid for an 
empty Symbol.
   _body[0] = _body[1] = 0;
   memcpy(_body, name, length);

I'll also change the bug title to "Always initialize _body[0..1] in 
Symbol constructor"


As I discussed with Lois off-line:

There's Signature code that unconditionally reads _body[0], which would 
assert (but class loading checks for invalid signatures that prevents 
this from happening)

BasicType Signature::basic_type(const Symbol* signature) {
   return basic_type(signature->char_at(0));

char Symbol::char_at(int index) const {
   assert(index >=0 && index < length(), "symbol index overflow");
   return (char)base()[index];

Signature::basic_type() should be fixed to either check for length, 
and/or assert that signature is a valid signature.

- Ioi

More information about the hotspot-runtime-dev mailing list