[9] RFR(S): 8158228: C1 incorrectly folds mismatched loads from stable arrays

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Mon Jun 6 11:11:26 UTC 2016

>>> Okay, but in the case of the getChar intrinsic it should be safe to fold the mismatched access without additional checks, right?
>> Yes, it will. But naive Java implementation does the job as good.
>> C2 intrinsic has the following code [1]:
>> bool LibraryCallKit::inline_string_char_access(bool is_store) {
>> ...
>>   // Bail when getChar over constants is requested: constant folding would
>>   // reject folding mismatched char access over byte[]. A normal inlining for getChar
>>   // Java method would constant fold nicely instead.
>>   if (!is_store && value->is_Con() && index->is_Con()) {
>>     return false;
>>   }
>> You can do the same in C1.
> Right, this code was added by JDK-8150180. When bailing out from the intrinsic we still have the chance to constant fold the accesses from the Java implementation.
> However, this seems to only work fine with C2 but not with C1:
> http://cr.openjdk.java.net/~thartmann/8158228/webrev.03/

You could take a different route and issue 2 byte loads instead of 1 
mismatched char load in the intrinsic when byte[] and element index are 

But I don't think that constant folding of chars from String is crucial 
in C1-generated code.

> The generated code looks like this:
>   0x00007f7b19597d12: mov    $0x6d4e01688,%rdx	// Array address
>   0x00007f7b19597d1c: cmpl   $0xd,0xc(%rdx)
>   0x00007f7b19597d23: jbe    0x00007f7b19597ef9
>   0x00007f7b19597d29: movsbl 0x1d(%rdx),%edx    // Byte load
>   0x00007f7b19597d2d: and    $0xff,%edx
>   0x00007f7b19597d33: shl    $0x8,%edx
>   0x00007f7b19597d36: or     $0x67,%edx
>   0x00007f7b19597d39: and    $0xffff,%edx
>   0x00007f7b19597d3f: and    $0xffff,%edx
>   0x00007f7b19597d45: and    $0xffff,%edx
>   0x00007f7b19597d4b: and    $0xffff,%edx
>   0x00007f7b19597d51: mov    %rdx,%rbx
>   0x00007f7b19597d54: cmp    $0x67,%ebx
> C1 is not able to fold the second byte load that is shifted by 8 bits.

> With webrev.02 (using the intrinsic but not constant folding the loads) it looks like this:
>   0x00007fb780714cea: mov    $0x6d4e01688,%rdx
>   0x00007fb780714cf4: movzwl 0x1c(%rdx),%edx
>   0x00007fb780714cf8: and    $0xffff,%edx
>   0x00007fb780714cfe: and    $0xffff,%edx
>   0x00007fb780714d04: mov    %rdx,%rbx
>   0x00007fb780714d07: cmp    $0x67,%ebx
> I would therefore prefer to go with webrev.02. What do you think?

Agree. webrev.02 looks good enough to me.

>> While browsing the code, I noticed the following in j.l.StringUTF16:
>>     @HotSpotIntrinsicCandidate
>>     public static char getChar(byte[] val, int index) {
>>         index <<= 1;
>>         return (char)(((val[index++] & 0xff) << HI_BYTE_SHIFT) |
>>                       ((val[index]   & 0xff) << LO_BYTE_SHIFT));
>>     }
>>     public static char charAt(byte[] value, int index) {
>>         if (index < 0 || index >= value.length >> 1) {
>>             throw new StringIndexOutOfBoundsException(index);
>>         }
>>         return getChar(value, index);
>>     }
>> _getCharStringU doesn't do any out-of-bounds checks, but StringUTF16.getChar() is pervasively used across java.lang.
>> Is it intentional? I'd expect to see StringUTF16.getChar() private and users calling StringUTF16.charAt().
> Yes, this is/was intentional. See my explanation here:
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2016-May/023047.html
> "The problem is that some of the intrinsified String methods are heavily used and C2 is not always able to remove the range checks introduced by the public wrapper method. For example, StringUTF16::getChar() is used in places where we implicitly know that the access is in bounds (for example, because the array length is always a multiple of two) but C2 cannot prove this. Therefore, if we want to avoid a regression, we have to call the unchecked version of the intrinsic in some cases."
> I completely agree that we should try to avoid this and I plan to revisit the issue with JDK-8156534.

No problem. Just wanted to double-check.

Also, noticed redundant bounds checks:
     public char charAt(int index) {
         checkIndex(index, count);
         if (isLatin1()) {
             return (char)(value[index] & 0xff);
         return StringUTF16.charAt(value, index);
     public static char charAt(byte[] value, int index) {
         if (index < 0 || index >= value.length >> 1) {
             throw new StringIndexOutOfBoundsException(index);
         return getChar(value, index);

Best regards,
Vladimir Ivanov

More information about the hotspot-compiler-dev mailing list