[9] RFR(S): 8075136: Unnecessary sign extension for byte array access

Tobias Hartmann tobias.hartmann at oracle.com
Wed Mar 18 09:51:21 UTC 2015

Hi Andrew,

thanks a lot for clarifying this. Please see comments inline.

> On 17/03/15 16:11, Andrew Dinn wrote:
>> Apologies for jumping the gun wiht my previous comment. Having reviewed
>> the earlier traffic in this thread I now understand why the two new
>> rules are needed (i.e. to deal with the case where there is no LShift in
>> the matched operand).

Actually, the original intention of this change was to get rid of unnecessary sign extensions if the ConvI2LNode is not needed. For example, on x86 we avoid a sign extension if idx is guaranteed to be >= 0. See 'indPosIndexScaleOffset' in x86_64.ad:

  operand indPosIndexScaleOffset(any_RegP reg, immL32 off, rRegI idx, immI2 scale)
    predicate(n->in(2)->in(3)->in(1)->as_Type()->type()->is_long()->_lo >= 0);


On x86 this optimization was missing for the unscaled case (for example, a byte array access). That's why I added this case to the arm64 code.

I assumed that we do the same optimization on arm64 but now it seems to me that we don't. See the following output with the 'baseline' version:

Byte array access:
  0x0000007f7453d8b4: add	x10, x10, w1, sxtw
  0x0000007f7453d8b8: ldrsb	w0, [x10,#16]

Char array access:
  0x0000007f7cb6f8b4: add	x8, x10, #0x10
  0x0000007f7cb6f8b8: ldrh	w0, [x8,w1,sxtw #1]

In both cases we have a 'sxtw' which is not needed because we have range checks that guarantee that index (w1) is always >= 0.

> The two new rules both specify an unscaled 32-bit integer register as
> index and, of course, the scale value is installed as 0 in the rule. So,
> the correct patch to function storeLoad is to include these two switch
> cases in the set which specify a sign-extend:
>     . . .
>     switch (opcode) {
>       scale = Address::sxtw(size);
>       break;
>     default:
>       scale = Address::lsl(size);
>     . . .

Adapting the code as you suggested has no effect on the generated assembly code. It is equal to the baseline version. In fact, the indIndexOffsetI2L operand is not used.

  0x0000007fa91c06b4: add	x10, x10, w1, sxtw
  0x0000007fa91c06b8: ldrsb	w0, [x10,#16]


I assume that the changes are not required, since we don't do the 'unnecessary sign extension optimization'. What do you think?


More information about the hotspot-compiler-dev mailing list