RFR(S): 8075324: Costs of memory operands in aarch64.ad are inconsistent
adinn at redhat.com
Tue Mar 17 15:15:03 UTC 2015
On 17/03/15 12:42, Tobias Hartmann wrote:
> please review the following patch.
I agree one of these costs is wrong (indOffI) but otherwise I think the
costs are correct. That also means I think your addition of the two new
operands is a mistake. I suspect you may have been confused by the
rather unfortunate choice of names for the operands -- which are rather
misleading. Also, the way the instruction generation is hidden behind
some parameterised functions (ref below) makes it hard to see what is
happening here. In particular, it makes it look like your operands are
doing something that they cannot actually do.
Many of the operand definitions are supposed to have cost 0 because the
matched operands can be mapped directly to register arguments for a
single ldr/str insn (of the appropriate flavour). These are the cases where
i) there is no indirect register and an offset in a relevant range
ii) where there is an indirect register and zero offset
If the matcher can match an operand from these cases then the cost of
the instruction combined with the complexity of the reduced operand
(added in automatically by the matching algorithm) should be all that is
needed to drive the selection process.
However, in the operand definitions where
iii) there is an indirect register and non-zero offset
encoding of the load/store requires planting an lea (which manifests as
an add) to add the offset to the base register followed by a ldr/str
which does the scaled indirect load. That's because AArch64 does not
provide a scaled indirect with offset address mode.
For these cases the cost of the operand should be INSN_COST in order to
account precisely for that extra add and the zero cost of passing the
resulting offsetted address into the following ldr/str which does the
scaled indirect load. There is no need to prefer this sequence. If two
separate instructions can do this cheaper or at the same cost then that
n.b. your newly provided operand format definition which prints
ldrsbw R0, R10, R1, #16 I2L # byte
exemplifies the problem here. There is no instruction which does a
scaled indirect load with offset in the AArch64 instruction. If you take
look at the static methods named loadStore defined in the source block
you can see how this instruction will actually be generated.
PrintAssembly will reveal that it gets translated as follows
add R8, R10, #16
ldrsbw R0, R1 sxtw
Let's look at some cases:
You are right that indOffI and indOffL should have the same cost but
that cost should be 0 because they can both be encoded with a single
ldr/str using only a base register and offset with no indirect register.
ldr(dst, base, disp)
By constrast, indIndexScaledI2L is correct to have cost 0. It has a
scaled index register but offset 0. So, it can be encoded using a single
ldr/str taking a scaled indirect register as argument.
ldr(dst, base, index, SXTW)
Similarly, indIndexScaledOffsetL should have cost INSN_COST because it
will cause planting of an add followed by some flavour of ldr/str.
add(rscratch1, base, disp)
ldr(dst, index, LSL(0))
So, the costs should be as follows
Your two extra operands are not needed.
More information about the hotspot-compiler-dev