RFR(S): 8075324: Costs of memory operands in aarch64.ad are inconsistent
tobias.hartmann at oracle.com
Wed Mar 18 10:46:55 UTC 2015
thanks a lot for the detailed explanation! That makes sense to me.
I was confused by the fact that the generation of the additional add in the case of a scaled-indirect-with-offset, is done 'implicitly' in loadStore and therefore we have to account for that by setting a higher cost for the corresponding memory operand.
Do you think we should still fix the cost of 'indOffI'? If so, here is the corresponding webrev:
On 17.03.2015 16:15, Andrew Dinn wrote:
> On 17/03/15 12:42, Tobias Hartmann wrote:
>> please review the following patch.
> I agree one of these costs is wrong (indOffI) but otherwise I think the
> costs are correct. That also means I think your addition of the two new
> operands is a mistake. I suspect you may have been confused by the
> rather unfortunate choice of names for the operands -- which are rather
> misleading. Also, the way the instruction generation is hidden behind
> some parameterised functions (ref below) makes it hard to see what is
> happening here. In particular, it makes it look like your operands are
> doing something that they cannot actually do.
> Many of the operand definitions are supposed to have cost 0 because the
> matched operands can be mapped directly to register arguments for a
> single ldr/str insn (of the appropriate flavour). These are the cases where
> i) there is no indirect register and an offset in a relevant range
> or where
> ii) where there is an indirect register and zero offset
> If the matcher can match an operand from these cases then the cost of
> the instruction combined with the complexity of the reduced operand
> (added in automatically by the matching algorithm) should be all that is
> needed to drive the selection process.
> However, in the operand definitions where
> iii) there is an indirect register and non-zero offset
> encoding of the load/store requires planting an lea (which manifests as
> an add) to add the offset to the base register followed by a ldr/str
> which does the scaled indirect load. That's because AArch64 does not
> provide a scaled indirect with offset address mode.
> For these cases the cost of the operand should be INSN_COST in order to
> account precisely for that extra add and the zero cost of passing the
> resulting offsetted address into the following ldr/str which does the
> scaled indirect load. There is no need to prefer this sequence. If two
> separate instructions can do this cheaper or at the same cost then that
> is fine.
> n.b. your newly provided operand format definition which prints
> ldrsbw R0, R10, R1, #16 I2L # byte
> exemplifies the problem here. There is no instruction which does a
> scaled indirect load with offset in the AArch64 instruction. If you take
> look at the static methods named loadStore defined in the source block
> you can see how this instruction will actually be generated.
> PrintAssembly will reveal that it gets translated as follows
> add R8, R10, #16
> ldrsbw R0, R1 sxtw
> Let's look at some cases:
> You are right that indOffI and indOffL should have the same cost but
> that cost should be 0 because they can both be encoded with a single
> ldr/str using only a base register and offset with no indirect register.
> ldr(dst, base, disp)
> By constrast, indIndexScaledI2L is correct to have cost 0. It has a
> scaled index register but offset 0. So, it can be encoded using a single
> ldr/str taking a scaled indirect register as argument.
> ldr(dst, base, index, SXTW)
> Similarly, indIndexScaledOffsetL should have cost INSN_COST because it
> will cause planting of an add followed by some flavour of ldr/str.
> add(rscratch1, base, disp)
> ldr(dst, index, LSL(0))
> So, the costs should be as follows
> Cost 0:
> Cost INSN_COST:
> Your two extra operands are not needed.
> Andrew Dinn
More information about the hotspot-compiler-dev