[aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support
adinn at redhat.com
Mon Aug 10 08:43:34 UTC 2020
On 10/08/2020 05:45, Pengfei Li wrote:
>> I also eyeballed /some/ of the generated code to check that it
>> looked ok. I'd really like to be able to do that systematically for
>> a comprehensive test suite that exercised every rule but I only had
>> the machine for a few days. This really ought to be done as a
>> follow-up to ensure that all the rules are working as expected.
> Not sure if you have tried my newly added test in the vectorization
> folder. It checks if expected SVE/NEON instructions are generated as
> expected for each C2 vectornode by checking the OptoAssembly output.
> I put it in another webrev so you may have missed it.
Ah, thank you. That was not in the patch I Ningsheng pointed me at. It
is exactly what is needed to check the generation rules are all working.
>> Just out of interest why does UseSVE have range(0,2)? It seems you
>> are only testing for UseSVE > 0. Does value 2 correspond to an
>> optional subset?
> AArch64 SVE has multiple versions. Current Fujitsu FX machine
> supports SVE1 only. We leave 2 here for SVE2 support in the near
Ah ok, thanks. Got it. Being able to switch on level 1 without level 2
is a good idea.
>> Why have you added a reg_def for R8 and R9 here and also to
>> alloc_class chunk0 at lines 544-545? They aren't used by C2 so why
>> define them?
> This has no functionality change to the two scratch registers. But if
> these are missing in the register definition, the regmask for vector
> registers won't start at an aligned position. So we prefer adding
> them back to make the computation easier.
It would be good to make this clear with a comment. Also, I think you
should change the name of the registers to R8_UNUSED and R9_UNUSED just
to emphasize that these are not expected to be included in any register
>> prf sets a predicate register field. pgrf sets a governing
>> predicate register field. Should the name not be gprf.
> I guess the reason is that the ArmARM doc says "the Pg field".
Ok, let's leave it at that then and blame ARM ;-)
>> The comment is rather oddly formatted.
> Thanks for catching this.
Well, that's what reviews are for ...
>> At line 650 you guard the assert with a test for lrg._is_vector. Is
>> that not always going to be guaranteed by the outer condition
>> lrg._is_scalable? If so then you should really assert
>> . . .
> Thanks for above suggestions. We will consider refactoring these
Ok, I'll wait for an updated webrev.
>> Andrew Haley is definitely going to ask you to update function
>> entry (assembler_aarch64.cpp:76) to call these new instruction
>> generation methods and then validate the generated code using
>> asm_check So, I guess you might as well do that now ;-)
> Thanks for letting us know. We will check how to validate those.
>> Can you explain why we need to check p7 here and not do so in other
>> places where we call into the JVM? I'm not saying this is wrong. I
>> just want to know how you decided where re-init of p7 was needed.
> Sorry I don't know how the places are decided. But I will ask
> Ningsheng to explain this question and reply you later.
>> Does this mean that is someone sets the maximum vector size to a
>> non- power of two, such as 384, all superword operations will be
>> bypassed? Including those which can be done using NEON vectors?
> The existing SLP doesn't support non-power-of-2 vector size (there
> are some assertions inside) so we added this. Yes, it's better if we
> have some mechanism to fall back to NEON for non-power-of-2 size. But
> so far in practice, we don't know any real chip implements the
> non-power-of-2 vector size. Also, we are now working on a new
> predicate-driven auto-vectorization pass to support SVE better. Do
> you think it's ok if we print some warnings if someone sets a
> non-power-of-2 size in vm options? Or any other suggestions in the
> short term?
Well, the test for MaxVectorSize in vm_version.cpp currently only
ensures it has been set to a multiple of 16. I think you probably ought
to check for a power of two at that point and exit the VM otherwise. If
hardware comes along that supports a non-power of two we can deal with
it at that point.
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill
More information about the aarch64-port-dev