[aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

Andrew Dinn adinn at redhat.com
Mon Aug 10 08:43:34 UTC 2020

Hi Pengfei,

On 10/08/2020 05:45, Pengfei Li wrote:

>> I also eyeballed /some/ of the generated code to check that it
>> looked ok. I'd really like to be able to do that systematically for
>> a comprehensive test suite that exercised every rule but I only had
>> the machine for a few days. This really ought to be done as a
>> follow-up to ensure that all the rules are working as expected.
> Not sure if you have tried my newly added test in the vectorization
> folder. It checks if expected SVE/NEON instructions are generated as
> expected for each C2 vectornode by checking the OptoAssembly output.
> I put it in another webrev so you may have missed it. 
> http://cr.openjdk.java.net/~pli/rfr/8231441/jtreg.webrev.00/

Ah, thank you. That was not in the patch I Ningsheng pointed me at. It
is exactly what is needed to check the generation rules are all working.

>> Just out of interest why does UseSVE have range(0,2)? It seems you
>> are only testing for UseSVE > 0. Does value 2 correspond to  an
>> optional subset?

> AArch64 SVE has multiple versions. Current Fujitsu FX machine
> supports SVE1 only. We leave 2 here for SVE2 support in the near
> future. 
> https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-instruction-emulator/resources/tutorials/sve/sve-vs-sve2/introduction-to-sve2

Ah ok, thanks. Got it. Being able to switch on level 1 without level 2
is a good idea.

>> Why have you added a reg_def for R8 and R9 here and also to
>> alloc_class chunk0 at lines 544-545? They aren't used by C2 so why
>> define them?
> This has no functionality change to the two scratch registers. But if
> these are missing in the register definition, the regmask for vector
> registers won't start at an aligned position. So we prefer adding
> them back to make the computation easier.

It would be good to make this clear with a comment. Also, I think you
should change the name of the registers to R8_UNUSED and R9_UNUSED just
to emphasize that these are not expected to be included in any register

>> prf sets a predicate register field. pgrf sets a governing
>> predicate register field. Should the name not be gprf.
> I guess the reason is that the ArmARM doc says "the Pg field".

Ok, let's leave it at that then and blame ARM ;-)

>> chaitin.cpp:648-660
>> The comment is rather oddly formatted.
> Thanks for catching this.

Well, that's what reviews are for ...

>> At line 650 you guard the assert with a test for lrg._is_vector. Is
>> that not always going to be guaranteed by the outer condition
>> lrg._is_scalable? If so then you should really assert
>> lrg._is_vector.
>> . . .

> Thanks for above suggestions. We will consider refactoring these
> parts.

Ok, I'll wait for an updated webrev.

>> Andrew Haley is definitely going to ask you to update function
>> entry (assembler_aarch64.cpp:76) to call these new instruction
>> generation methods and then validate the generated code using
>> asm_check So, I guess you might as well do that now ;-)
> Thanks for letting us know. We will check how to validate those.

Ok, thanks.

>> Can you explain why we need to check p7 here and not do so in other
>> places where we call into the JVM? I'm not saying this is wrong. I
>> just want to know how you decided where re-init of p7 was needed.
> Sorry I don't know how the places are decided. But I will ask
> Ningsheng to explain this question and reply you later.

Sure, thanks.

>> Does this mean that is someone sets the maximum vector size to a
>> non- power of two, such as 384, all superword operations will be
>> bypassed? Including those which can be done using NEON vectors?
> The existing SLP doesn't support non-power-of-2 vector size (there
> are some assertions inside) so we added this. Yes, it's better if we
> have some mechanism to fall back to NEON for non-power-of-2 size. But
> so far in practice, we don't know any real chip implements the
> non-power-of-2 vector size. Also, we are now working on a new
> predicate-driven auto-vectorization pass to support SVE better. Do
> you think it's ok if we print some warnings if someone sets a
> non-power-of-2 size in vm options? Or any other suggestions in the
> short term?

Well, the test for MaxVectorSize in vm_version.cpp currently only
ensures it has been set to a multiple of 16. I think you probably ought
to check for a power of two at that point and exit the VM otherwise. If
hardware comes along that supports a non-power of two we can deal with
it at that point.


Andrew Dinn
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

More information about the aarch64-port-dev mailing list