[aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

Ningsheng Jian ningsheng.jian at arm.com
Wed Aug 19 09:53:45 UTC 2020

Hi Andrew,

I have updated the patch based on the review comments. Would you mind 
taking another look? Thanks!



Also add build-dev, as there's a makefile change.

And the split parts:

1) SVE feature detection:

2) c2 register allocation:

3) SVE c2 backend:

Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
CSR: https://bugs.openjdk.java.net/browse/JDK-8248742

JTreg tests are still running, and so far no new failure found.


On 8/17/20 5:16 PM, Andrew Dinn wrote:
> Hi Pengfei,
> On 17/08/2020 07:00, Ningsheng Jian wrote:
>> Thanks a lot for the review! Sorry for the late reply, as I was on
>> vacation last week. And thanks to Pengfei and Joshua for helping
>> clarifying some details in the patch.
> Yes, they did a very good job of answering most of the pending questions.
>>> I also eyeballed /some/ of the generated code to check that it looked
>>> ok. I'd really like to be able to do that systematically for a
>>> comprehensive test suite that exercised every rule but I only had the
>>> machine for a few days. This really ought to be done as a follow-up to
>>> ensure that all the rules are working as expected.
>> Yes, we would expect Pengfei's OptoAssembly check patch can get merged
>> in future.
> I'm fine with that as a follow-up patch if you raise a JIRA for it.
>>> I am not clear why you are choosing to re-init ptrue after certain JVM
>>> runtime calls (e.g. when Z calls into the runtime) and not others e.g.
>>> when we call a JVM_ENTRY. Could you explain the rationale you have
>>> followed here?
>> We do the re-init at any possible return points to c2 code, not in any
>> runtime c++ functions, which will reduce the re-init calls.
>> Actually I found those entries by some hack of jvm. In the hacky code
>> below we use gcc option -finstrument-functions to build hotspot. With
>> this option, each C/C++ function entry/exit will call the instrument
>> functions we defined. In instrument functions, we clobber p7 (or other
>> reg for test) register, and in c2 function return we verify that p7 (or
>> other reg) has been reinitialized.
>> http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch
> Nice work. It's very good to have that documented. I'm willing to accept
> i) that this has found all current cases and ii) that the verify will
> catch any cases that might get introduced by future changes (e.g. the
> callout introduced by ZGC that you mention below). As the above mot say
> there is a slim chance this might have missed some cases but I think it
> is pretty unlikely.
>>> Specific Comments (register allocator webrev):
>>> aarch64.ad:97-100
>>> Why have you added a reg_def for R8 and R9 here and also to alloc_class
>>> chunk0 at lines 544-545? They aren't used by C2 so why define them?
>> I think Pengfei has helped to explain that. I will either add clear
>> comments or rename the register name as you suggested.
> Ok, good.
>> As Joshua clarified, we are also working on predicate scalable reg,
>> which is not in this patch. Thanks for the suggestion, I will try to
>> refactor this a bit.
> Ok, I'll wait for an updated patch. Are you planning to include the
> scalable predicate reg code as part of this patch? I think that would be
> better as it would help to clarify the need to distinguish vector regs
> as a subset of scalable regs.
>>> zBarrierSetAssembler_aarch64.cpp:434
>>> Can you explain why we need to check p7 here and not do so in other
>>> places where we call into the JVM? I'm not saying this is wrong. I just
>>> want to know how you decided where re-init of p7 was needed.
>> Actually I found this by my hack patch above while running jtreg tests.
>> The stub slowpath here can be a c++ function.
> Yes, good catch.
>>> superword.cpp:97
>>> Does this mean that is someone sets the maximum vector size to a
>>> non-power of two, such as 384, all superword operations will be
>>> bypassed? Including those which can be done using NEON vectors?
>> Current SLP vectorizer only supports power-of-2 vector size. We are
>> trying to work out a new vectorizer to support all SVE vector sizes, so
>> we would expect a size like 384 could go to that path. I tried current
>> patch on a 512-bit SVE hardware which does not support 384-bit:
>> $ java -XX:MaxVectorSize=16 -version # (32 and 64 are the same)
>> openjdk version "16-internal" 2021-03-16
>> $ java -XX:MaxVectorSize=48 -version
>> OpenJDK 64-Bit Server VM warning: Current system only supports max SVE
>> vector length 32. Set MaxVectorSize to 32
>> (Fallbacks to 32 and issue a warning, as the prctl() call returns 32
>> instead of unsupported 48:
>> https://www.kernel.org/doc/Documentation/arm64/sve.txt)
>> Do you think we need to exit vm instead of warning and fallbacking to 32
>> here?
> Yes, I think a vm exit would probably be a better choice.
> regards,
> Andrew Dinn
> -----------
> Red Hat Distinguished Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill

More information about the aarch64-port-dev mailing list