[aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support

Andrew Dinn adinn at redhat.com
Fri Aug 7 13:25:09 UTC 2020

Hi Ningsheng,

On 31/07/2020 02:41, Ningsheng Jian wrote:
> Hi Andrew,
> Thanks a lot!!
> FYI, the latest patch:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039289.html
> And some descriptions:
> http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt

Thanks for doing such a great job. This is very good work.

Also, thanks for splitting the patch up to separate out the different
steps -- that was immensely helpful.

I have one general query and a small number of detailed comments which
are provided separately for each patch. See below.


I was able to test this patch on a loaned Fujitsu FX700. I replicated
your results, passing tier1 tests and the jtreg compiler tests in
vectorization, codegen, c2/cr6340864 and loopopts.

I also eyeballed /some/ of the generated code to check that it looked
ok. I'd really like to be able to do that systematically for a
comprehensive test suite that exercised every rule but I only had the
machine for a few days. This really ought to be done as a follow-up to
ensure that all the rules are working as expected.

General Comments:

Sizing the NEON registers using 8 slots -- even though there might
actually be more (or less!) slots in use for a VecA is fine. However, I
think this needs a little bit more explanation in the .ad. file (see
comments on ra webrev below)

I'm ok with your choice to use p7 as an always true predicate register
and also how you choose to init and re-init from code defined via the ad
file based on C->max_vector_size().

I am not clear why you are choosing to re-init ptrue after certain JVM
runtime calls (e.g. when Z calls into the runtime) and not others e.g.
when we call a JVM_ENTRY. Could you explain the rationale you have
followed here?

Specific Comments (feature webrev):


Just out of interest why does UseSVE have range(0,2)? It seems you are
only testing for UseSVE > 0. Does value 2 correspond to  an optional subset?

Specific Comments (register allocator webrev):


Why have you added a reg_def for R8 and R9 here and also to alloc_class
chunk0 at lines 544-545? They aren't used by C2 so why define them?

assembler_aarch64.hpp:280 (also 699)

prf sets a predicate register field. pgrf sets a governing predicate
register field. Should the name not be gprf.


The comment is rather oddly formatted.

At line 650 you guard the assert with a test for lrg._is_vector. Is that
not always going to be guaranteed by the outer condition
lrg._is_scalable? If so then you should really assert lrg._is_vector.

The special case code for computation of num_regs for a vector stack
slot also appears in this file with a slightly different organization in
find_first_set (line 1350) and in PhaseChaitin::Select (line 1590).
There is another similar case in RegMask::num_registers at regmask.cpp:
98. It would be better to factor out the common code into methods of
LRG. Maybe using the following?

  bool LRG::is_scalable_vector() {
    if (_is_scalable) {
      assert(_is_vector == 1);
      assert(_num_regs == == RegMask::SlotsPerVecA)
      return true;
    return false;

  int LRG::scalable_num_regs() {
    if (OptoReg::is_stack(_reg)) {
      return _scalable_reg_slots
    } else {
      return num_reg_slots;


Once again the test for lrg._is_vector should be guaranteed by the outer
test of lrg._is_scalable. Refactoring using the common methods of LRG as
above ought to help.


Use common method code.


Once again you should be able to use common method code of LRG here.


Once again you should be able to use common method code of LRG here.

Specific Comments (c2 webrev):


very nice defensive check!

assembler_aarch64.hpp:2469 & 2699+

Andrew Haley is definitely going to ask you to update function entry
(assembler_aarch64.cpp:76) to call these new instruction generation
methods and then validate the generated code using asm_check So, I guess
you might as well do that now ;-)


Can you explain why we need to check p7 here and not do so in other
places where we call into the JVM? I'm not saying this is wrong. I just
want to know how you decided where re-init of p7 was needed.


Does this mean that is someone sets the maximum vector size to a
non-power of two, such as 384, all superword operations will be
bypassed? Including those which can be done using NEON vectors?


Andrew Dinn
Red Hat Distinguished Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill

More information about the aarch64-port-dev mailing list