[aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support
ningsheng.jian at arm.com
Wed Aug 26 09:31:41 UTC 2020
On 8/25/20 8:12 PM, Vladimir Ivanov wrote:
> So, it's enough to use a single "virtual" slot to model XMM, YMM, and
> ZMM registers all at once unless RA supports packing multiple smaller
> vector values into a single register (separately managing lower and
> upper parts of the register; e.g., YMM = XMM(hi):XMM(lo) ). Though
> currently RA does support it, there is no code which utilizes that and
> no plans to do that in the future.
> I believe the situation on AArch64 with NEON and SVE is similar. (And
> scalable vectors make it harder to support packing in RA.)
> (2) vector width matters only for spills/refills and reg2reg moves.
> Matcher does type capturing, so all vector mach nodes keep precise type
> of the value they produce. On x86 it is heavily used later in code
> emission phase, but RA still relies on ideal registers (Op_VecX et al).
> I don't see why RA can't be migrated from ideal registers to types
> (TypeVect) to determine vector size when performing spilling.
> From aforementioned observations, I conclude there should be a way to
> declare a single ideal vector register (Op_Vec) which represents
> full-width vector supported by the hardware and use captured vector
> types (TypeVect instances) to guide RA and code generation. And that's
> the state where I'd like to see vector support in C2 be moving to.
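
To make the idea above concrete, here is a minimal sketch of deriving the spill size from a captured vector type rather than from a per-width ideal register. The names `VectorTypeSketch` and `spill_bytes` are hypothetical stand-ins for illustration only; they are not HotSpot's TypeVect API or anything from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a captured vector type.
// This is NOT HotSpot's TypeVect class, only an illustration.
struct VectorTypeSketch {
  std::size_t element_bytes;  // size of one lane in bytes
  std::size_t lane_count;     // number of lanes in the vector
};

// Spill size computed from the value's type, so the register allocator
// would not need a separate ideal register per vector width.
inline std::size_t spill_bytes(const VectorTypeSketch& t) {
  assert(t.element_bytes > 0 && t.lane_count > 0);
  return t.element_bytes * t.lane_count;
}
```

With this shape, a vector of four 4-byte lanes would spill 16 bytes and one of eight 8-byte lanes would spill 64 bytes, both described by the same single vector register class.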
That may be true. I think we can move forward step-by-step for easy
> Regarding predicate registers, I haven't thought too much about them, so
> I don't have a strong opinion about whether they should be a separate
> entity (Op_RegVMask in your patch) or just treated as a vector of bits
>>> So far, I see 2 main directions for RA work:
>>> (a) support vectors of arbitrary size:
>>> (1) helps push the upper limit on the size (1024-bit)
>>> (2) handle non-power-of-2 sizes
>>> (b) optimize RA implementation for large values
>>> Anything else?
>> Yes, and it's not just vector. SVE predicate register has scalable
>> size (vector_size/8) as well. We also have predicate register
>> allocation working well with the proposed approach (not in this patch).
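
As a small illustration of the vector_size/8 relation mentioned above, the sketch below computes the predicate register size for a given SVE vector length. The helper names and constants are assumptions made for this example and do not come from the patch; the only facts relied on are that SVE vector lengths are multiples of 128 bits up to 2048 bits and that a predicate holds one bit per vector byte.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants and helpers for illustration; not from the patch.
constexpr std::size_t kMinSveVectorBits = 128;
constexpr std::size_t kMaxSveVectorBits = 2048;

// A predicate register holds one bit per byte of the vector register,
// so its size is vector_size / 8.
inline std::size_t predicate_bits(std::size_t vector_bits) {
  assert(vector_bits >= kMinSveVectorBits);
  assert(vector_bits <= kMaxSveVectorBits);
  assert(vector_bits % kMinSveVectorBits == 0);
  return vector_bits / 8;
}

// Same size expressed in bytes, e.g. for sizing a spill area.
inline std::size_t predicate_bytes(std::size_t vector_bits) {
  return predicate_bits(vector_bits) / 8;
}
```

So both the vector and the predicate sizes scale with the hardware vector length, which is why a fixed-size register description does not fit either of them.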
> Though with AVX512 support predicate register support was left aside, I
> agree that predicate registers should be taken into account from the
> very beginning. (And glad to hear you are already working on supporting
As that's one of the main features of SVE, we have to do that. :-) With
initial SVE support in, our further work on that could be easier.
> Also, I believe options #1/#2 may be extended to cover predicate
> registers as well without too much effort.
>>> Speaking of (a), in particular, I don't see why a possible solution for
>>> it should not supersede vecX et al altogether.
>>> Also, I may be wrong, but I don't see clear evidence that there's a
>>> pressing need to have all of that fixed right from the beginning.
>>> (That's why I put #1 and #2 options on the table.) Starting with
>>> #1/#2 would untie initial SVE support from the exploratory work
>>> needed to choose the most appropriate solution for (a) and (b).
>> Starting from partial SVE register support might be acceptable for
>> the initial patch (Andrew may not agree :-)), but I think we may end up
>> with more follow-up work, given that our proposed approach already
>> supports SVE well in terms of (a) and (b). If there's no other
>> solution, would it be possible to use the currently proposed method? It's
>> not difficult to back out our changes in the register allocation part if
>> we find a better solution to support arbitrary vector/predicate
>> sizes in the future, as the patch there is actually not big IMO.
> Unfortunately, temporary solutions usually end up as permanent ones
> since there's much less motivation to replace them (and harder to
> justify the effort) after initial pressure is relieved.
> I'm OK with the proposed patch if we agree it's a stopgap/temporary
> solution to the immediate problems you face with initial SVE support and
> are ready to commit resources into replacing it.
Yes, we will continue to maintain and improve it. Our idea might be
Arm-biased :), so we will need collaboration and suggestions from the
> That's why I think it's the right time to discuss general direction,
> work on a plan, and use it to guide the coordinated effort to improve
> vector support in C2.
> Also, considering it a stopgap solution means we should strive for
> the simplest solution and that's another reason I put #1/#2 options on
> the table to consider.
> > [...]
>>> Any new problems/hitting some limitations envisioned when spilling
>>> large number of huge vectors (2048-bit) on stack?
>> I haven't seen any so far.
> Ok, good to know.
> I was curious whether stack representation should also move away from
> 32-bit slots to a more compact representation.
I think that's possible, if we could also have the alignment handled.
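
For reference, a rough sketch of the arithmetic behind the current 32-bit slot representation and the alignment concern. The helper names are hypothetical and this is not the actual C2 stack layout code; it only shows how many 32-bit slots a vector of a given width occupies and how a slot index could be rounded up to an alignment boundary.

```cpp
#include <cassert>
#include <cstddef>

// Stack slots are 32 bits wide, as discussed above.
constexpr std::size_t kSlotBits = 32;

// Number of 32-bit slots needed to hold one vector of the given width.
inline std::size_t slots_for_vector(std::size_t vector_bits) {
  assert(vector_bits % kSlotBits == 0);
  return vector_bits / kSlotBits;
}

// Round a slot index up to the next multiple of alignment_slots,
// which is assumed to be a power of two.
inline std::size_t align_up_slots(std::size_t slot,
                                  std::size_t alignment_slots) {
  assert(alignment_slots != 0);
  assert((alignment_slots & (alignment_slots - 1)) == 0);
  return (slot + alignment_slots - 1) & ~(alignment_slots - 1);
}
```

Under these assumptions a 2048-bit vector takes 64 slots, so a more compact representation would mainly save bookkeeping, as long as each spilled vector still lands on a suitably aligned offset.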