Integrated: 8264409: AArch64: generate better code for Vector API allTrue

Ningsheng Jian njian at
Fri Apr 2 09:35:29 UTC 2021

On Thu, 1 Apr 2021 07:58:07 GMT, Ningsheng Jian <njian at> wrote:

> In the Vector API NEON implementation, we use a vector register to represent a vector mask, where an element value of -1 is a true mask and an element value of 0 is a false mask. The allTrue() API checks whether all the elements of the current mask are set.
> Currently, the AArch64 NEON allTrue implementation looks like:
>   andr  $tmp, T16B, $src1, $src2    # src2 is maskAllTrue
>   notr  $tmp, T16B, $tmp
>   addv  $tmp, T16B, $tmp
>   umov  $dst, $tmp, B, 0
>   cmp   $dst, 0
>   cset  $dst
> where $src2 is a preset all-true (-1) constant. We can optimize this to the sequence below, which checks whether all bits are set:
>   uminv $tmp, T16B, $src1
>   umov  $dst, $tmp, B, 0
>   cmp   $dst, 0xff
>   cset  $dst
> With this codegen improvement, we see a performance uplift of about 8% to 70%, depending on the machine, on Alibaba's Vector API big data benchmarks [1][2].
> Tested with tier1 and Vector API jtreg tests.
> [1]
> [2]
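
The two instruction sequences quoted above can be compared with a minimal scalar sketch in Java. It models a 16-lane byte mask (the T16B arrangement) and mimics each step with plain integer arithmetic. The class and method names (AllTrueModel, allTrueOld, allTrueNew, maskWithLeadingTrueLanes) are hypothetical and exist only for illustration; this is not the HotSpot code, and it assumes both cset instructions select on equality.

```java
// Scalar model of the two NEON allTrue sequences. All names are hypothetical.
final class AllTrueModel {
    static final int LANES = 16; // T16B: sixteen byte lanes

    // Old sequence: andr, notr, addv, umov, cmp 0, cset.
    static boolean allTrueOld(byte[] mask) {
        int sum = 0;
        for (int i = 0; i < LANES; i++) {
            int anded = mask[i] & 0xFF;     // andr with the all-true (0xff) constant
            int inverted = ~anded & 0xFF;   // notr: true lanes become 0x00
            sum = (sum + inverted) & 0xFF;  // addv: lane sum truncated to one byte
        }
        return sum == 0;                    // cmp $dst, 0 (assumed equality test)
    }

    // New sequence: uminv, umov, cmp 0xff, cset.
    static boolean allTrueNew(byte[] mask) {
        int min = 0xFF;
        for (int i = 0; i < LANES; i++) {
            min = Math.min(min, mask[i] & 0xFF); // uminv: unsigned minimum across lanes
        }
        return min == 0xFF;                 // cmp $dst, 0xff (assumed equality test)
    }

    // Builds a well-formed mask: the first trueLanes lanes are -1, the rest are 0.
    static byte[] maskWithLeadingTrueLanes(int trueLanes) {
        byte[] mask = new byte[LANES];
        for (int i = 0; i < trueLanes; i++) {
            mask[i] = (byte) -1;
        }
        return mask;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= LANES; k++) {
            byte[] mask = maskWithLeadingTrueLanes(k);
            System.out.println(k + " true lanes: old=" + allTrueOld(mask)
                    + " new=" + allTrueNew(mask));
        }
    }
}
```

For well-formed masks, where every lane is either 0 or -1, the two models agree: with k false lanes the old byte-wide sum is (255 * k) mod 256 = 256 - k, which is nonzero for 1 <= k <= 16, while the unsigned minimum is 0xff only when no lane is 0.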

This pull request has now been integrated.

Changeset: 0935eaa4
Author:    Ningsheng Jian <njian at>
Stats:     409 lines in 5 files changed: 13 ins; 12 del; 384 mod

8264409: AArch64: generate better code for Vector API allTrue

Reviewed-by: adinn, pli



More information about the hotspot-compiler-dev mailing list