Vector API: blend() performance on AArch64
ningsheng.jian at arm.com
Tue Mar 9 05:59:14 UTC 2021
Thanks for trying the Vector API on AArch64. I see you were using the
IntVector.SPECIES_256 species in your benchmarks. However, on AArch64
NEON the maximum hardware vector size is 128 bits. So for 256-bit
vectors we are not able to intrinsify the operations to use SIMD
directly, and those APIs (blend(), for example) fall back to their Java
implementation. You can use the -XX:+PrintIntrinsics option to see some
details.
For the benchmarks, I would suggest writing them in a more
(performance-)portable way, e.g. use IntVector.SPECIES_PREFERRED and do
not assume the actual vector length in the code logic.
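As a rough illustration of that suggestion, here is a minimal sketch of a blend()-based loop that takes its lane count from IntVector.SPECIES_PREFERRED instead of hard-coding a 256-bit shape. The class name PortableBlend and the helper clampNegativesToZero are hypothetical and are not taken from the FizzBuzz benchmark; the only point is that the loop stride and the scalar tail depend on the species chosen at run time.

```java
import java.util.Arrays;

import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

public class PortableBlend {
    // Let the JVM pick the widest shape it can map to hardware registers
    // rather than assuming a fixed size such as SPECIES_256.
    static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    // Hypothetical helper for illustration: returns a copy of 'values'
    // in which every negative element is replaced by 0.
    static int[] clampNegativesToZero(int[] values) {
        int[] result = new int[values.length];
        IntVector zero = IntVector.zero(SPECIES);

        int i = 0;
        // Largest multiple of the lane count that fits into the array.
        int upperBound = SPECIES.loopBound(values.length);
        for (; i < upperBound; i += SPECIES.length()) {
            IntVector v = IntVector.fromArray(SPECIES, values, i);
            // Mask is set in the lanes holding a negative value.
            VectorMask<Integer> negative = v.lt(0);
            // blend() takes lanes from 'zero' where the mask is set
            // and keeps the lanes of 'v' everywhere else.
            v.blend(zero, negative).intoArray(result, i);
        }
        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < values.length; i++) {
            result[i] = Math.max(values[i], 0);
        }
        return result;
    }

    public static void main(String[] args) {
        // The reported shape depends on the hardware and the JVM.
        System.out.println("Preferred species: " + SPECIES);
        int[] input = {-3, 5, 0, -1, 7, -8, 2, 9, -4, 6, -10};
        System.out.println(Arrays.toString(clampNegativesToZero(input)));
    }
}
```

While the Vector API is an incubator module, this needs --add-modules jdk.incubator.vector at both compile time and run time. Since PrintIntrinsics is a diagnostic option, I would expect it to need -XX:+UnlockDiagnosticVMOptions in a product build as well.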
On 3/9/21 4:25 AM, Gunnar Morling wrote:
> I was exploring the Vector API a bit and noticed that the performance
> of my vectorized FizzBuzz implementation is pretty poor on AArch64. I first
> thought this may be specific to the Apple M1 chip on which I was running
> this; but numbers don't look better with Linux (AWS Graviton2, see the repo
> for all numbers) either. My implementation is using the blend() API
> method, is this not (yet) supported on AArch64 perhaps?
> Thanks for any hints,
>  https://www.morling.dev/blog/fizzbuzz-simd-style/
>  https://github.com/gunnarmorling/simd-fizzbuzz