[aarch64-port-dev ]  RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays
dmitrij.pochepko at bell-sw.com
Mon Oct 30 18:03:54 UTC 2017
On 30.10.2017 20:30, Andrew Haley wrote:
> On 30/10/17 16:43, Dmitrij Pochepko wrote:
>> I've tried simd loads(even aligned ones to be sure that alignment is not
>> an issue). simd versions were attached into JDK-8187472 as
>> - v5.0(simd loads, 16-byte address alignment, 64 bytes per 1 loop iteration)
>> - v7.0(simd loads, 16-byte alignment, 64 bytes per 1 loop iteration)
>> - v9.0(simd loads, 64 byte alignment, 128 bytes per 1 loop iteration).
>> I've measured it on ThunderX and found while best non-simd version
>> handles 1000000 bytes arrays in ~295 microseconds, simd versions had
>> numbers about ~355 microseconds.
> I'm rather reluctant to accept non-SIMD intrinsics because I expect
> SIMD performance to improve, and I expect SIMD to be the future. The
> same is true of implementations which avoid the use of ldp.
I also expected NEON to be faster on very new designs. I have a SIMD
version of this intrinsic that, if you want, I can merge into the stub
under an if with a new option (something like UseSIMDForArrayEquals
with default value set to false, almost the same as the existing
UseSIMDForMemoryOps, which is used in the array copy intrinsic).
However, it is slower on the CPUs we have access to and is unlikely to
become the default. This way we would have both a fast version and a
SIMD version.
I am not sure whether it is best to do this, or to keep a single,
simple, and fastest version of this intrinsic for now and get back to
it when SVE becomes widely available.
What do you think?
Note that other intrinsics that are in the works will use SIMD.