RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays
dmitrij.pochepko at bell-sw.com
Mon Oct 30 15:42:35 UTC 2017
As part of the JEP “Improve performance of String and Array operations
on AArch64” I wanted to send out a pre-review for some of the improved
intrinsics to get early feedback. This is the first in a series.
Please pre-review the patch for 8187472 - “AARCH64: array_equals intrinsic
doesn't use prefetch for large arrays” which improves large array
handling (small arrays are unaffected).
In short, this patch handles large arrays with a large (64-byte) loop
that uses a prefetch instruction; the loop is implemented in a stub. I
observe a performance boost of up to 6x on systems without a h/w
prefetcher. Systems with hardware prefetching (Cortex A53 and some very
modern ones) also benefit from this patch (15% improvement).
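To illustrate the general shape of such a loop, here is a minimal C
sketch. It is not the code from the webrev: the real stub is AArch64
assembly generated by HotSpot, and the function name, the prefetch
distance and the use of the GCC/Clang builtin __builtin_prefetch (which
typically maps to a PRFM instruction on AArch64) are assumptions made
only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tuning constants, not taken from the patch. */
#define CHUNK_BYTES    64   /* bytes compared per loop iteration */
#define PREFETCH_AHEAD 256  /* distance ahead of the current position */

/* Returns true if the first len bytes of a and b are identical. */
bool array_equals_sketch(const unsigned char *a, const unsigned char *b,
                         size_t len)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;

    /* Large-array path: compare one 64-byte chunk per iteration. */
    while (len - i >= CHUNK_BYTES) {
        /* Hint the cache about data needed a few iterations from now.
           Guarded so the address stays inside the arrays. */
        if (len - i > PREFETCH_AHEAD) {
            __builtin_prefetch(a + i + PREFETCH_AHEAD, 0, 0);
            __builtin_prefetch(b + i + PREFETCH_AHEAD, 0, 0);
        }

        /* memcpy into locals avoids unaligned-access issues. */
        uint64_t x[CHUNK_BYTES / 8], y[CHUNK_BYTES / 8];
        memcpy(x, a + i, CHUNK_BYTES);
        memcpy(y, b + i, CHUNK_BYTES);

        /* Accumulate differences and branch once per chunk. */
        uint64_t diff = 0;
        for (size_t k = 0; k < CHUNK_BYTES / 8; k++)
            diff |= x[k] ^ y[k];
        if (diff != 0)
            return false;

        i += CHUNK_BYTES;
    }

    /* Tail shorter than one chunk. */
    return memcmp(a + i, b + i, len - i) == 0;
}
```

Accumulating the XOR of all eight words and testing once per chunk keeps
the loop body to a single branch; whether that or an early exit per word
is faster depends on the hardware and would need to be measured.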
I've tried a number of different versions (attached to JDK-8187472) with
different load instructions (ldr/ldp/<simd>), slightly different code
shapes, different data dependencies across registers, alignments, etc.
The version presented in the webrev (version 2.6d from the JDK-8187472
attachments) is the simplest of the fast ones (as measured on the 3
systems available).
I've used this simple benchmark to measure performance:
Chart for ThunderX:
Chart for Cortex A53(R-Pi):
Raw numbers for ThunderX:
Raw numbers for R-Pi:
Testing: I've run the existing jtreg test
(java/util/Arrays/ArraysEqCmpTest.java) in both -Xmixed and -Xcomp modes
and found no regressions.
Any additional numbers on other systems are welcome, as well as early
feedback on the code.