[10] RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays

Dmitrij Pochepko dmitrij.pochepko at bell-sw.com
Mon Oct 30 15:42:35 UTC 2017


As part of the JEP “Improve performance of String and Array operations 
on AArch64” I wanted to send out a pre-review for some of the improved 
intrinsics to get early feedback. This is the first in a series.

Please pre-review the patch for 8187472 - “AARCH64: array_equals 
intrinsic doesn't use prefetch for large arrays”, which improves large 
array handling (small arrays are unaffected).

In short, this patch handles large arrays with a large (64-byte) loop 
that uses a prefetch instruction; this is done in a stub. I observe a 
performance boost of up to 6x on systems without a hardware prefetcher. 
Systems with hardware prefetching (Cortex A53 and some very modern ones) 
also benefit from this patch (15% improvement).
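
To illustrate the general idea only (this is not the stub code from the 
webrev), here is a minimal C sketch of a comparison loop that handles 64 
bytes per iteration and issues prefetches ahead of the loads. The 
function name, the 256-byte prefetch distance and the use of the 
GCC/Clang builtin __builtin_prefetch are assumptions made for the 
example; the actual patch is written with the AArch64 macro assembler 
and may differ in every detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tuning values for this sketch, not taken from the patch. */
#define CHUNK_BYTES    64   /* bytes compared per main-loop iteration */
#define PREFETCH_AHEAD 256  /* how far ahead of the loads to prefetch */

/* Load 8 bytes without assuming any particular alignment. */
static uint64_t load64(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 if the first n bytes of pa and pb are equal, 0 otherwise. */
int bytes_equal_prefetch(const void *pa, const void *pb, size_t n) {
    assert(pa != NULL && pb != NULL);
    const unsigned char *a = pa;
    const unsigned char *b = pb;
    size_t i = 0;

    /* Main loop: 64 bytes per iteration, read as eight 8-byte words. */
    for (; n - i >= CHUNK_BYTES; i += CHUNK_BYTES) {
        /* Prefetch only while the target address is still inside the
           arrays, so the pointer arithmetic stays well defined. */
        if (n - i > PREFETCH_AHEAD) {
            __builtin_prefetch(a + i + PREFETCH_AHEAD);
            __builtin_prefetch(b + i + PREFETCH_AHEAD);
        }
        /* Accumulate differences so there is one branch per chunk. */
        uint64_t diff = 0;
        for (size_t w = 0; w < CHUNK_BYTES; w += 8) {
            diff |= load64(a + i + w) ^ load64(b + i + w);
        }
        if (diff != 0) {
            return 0;
        }
    }

    /* Tail: fewer than 64 bytes remain, compare them one by one. */
    for (; i < n; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

Inputs shorter than 64 bytes never enter the main loop in this sketch, 
which loosely mirrors the point that small arrays are unaffected.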

I've tried a number of different versions (attached to JDK-8187472) with 
different load instructions (ldr/ldp/<simd>), slightly different code 
shapes, different data dependencies across registers, alignments, etc. 
The version presented in the webrev (version 2.6d from the JDK-8187472 
attachments) is the simplest of the fast ones (as measured on the 3 
systems available for testing).

I've used this simple benchmark to measure performance: 

Chart for ThunderX: 

Chart for Cortex A53(R-Pi): 

Raw numbers for ThunderX: 

Raw numbers for R-Pi: 

webrev: http://cr.openjdk.java.net/~dpochepk/8187472/webrev.01/

Testing: I've run the existing jtreg test 
(java/util/Arrays/ArraysEqCmpTest.java) in both -Xmixed and -Xcomp modes 
and found no regressions.

Any additional numbers on other systems are welcome, as well as early 
feedback on the code.


