RFR: 8187472 - AARCH64: array_equals intrinsic doesn't use prefetch for large arrays
dmitrij.pochepko at bell-sw.com
Wed Feb 7 19:39:58 UTC 2018

Please review the patch for JDK-8187472 - AARCH64: array_equals intrinsic
doesn't use prefetch for large arrays.

This patch adds prefetch instructions, which help on most CPUs. It also
introduces two ways of loading data: into general-purpose registers or
into SIMD registers (controlled by a newly introduced VM flag,
UseSIMDForArrayEquals).

I've measured performance on a few CPUs (ThunderX, ThunderX2 (not yet
released), Cortex A53, Cortex A73) and set the defaults based on the
measurement results: SIMD is the default for everything except ThunderX
and A53. Also, the prefetch instruction slows down A73, so I've disabled
it there.

In general, this patch changes the handling of very short arrays
(performing a single 8-byte read instead of a few smaller reads, relying
on 8-byte alignment) and jumps into a stub with a large 64-byte read loop
for larger arrays.
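
To illustrate the control flow described above, here is a rough Java
sketch. It is not the patch itself, which is generated AArch64 assembly.
The sketch compares 8 bytes per read and switches to a loop that handles
64 bytes per iteration for larger inputs. The class name, method name and
the 64-byte threshold are assumptions made for illustration only. Java
has no prefetch instruction, so that step is marked with a comment, and
unlike the intrinsic the sketch cannot read past the end of the array, so
it finishes the tail byte by byte.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class ArrayEqualsSketch {
    // Reads 8 bytes of a byte[] as one long, at any byte offset.
    private static final VarHandle LONG_VIEW =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Hypothetical threshold for taking the "large loop" path.
    static final int LARGE_LOOP_THRESHOLD = 64;

    static boolean equalsSketch(byte[] a, byte[] b) {
        if (a == b) return true;
        if (a == null || b == null || a.length != b.length) return false;
        int n = a.length;
        int i = 0;

        if (n >= LARGE_LOOP_THRESHOLD) {
            // Large path: 64 bytes per iteration, as eight 8-byte reads.
            // (The real stub would also issue a prefetch here.)
            for (; i + 64 <= n; i += 64) {
                long diff = 0;
                for (int k = 0; k < 64; k += 8) {
                    long x = (long) LONG_VIEW.get(a, i + k);
                    long y = (long) LONG_VIEW.get(b, i + k);
                    diff |= x ^ y; // accumulate differences, one branch per 64 bytes
                }
                if (diff != 0) return false;
            }
        }

        // Short path and remainder: one 8-byte read per step.
        for (; i + 8 <= n; i += 8) {
            if ((long) LONG_VIEW.get(a, i) != (long) LONG_VIEW.get(b, i)) return false;
        }

        // Tail of fewer than 8 bytes, compared byte by byte.
        for (; i < n; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] x = new byte[200];
        byte[] y = new byte[200];
        System.out.println(equalsSketch(x, y));
        y[199] = 1;
        System.out.println(equalsSketch(x, y));
    }
}
```

Accumulating the XOR of each pair of words keeps the large loop to one
branch per 64 bytes, which is one plausible reason for a wide loop body.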

Measurements (array lengths 7, 64, 128, 256, 512, 1024 and 100000;
improvement in %. An 80% improvement means that the new version takes
80% less time, i.e. it is 5 times faster):
Array length:             7    64   128   256   512  1024  100000
ThunderX:                2%   -4%    0%    2%   32%   55%     80%
ThunderX2:               0%   -3%   17%   19%   29%   31%     47%
Cortex A53 at 533MHz:    8%   -1%   -2%    4%    6%    5%      3%
Cortex A73 at 903MHz:    8%   -3%    0%    7%    8%    9%      8%
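
For reading the percentages above as speedup factors, here is a small
Java helper. It assumes that the improvement is the relative reduction in
run time, which is the reading consistent with "80% means 5 times". The
class and method names are made up for this example.

```java
final class ImprovementMath {
    // Converts "improvement in %" (time saved relative to the old version)
    // into a speedup factor: old time divided by new time.
    // Assumption: improvement = (oldTime - newTime) / oldTime * 100.
    static double speedupFactor(double improvementPercent) {
        if (improvementPercent >= 100.0) {
            throw new IllegalArgumentException("improvement must be below 100%");
        }
        return 100.0 / (100.0 - improvementPercent);
    }

    public static void main(String[] args) {
        System.out.println(speedupFactor(80));  // 100 / 20, i.e. 5 times faster
        System.out.println(speedupFactor(47));  // a bit under 2 times faster
        System.out.println(speedupFactor(-4));  // below 1, i.e. a slowdown
    }
}
```

Under this reading a negative percentage gives a factor below 1, so the
-4% entries correspond to the new version taking 4% more time.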

Note: medium sizes are a bit slower because of the additional branch
(which checks the size and jumps to the stub).

Note 2: the Cortex CPUs were downclocked to keep the frequency stable,
because the device I was using was overheating (otherwise the frequency
jumped randomly, producing unstable results), so results might be a bit
different on devices running at a higher frequency.
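
The benchmark harness is not shown in this message. As a rough
illustration only, per-length timings for the array lengths listed above
could be collected with something like the Java sketch below. The class
name, the choice of byte[] as the element type and the iteration counts
are all assumptions; a proper harness such as JMH would give more
trustworthy numbers than this hand-rolled loop.

```java
import java.util.Arrays;

final class ArrayEqualsTiming {
    // Array lengths taken from the measurement list in this message.
    static final int[] LENGTHS = {7, 64, 128, 256, 512, 1024, 100000};

    // Returns the average time of one Arrays.equals call in nanoseconds.
    static double nanosPerCall(int length, int iterations) {
        byte[] a = new byte[length];
        byte[] b = new byte[length];
        Arrays.fill(a, (byte) 1);
        Arrays.fill(b, (byte) 1);

        int matches = 0;
        // Warm-up pass so the JIT has a chance to compile the call.
        for (int i = 0; i < iterations; i++) {
            if (Arrays.equals(a, b)) matches++;
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (Arrays.equals(a, b)) matches++;
        }
        long elapsed = System.nanoTime() - start;

        // Use the result so the comparison cannot be optimized away entirely.
        if (matches != 2 * iterations) {
            throw new IllegalStateException("arrays unexpectedly differ");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        for (int length : LENGTHS) {
            System.out.printf("length %d: %.1f ns/call%n",
                              length, nanosPerCall(length, 100_000));
        }
    }
}
```

Running this once on a build without the patch and once with it would
give the old and new times per length, from which the improvement
percentages follow.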

Testing: I've run all hotspot jtreg tests with and without the patch on
aarch64 and found no new failures.
More information about the hotspot-compiler-dev