RFR: 8155617: aarch64: ClearArray does not use DC ZVA
edward.nevill at gmail.com
Thu Apr 28 13:49:48 UTC 2016
Please review the following webrev
This is the bzero3 version previously discussed on the aarch64 list with the inner DC ZVA outlined. The outlining of the DC ZVA loop made no measurable difference to performance.
I have also tuned the BlockZeroingLowLimit to default to 4 x cache line size rather than always defaulting to 256.
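As a rough illustration of the new default (this is a sketch, not the code in the webrev), the choice could be expressed as below. The function name, its parameters and the `user_set` flag are hypothetical. Only the "4 x cache line size" rule and the old fixed value of 256 come from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: choose the BlockZeroingLowLimit default.
//   line_bytes : cache line size in bytes (assumed input)
//   user_set   : stand-in for "the flag was given on the command line"
//   user_value : the value supplied by the user, if any
inline int64_t default_block_zeroing_low_limit(int64_t line_bytes,
                                               bool user_set,
                                               int64_t user_value) {
  assert(line_bytes > 0);
  if (user_set) {
    return user_value;      // an explicit setting always wins
  }
  return 4 * line_bytes;    // new default: 4 x cache line size
}
```

With a 64-byte line this gives 256, the same as the old fixed default. With a 128-byte line it gives 512.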
Updated performance charts here:-
The charts show the performance improvement on 3 different partners' HW.
The benchmark was the following JMH test provided by Andrew Haley
The charts have been normalised so that the original jdk9 hs-comp tree is shown as 100%. The figures are % of original performance, so lower is better. This is done to avoid disclosing absolute performance information on partners' HW.
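For clarity, the normalisation can be sketched as follows. This assumes the raw figures are a cost, such as time per operation, which is what "lower is better" suggests. The function name is hypothetical and no real measurements are shown.

```cpp
#include <cassert>

// Hypothetical helper: express a measurement as a percentage of the
// baseline (original jdk9 hs-comp), so that the baseline itself is 100.
// Assumes both values are costs (e.g. time per op), so lower is better.
inline double normalised_percent(double measured, double baseline) {
  assert(baseline > 0.0);
  return 100.0 * measured / baseline;
}
```

A result below 100 then means the patched build did better than the original on that HW, without revealing the absolute numbers.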
Orig: Original jdk9 hs-comp
bzero6: jdk9 hs-comp with bzero6
Orig (no prf): Original jdk9 hs-comp (-XX:AllocatePrefetchStyle=0)
bzero6 (no pref): jdk9 hs-comp with bzero6 (-XX:AllocatePrefetchStyle=0)
There is significant interaction between prefetch and block zeroing as discussed previously. Some partners benefit from prefetch, others do not.
The proposed patch does not change the behaviour of prefetch (i.e. it leaves it enabled), as I think there should be a separate tuning exercise to tune prefetch for different partners' HW.
OK to push?
More information about the hotspot-compiler-dev