[aarch64-port-dev ]  RFR(S): 8251525: AARCH64: Faster Math.signum(fp)
aph at redhat.com
Tue Aug 25 16:55:38 UTC 2020
On 24/08/2020 22:52, Dmitry Chuyko wrote:
> I added two more intrinsics -- for copySign, they are controlled by
> UseCopySignIntrinsic flag.
> webrev: http://cr.openjdk.java.net/~dchuyko/8251525/webrev.03/
> It also contains 'benchmarks' directory:
> There are 8 benchmarks there: (double | float) x (blackhole | reduce) x
> (current j.l.Math.signum | abs()>0 check).
> My results on Arm are in signum-facgt-copysign.ods. Main case is
> 'random' which is actually a random from positive and negative numbers
> between -0.5 and +0.5.
> Basically we have ~14% improvement in 'reduce' benchmark variant but
> ~20% regression in 'blackhole' variant in case of only copySign().
> Same picture if abs()>0 check is used in signum() (+-5%). This variant
> is included as it shows very good results on x86.
> Intrinsic for signum() gives improvement of main case in both
> 'blackhole' and 'reduce' variants of benchmark: 28% and 11%, which is a
> noticeable difference.
Ignoring Blackhole for the moment, this is what I'm seeing for the
Benchmark                   Mode  Cnt  Score   Error  Units
DoubleReduceBench.ofRandom  avgt    3  2.456 ± 0.065  ns/op
DoubleReduceBench.ofRandom  avgt    3  2.766 ± 0.107  ns/op
DoubleReduceBench.ofRandom  avgt    3  2.537 ± 0.770  ns/op
Neoverse N1 (actually Amazon m6g.16xlarge):
DoubleReduceBench.ofRandom  avgt    3  1.173 ± 0.001  ns/op
DoubleReduceBench.ofRandom  avgt    3  1.043 ± 0.022  ns/op
DoubleReduceBench.ofRandom  avgt    3  1.012 ± 0.001  ns/op
By your own numbers, in the reduce benchmark the signum intrinsic is
worse than default for all 0 and NaN, but about 12% better for random,
>0, and <0. If you take the average of the speedups and slowdowns it's
actually worse than default.
By my reckoning, if you take all possibilities (NaN, <0, >0, 0,
Random) into account, the best-performing on the reduce test is
actually Abs/Copysign, but there's very little in it. The only time
that the signum intrinsic actually wins is when you're storing the
result into memory *and* flushing the store buffer.
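For reference, the Abs/Copysign shape discussed above can be sketched in
plain Java roughly as follows. This is only an illustration of the idea,
not the code in the webrev: the class name SignumSketch and the method
name signumViaCopySign are made up for this example. It relies on
abs(x) > 0 being false for both zeros and for NaN, so those inputs are
returned unchanged, which is what Math.signum specifies.

```java
final class SignumSketch {
    // Hypothetical helper, not taken from the webrev: signum expressed as
    // an abs()>0 check followed by copySign.
    static double signumViaCopySign(double d) {
        // Math.abs(d) > 0.0 is false for +0.0, -0.0 and NaN, so those
        // inputs fall through and are returned as-is.
        return (Math.abs(d) > 0.0) ? Math.copySign(1.0, d) : d;
    }

    // Same shape for float (also a hypothetical helper).
    static float signumViaCopySign(float f) {
        return (Math.abs(f) > 0.0f) ? Math.copySign(1.0f, f) : f;
    }

    public static void main(String[] args) {
        // One input per case mentioned in the thread: <0, >0, 0, NaN.
        double[] inputs = {-0.25, 0.25, 0.0, -0.0, Double.NaN};
        for (double d : inputs) {
            System.out.println(d + " -> " + signumViaCopySign(d)
                    + " (Math.signum: " + Math.signum(d) + ")");
        }
    }
}
```

Whether this shape is faster than the default or than a dedicated signum
intrinsic depends on the input mix and on what the benchmark does with
the result, as the numbers above suggest.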
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671