RFR: 8186915 - AARCH64: Intrinsify squareToLen and mulAdd
aph at redhat.com
Fri Sep 22 08:12:23 UTC 2017
On 21/09/17 19:19, Dmitrij Pochepko wrote:
> Thank you for looking into this and trying it on APM (I have no access
> to this h/w).
> I've used the modified benchmark you sent and run it on ThunderX, and
> implSquareToLen still shows better results than implMultiplyToLen in
> most cases on ThunderX (up to 10% on size=127). results:
For 10%, it's not worth doing, given the risks and that it's not used
by crypto operations when C2-compiled.
> However, since the performance difference on APM is larger than on
> ThunderX, I think it would be more logical to go back to your idea
> and call the multiplyToLen intrinsic inside squareToLen. An
> alternative solution is to generate different code for APM and
> ThunderX, but I prefer to have a single version given such a
> relatively small difference in performance, and it's still much
> faster than having no intrinsic at all. What do you think?
Yes. Calling multiplyToLen would be fine.
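To illustrate the idea of routing squaring through the multiply path,
here is a minimal Java sketch. It is not the HotSpot stub and not the
JDK's BigInteger code: the class SquareViaMultiply, its simplified
signatures (which allocate and return the result instead of taking z
and zlen), and the toBigInteger helper are hypothetical names used for
illustration only. Magnitudes are assumed to be big-endian int arrays.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical illustration class, not part of the JDK or HotSpot.
final class SquareViaMultiply {
    private static final long LONG_MASK = 0xffffffffL;

    // Schoolbook multiply of two big-endian magnitudes.
    // Returns a fresh array of xlen + ylen ints holding the product.
    static int[] multiplyToLen(int[] x, int xlen, int[] y, int ylen) {
        int[] z = new int[xlen + ylen];
        for (int i = xlen - 1; i >= 0; i--) {
            long xi = x[i] & LONG_MASK;
            long carry = 0;
            // k walks the result word that x[i] * y[j] contributes to.
            for (int j = ylen - 1, k = i + j + 1; j >= 0; j--, k--) {
                long product = xi * (y[j] & LONG_MASK)
                             + (z[k] & LONG_MASK) + carry;
                z[k] = (int) product;      // low 32 bits
                carry = product >>> 32;    // high 32 bits
            }
            z[i] = (int) carry;
        }
        return z;
    }

    // Squaring expressed as a multiply of x by itself: one code path
    // to maintain, at the cost of not skipping the symmetric
    // partial products that a dedicated square routine could skip.
    static int[] squareToLen(int[] x, int len) {
        return multiplyToLen(x, len, x, len);
    }

    // Test helper: interpret a big-endian int magnitude as a
    // non-negative BigInteger.
    static BigInteger toBigInteger(int[] mag) {
        byte[] bytes = new byte[mag.length * 4];
        ByteBuffer.wrap(bytes).asIntBuffer().put(mag);
        return new BigInteger(1, bytes);
    }

    public static void main(String[] args) {
        int[] x = {0xFFFFFFFF, 0x12345678, 0x9ABCDEF0};
        int[] sq = squareToLen(x, x.length);
        // Cross-check against BigInteger's own arithmetic.
        boolean ok = toBigInteger(sq).equals(toBigInteger(x).pow(2));
        System.out.println("square matches BigInteger.pow(2): " + ok);
    }
}
```

A dedicated square routine can skip roughly half of the partial
products, which is where a speedup for squareToLen would come from.
Delegating to the multiply gives that up in exchange for a single code
path.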
> FYI: regarding sizes 200 and 1000, it's incorrect to measure these
> sizes for squareToLen, because squareToLen is never called for sizes
> greater than 127 (I've mentioned this before).
It's not incorrect: it's a test for asymptotic behaviour.
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671