RFR(L): 8069539: RSA acceleration

Andrew Haley aph at redhat.com
Sat May 16 09:35:02 UTC 2015

There is one other thing I didn't mention: it is possible to turn
Montgomery multiplication into a software pipeline if you have enough
registers.

The idea is that the latency of one multiplication is overlapped by
the latency of the load of the operands for the next one, and the
accumulation is done not on the latest multiplication but on the
previous one, so no operation ever stalls the pipeline.  I'm not sure
if x86 has enough registers to do this (it might, just) but AArch64
certainly does.
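
To make the shape of this concrete, here is a rough sketch in C of one
column of the multiply-accumulate loop, written first in the plain way
and then as a software pipeline.  It is only an illustration, not the
code from the patch: the names (acc_t, acc_add, column_plain,
column_pipelined) are made up for this example, and it assumes 32-bit
limbs with a 64-bit product so that it stays portable, where a real
implementation would most likely use 64-bit limbs and the high/low
multiply instructions.  The Montgomery reduction itself is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-word accumulator (t2:t1:t0) built from 32-bit limbs. */
typedef struct { uint32_t t0, t1, t2; } acc_t;

/* Add a 64-bit product into the three-word accumulator. */
static void acc_add(acc_t *acc, uint64_t p) {
    uint64_t lo = (uint64_t)acc->t0 + (uint32_t)p;
    uint64_t hi = (uint64_t)acc->t1 + (uint32_t)(p >> 32) + (lo >> 32);
    acc->t0 = (uint32_t)lo;
    acc->t1 = (uint32_t)hi;
    acc->t2 += (uint32_t)(hi >> 32);
}

/* Plain version: load, multiply and accumulate within one iteration,
   so each accumulation waits for the multiply issued just before it. */
static acc_t column_plain(const uint32_t *a, const uint32_t *b, size_t n) {
    acc_t acc = {0, 0, 0};
    for (size_t i = 0; i < n; i++)
        acc_add(&acc, (uint64_t)a[i] * b[n - 1 - i]);
    return acc;
}

/* Pipelined version: iteration i issues multiply i, loads the operands
   for i+1, and accumulates product i-1.  The extra live values (x, y,
   prev, cur) are where the register pressure comes from. */
static acc_t column_pipelined(const uint32_t *a, const uint32_t *b, size_t n) {
    acc_t acc = {0, 0, 0};
    if (n == 0)
        return acc;

    /* Prologue: first multiply, then load the operands for the second. */
    uint32_t x = a[0], y = b[n - 1];
    uint64_t prev = (uint64_t)x * y;
    if (n > 1) { x = a[1]; y = b[n - 2]; }

    for (size_t i = 1; i < n; i++) {
        uint64_t cur = (uint64_t)x * y;                  /* multiply i      */
        if (i + 1 < n) { x = a[i + 1]; y = b[n - 2 - i]; } /* load i+1      */
        acc_add(&acc, prev);                             /* accumulate i-1  */
        prev = cur;
    }

    /* Epilogue: drain the last product still in flight. */
    acc_add(&acc, prev);
    return acc;
}
```

Both functions compute the same sum; the only difference is the order
in which the work is issued.  Whether a compiler keeps that order, and
whether it actually helps on a given core, would have to be checked by
looking at the generated code and measuring.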

An out-of-order CPU can to some extent do the instruction reordering
automatically, but you still have to write the code in a way that
makes it possible.


More information about the hotspot-compiler-dev mailing list