RSA and Diffie-Hellman performance [Was: RFR(L): 8069539: RSA acceleration]
sandhya.viswanathan at intel.com
Tue Jun 2 21:56:06 UTC 2015
Hi John/Tony, Thanks a lot for your comments and inputs.
Hi Vladimir, The patch for 8069539 has the checks as per John's email. Please advise if the patch looks ok to you for the next steps. Let me know if I need to make any changes.
From: John Rose [mailto:john.r.rose at oracle.com]
Sent: Monday, June 01, 2015 5:51 PM
To: Anthony Scarpino
Cc: Viswanathan, Sandhya; Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RSA and Diffie-Hellman performance [Was: RFR(L): 8069539: RSA acceleration]
The important goal, regarding the checks, is to tightly couple the validity checks to the actual loop, without actually putting the checks into the same method as the loop (which is going to be replaced by assembly code!). There should be one copy of the checks and one copy of the loop itself. The organization of the source code should clearly co-locate the checks and the loop. If these goals are not met, then future changes to the software could introduce calls to the loop which are not properly guarded by validity checks.
To do this, you need at least two methods. One can be a wrapper for the loop, and can contain the check code (single copy). Or, one method can be just checks; then each call of the loop method needs to be preceded by a call to the check method. Either pattern will work. There may be other ways to do it, also.
For the sake of clarity, I think the validity checks for the intrinsified loop should be called out clearly, which means not mixing them with other validity checks. In the case of 8073108, I'm not sure whether the checks that precede processBlocks are all necessary to the intrinsified loop, or whether some of them are related to the contract of the update method. Putting them in their own method processBlocksChecks would make that more clear and maintainable. It may be that *all* of the checks are relevant to the loop, in which case they should be linked more formally to the loop, using a coding pattern that makes it clear. In the code for 8069539, implSquareToLenChecks clearly provides the preconditions for an assembly-coded loop in implSquareToLen to be safely executed.
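A minimal Java sketch of the wrapper pattern described above may help make it concrete. It is not the BigInteger code from the 8069539 patch: the class GuardedLoopSketch, the methods squareEach, implSquareEachChecks and implSquareEach, and the loop body itself are all hypothetical stand-ins chosen for illustration. The point is only the shape: one copy of the checks, one copy of the loop, and a single wrapper that is the sole caller of the unchecked loop.

```java
import java.util.Objects;

// Illustrative only: hypothetical names, not the actual JDK implementation.
final class GuardedLoopSketch {

    // Wrapper: the only entry point. It runs the checks, then the loop.
    static int[] squareEach(int[] x, int len, int[] z) {
        implSquareEachChecks(x, len, z);
        return implSquareEach(x, len, z);
    }

    // Single copy of the preconditions that the unchecked loop relies on.
    private static void implSquareEachChecks(int[] x, int len, int[] z) {
        Objects.requireNonNull(x, "x");
        Objects.requireNonNull(z, "z");
        if (len < 1) {
            throw new IllegalArgumentException("invalid input length: " + len);
        }
        if (len > x.length) {
            throw new IllegalArgumentException("input length out of bound: " + len);
        }
        if (2L * len > z.length) {
            throw new IllegalArgumentException("output too short for length: " + len);
        }
    }

    // Stand-in for the loop that an intrinsic would replace with assembly.
    // It performs no validation of its own and must stay private.
    private static int[] implSquareEach(int[] x, int len, int[] z) {
        for (int i = 0; i < len; i++) {
            long v = x[i] & 0xFFFFFFFFL;   // treat the int as unsigned
            long sq = v * v;               // 64-bit square of one 32-bit word
            z[2 * i] = (int) (sq >>> 32);  // high word
            z[2 * i + 1] = (int) sq;       // low word
        }
        return z;
    }
}
```

Because implSquareEach is private and called from exactly one place, a future caller cannot reach the loop without passing through the checks. The alternative pattern, a checks-only method that each call site must invoke first, uses the same two helper methods but leaves the pairing to the callers.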
Having two methods instead of one is almost never a problem. Method call overhead is zero in hot code, since everything inlines.
I know I'm being picky, but I get that way when working with hand-compiled assembly code.
On May 28, 2015, at 4:39 PM, Anthony Scarpino <anthony.scarpino at oracle.com> wrote:
> Personally I think it is better not to have implSquareToLenChecks() and implMulAddCheck() as separate methods and to have the range checks in squareToLen and mulAdd. Given that these changes are about performance, it seems unnecessary to add an extra call to a method.
> While we are changing BigInteger, should a range check for multiplyToLen be added? Or is there a different bug for that?
> On 05/27/2015 06:27 PM, Viswanathan, Sandhya wrote:
>> Hi Tony,
>> Please let us know if you are ok with the changes in BigInteger.java (range checks) in patch from Intel:
>> Per Andrew's email below we could go ahead with this patch and it shouldn't affect his work.
>> Best Regards,
>> -----Original Message-----
>> From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley
>> Sent: Wednesday, May 27, 2015 10:12 AM
>> To: Christian Thalinger
>> Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net
>> Subject: RSA and Diffie-Hellman performance [Was: RFR(L): 8069539: RSA acceleration]
>> An update:
>> I'm still working on this. Following last week's revelations [1] it
>> seems to me that a faster implementation of (integer) D-H is even more
>> important.
>> I've spent a couple of days tracking down an extremely odd feature
>> (bug?) in MutableBigInteger which was breaking everything, but I'm
>> past that now. I'm trying to produce an intrinsic implementation of
>> the core modular exponentiation which is as fast as any state-of-the-
>> art implementation while disrupting the common code as little as
>> possible; this is not easy.
>> I hope to have something which is faster on all processors, not just
>> those for which we have hand-coded assembly-language implementations.
>> I don't think that my work should be any impediment to Sandhya's patch
>> for squareToLen at http://cr.openjdk.java.net/~kvn/8069539/webrev.01/
>> being committed. It'll still be useful.
>> [1] Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice