RSA and Diffie-Hellman performance [Was: RFR(L): 8069539: RSA acceleration]

Vladimir Kozlov vladimir.kozlov at
Wed Jun 3 22:10:38 UTC 2015

There were several problems my testing found.

First, almost all files had trailing spaces. Java code also had TABs.
Second, I had to increase the stub buffer size 'code_size2' since I hit this
assert on Windows:

#  Internal Error 
pid=37668, tid=0x0000000000003514
#  assert(allocates2(pc)) failed: not in CodeBuffer memory: 
0x0000002b80146ce0 <= 0x0000002b8014c2d1 <= 0x0000002b8014c2d0

V  [jvm.dll+0x373471] ?report_vm_error@@YAXPEBDH00@Z+0x71
V  [jvm.dll+0x50d53] ?set_end@CodeSection@@QEAAXPEAE@Z+0x73
V  [jvm.dll+0x4526e5] ?square_to_len@MacroAssembler@@QEAAXPEAVRegisterImpl@@0000000000@Z+0x5f5
V  [jvm.dll+0x4818f5] ?generate_squareToLen@StubGenerator@@AEAAPEAEXZ+0x145

I fixed all that and am now pushing.


On 6/2/15 7:00 PM, Vladimir Kozlov wrote:
> I created an RFE which is linked to sub-task JDK-8069539:
> Vladimir
> On 6/2/15 5:51 PM, Anthony Scarpino wrote:
>> On Jun 2, 2015, at 4:25 PM, Vladimir Kozlov
>> <vladimir.kozlov at> wrote:
>>> On 6/2/15 2:56 PM, Viswanathan, Sandhya wrote:
>>>> Hi John/Tony, Thanks a lot for your comments and inputs.
>>>> Hi Vladimir, The patch for 8069539 has the checks as per John's
>>>> email. Please advise if the patch looks ok to you for the next
>>>> steps. Let me know if I need to make any changes.
>>> I think webrev.01 is good.
>>> We can push it since "JEP 246: Leverage CPU Instructions for GHASH
>>> and RSA" is in "Targeted" state (thanks Tony!).
>>> If nobody objects I can push it.
>> That is fine with me.
>> I’m not sure if subtasks can be used to push into the repo, but I have
>> one for the RSA work that you can take ownership of and use,
>> JDK-8069539 which is attached to the JEP.
>> If you need a bug/rfe, file a new one and let me know so I can link it
>> to the JEP.
>> thanks
>> Tony
>>> Thanks,
>>> Vladimir
>>>> Best Regards,
>>>> Sandhya
>>>> -----Original Message-----
>>>> From: John Rose [mailto:john.r.rose at]
>>>> Sent: Monday, June 01, 2015 5:51 PM
>>>> To: Anthony Scarpino
>>>> Cc: Viswanathan, Sandhya; Vladimir Kozlov;
>>>> hotspot-compiler-dev at
>>>> Subject: Re: RSA and Diffie-Hellman performance [Was: RFR(L):
>>>> 8069539: RSA acceleration]
>>>> The important goal, regarding the checks, is to tightly couple the
>>>> validity checks to the actual loop, without actually putting the
>>>> checks into the same method as the loop (which is going to be
>>>> replaced by assembly code!).  There should be one copy of the checks
>>>> and one copy of the loop itself.  The organization of the source
>>>> code should clearly co-locate the checks and the loop.  If these
>>>> goals are not met, then future changes to the software could
>>>> introduce calls to the loop which are not properly guarded by
>>>> validity checks.
>>>> To do this, you need at least two methods.  One can be a wrapper for
>>>> the loop, and can contain the check code (single copy).  Or, one
>>>> method can be just checks; then each call of the loop method needs
>>>> to be preceded by a call to the check method.  Either pattern will
>>>> work.  There may be other ways to do it, also.
>>>> For the sake of clarity, I think the validity checks for the
>>>> intrinsified loop should be called out clearly, which means not
>>>> mixing them with other validity checks.  In the case of 8073108, I'm
>>>> not sure whether the checks that precede processBlocks are all
>>>> necessary to the intrinsified loop, or whether some of them are
>>>> related to the contract of the update method.  Putting them in their
>>>> own method processBlocksChecks would make that more clear and
>>>> maintainable.  It may be that *all* of the checks are relevant to the
>>>> loop, in which case they should be linked more formally to the loop,
>>>> using a coding pattern that makes it clear.  In the code for
>>>> 8069539, implSquareToLenChecks clearly provides the preconditions
>>>> for an assembly-coded loop in implSquareToLen to be safely executed.
>>>> Having two methods instead of one is almost never a problem.  Method
>>>> call overhead is zero in hot code, since everything inlines.
>>>> I know I'm being picky, but I get that way when working with
>>>> hand-compiled assembly code.
>>>> HTH,
>>>> - John
>>>> On May 28, 2015, at 4:39 PM, Anthony Scarpino
>>>> <anthony.scarpino at> wrote:
>>>>> Personally I think it is better not to have implSquareToLenChecks()
>>>>> and implMulAddCheck() as separate methods and to have the range
>>>>> checks in squareToLen and mulAdd.  Given these changes are about
>>>>> performance, it seems unnecessary to add an extra call to a method.
>>>>> While we are changing BigInteger, should a range check for
>>>>> multiplyToLen be added?  Or is there a different bug for that?
>>>>> Tony
>>>>> On 05/27/2015 06:27 PM, Viswanathan, Sandhya wrote:
>>>>>> Hi Tony,
>>>>>> Please let us know if you are ok with the changes in
>>>>>> (range checks) in patch from Intel:
>>>>>> Per Andrew's email below we could go ahead with this patch and it
>>>>>> shouldn't affect his work.
>>>>>> Best Regards,
>>>>>> Sandhya
>>>>>> -----Original Message-----
>>>>>> From: hotspot-compiler-dev
>>>>>> [mailto:hotspot-compiler-dev-bounces at] On Behalf
>>>>>> Of Andrew Haley
>>>>>> Sent: Wednesday, May 27, 2015 10:12 AM
>>>>>> To: Christian Thalinger
>>>>>> Cc: Vladimir Kozlov; hotspot-compiler-dev at
>>>>>> Subject: RSA and Diffie-Hellman performance [Was: RFR(L): 8069539:
>>>>>> RSA acceleration]
>>>>>> An update:
>>>>>> I'm still working on this.  Following last week's revelations [1] it
>>>>>> seems to me that a faster implementation of (integer) D-H is even
>>>>>> more important.
>>>>>> I've spent a couple of days tracking down an extremely odd feature
>>>>>> (bug?) in MutableBigInteger which was breaking everything, but I'm
>>>>>> past that now.  I'm trying to produce an intrinsic implementation of
>>>>>> the core modular exponentiation which is as fast as any
>>>>>> state-of-the-art implementation while disrupting the common code as
>>>>>> little as possible; this is not easy.
>>>>>> I hope to have something which is faster on all processors, not just
>>>>>> those for which we have hand-coded assembly-language implementations.
>>>>>> I don't think that my work should be any impediment to Sandhya's patch
>>>>>> for squareToLen at
>>>>>> being committed.  It'll still be useful.
>>>>>> Andrew.
>>>>>> [1]  Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice
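
The two-method pattern John describes in the quoted thread (one method
holding the validity checks, one holding the loop that an intrinsic would
replace) might look roughly like the sketch below. The names squareToLen,
implSquareToLenChecks and implSquareToLen come from the thread; the class
name CheckedSquare, the signatures, the exception messages and the plain
schoolbook loop body are assumptions for illustration, not the code in the
patch.

```java
// Illustrative sketch only: NOT the JDK's BigInteger code.
// Magnitudes are big-endian int arrays (most significant word first).
final class CheckedSquare {
    private static final long MASK = 0xffffffffL;

    // Wrapper and only entry point: allocate, check once, then run the loop.
    static int[] squareToLen(int[] x, int len, int[] z) {
        int zlen = len << 1;
        if (zlen > 0 && (z == null || z.length < zlen)) {
            z = new int[zlen];
        }
        implSquareToLenChecks(x, len, z, zlen);
        return implSquareToLen(x, len, z, zlen);
    }

    // Single copy of the preconditions the loop relies on.
    static void implSquareToLenChecks(int[] x, int len, int[] z, int zlen) {
        if (x == null || len < 1 || len > x.length) {
            throw new IllegalArgumentException("invalid input length: " + len);
        }
        if (zlen < 1 || zlen != 2 * len || z == null || zlen > z.length) {
            throw new IllegalArgumentException("invalid output length: " + zlen);
        }
    }

    // The loop an intrinsic could replace. It performs no checks of its own.
    // Assumption: a simple schoolbook x * x stands in for the real algorithm.
    static int[] implSquareToLen(int[] x, int len, int[] z, int zlen) {
        for (int k = 0; k < zlen; k++) {
            z[k] = 0;
        }
        for (int i = len - 1; i >= 0; i--) {
            long xi = x[i] & MASK;
            long carry = 0;
            for (int j = len - 1; j >= 0; j--) {
                int k = i + j + 1;
                // Low 64 bits are exact; the sum never exceeds 2^64 - 1.
                long t = xi * (x[j] & MASK) + (z[k] & MASK) + carry;
                z[k] = (int) t;
                carry = t >>> 32;
            }
            z[i] = (int) carry;
        }
        return z;
    }
}
```

Callers would go through CheckedSquare.squareToLen(x, len, null) only, so
that no call reaches the loop without passing the checks first. The other
arrangement John mentions, a checks-only method that every call site must
invoke before the loop, would use the same two helper methods without the
wrapper.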
