RFR 8181594: Efficient and constant-time modular arithmetic

Adam Petcher adam.petcher at oracle.com
Tue Jan 30 16:52:01 UTC 2018


On 1/26/2018 4:06 PM, Adam Petcher wrote:
> JBS: https://bugs.openjdk.java.net/browse/JDK-8181594
> Webrev: http://cr.openjdk.java.net/~apetcher/8181594/webrev.00/
> This is a code review for the field arithmetic that will be used in 
> implementations of X25519/X448 key agreement, the Poly1305 
> authenticator, and EdDSA signatures. I believe that the library has 
> all the features necessary for X25519/X448 and Poly1305, and I expect 
> at most a couple of minor enhancements will be required to support 
> EdDSA. There is no public API for this library, so we can change it in 
> the future to suit the needs of new algorithms without breaking 
> compatibility with external code. Still, I made an attempt to clearly 
> structure and document the (internal) API, and I want to make sure it 
> is understandable and easy to use.
> This is not a general-purpose modular arithmetic library. It will only 
> work well in circumstances where the sequence of operations is 
> restricted, and where the prime that defines the field has some useful 
> structure. Moreover, each new field will require some field-specific 
> code that takes into account the structure of the prime and the way 
> the field is used in the application. The initial implementation 
> includes a field for Poly1305 and the fields for X25519/X448 which 
> should also work for EdDSA.
> The benefits of using this library are that it is much more efficient 
> than using similar operations in BigInteger. Also, many operations are 
> branch-free, making them suitable for use in a side-channel resistant 
> implementation that does not branch on secrets.
> To provide some context, I have attached a code snippet describing how 
> this library can be used. The snippet is the constant-time Montgomery 
> ladder from my X25519/X448 implementation, which I expect to be out 
> for review soon. X25519/X448 only uses standard arithmetic operations, 
> and the more unusual features (e.g. add modulo a power of 2) are 
> needed by Poly1305.
> The field arithmetic (for all fields) is implemented using a 32-bit 
> representation similar to the one described in the Ed448 paper[1] (in 
> the "Implementation on 32-bit platforms" section). Though my 
> implementation uses signed limbs, and grade-school multiplication 
> instead of Karatsuba. The argument for correctness is essentially the 
> same for all three fields: the magnitude of each 64-bit limb is at 
> most 2^(k-1) after reduction, except for the last limb which may have 
> a magnitude of up to 2^k. The values of k are between 26 to 28 
> (depending on the field), and we can calculate that the maximum 
> magnitude for any limb during an add-multiply-carry-reduce sequence is 
> always less than 2^63. Therefore, no overflow occurs and all 
> operations are correct.
> Process note: this enhancement is part of JEP 324 (Key Agreement with 
> Curve25519 and Curve448). When this code review is complete, nothing 
> will happen until all other work for this JEP is complete, and the JEP 
> is accepted as part of some release. This means that this code will be 
> pushed to the repo along with the X25519/X448 code that uses it.
> [1] https://eprint.iacr.org/2015/625.pdf

-------------- next part --------------
    private IntegerModuloP_Base pointMultiply(byte[] k, IntegerModuloP u){

        IntegerModuloP x_1 = u;
        MutableIntegerModuloP x_2 = one.mutable();
        MutableIntegerModuloP z_2 = zero.mutable();
        MutableIntegerModuloP x_3 = u.mutable();
        MutableIntegerModuloP z_3 = one.mutable();
        int swap = 0;

        // Variables below are reused to avoid unnecessary allocation
        // They will be assigned in the loop, so initial value doesn't matter
        MutableIntegerModuloP m1 = zero.mutable();
        MutableIntegerModuloP DA = zero.mutable();
        MutableIntegerModuloP E = zero.mutable();
        MutableIntegerModuloP a24_times_E = zero.mutable();

        for(int t = params.getBits() - 1; t >= 0; t--){
            int k_t = bitAt(k, t);
            swap = swap ^ k_t;
            x_2.conditionalSwapWith(x_3, swap);
            z_2.conditionalSwapWith(z_3, swap);
            swap = k_t;

            // A(m1) = x_2 + z_2
            // D = x_3 - z_3
            // DA = D * A(m1)
            // AA(m1) = A(m1)^2
            // B(x_2) = x_2 - z_2
            // C = x_3 + z_3
            // CB(x_3) = C * B(x_2)
            // BB(x_2) = B^2
            // E = AA(m1) - BB(x_2)
            // compute a24 * E using SmallValue

            // assign results to x_3, z_3, x_2, z_2
            // x_2 = AA(m1) * BB
            // z_2 = E * (AA(m1) + a24 * E)
            // z_3 = x_1*(DA - CB(x_3))^2
            // x_3 = (CB(x_3) + DA)^2

        x_2.conditionalSwapWith(x_3, swap);
        z_2.conditionalSwapWith(z_3, swap);

        // return (x_2 * z_2^(p - 2))
        return x_2.setProduct(z_2.multiplicativeInverse());

More information about the core-libs-dev mailing list