On Sun, 17 May 2026 22:58:49 GMT, Shawn Emery <[email protected]> wrote:

>> Curve25519 polynomial arithmetic is performed with intrinsincs implemented 
>> in GPR related instructions for multiplication and squaring operations 
>> (methods mult() and square()).  Benchmark improvements include:
>> 
>> - X25519 encapsulation: +19%
>> - X25519 decapsulation: +19%
>> - X25519-MLKEM encapsulation: +12%
>> - X25519-MLKEM decapsulation: +15%
>> - X22519 key agreement: +19%
>> - X25519 key-pair generation: +19%
>> - X25519-MLKEM key-pair generation: +13%
>> - EdDSA key-pair generation: +20%
>> - EdDSA signing: +19%
>> 
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Shawn Emery has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add square() intrinsics

It all looks correct to me. I wrote a Fuzz test (at the end of this comment) to 
make sure as well.

Most of my comments are about style. Did think about a faster reduce(), but 
thats not easy to implement quickly.


import java.util.Random;
import java.math.BigInteger;
import sun.security.util.math.*;
import sun.security.util.math.intpoly.*;

/*
 * @test
 * @key randomness
 * @modules java.base/sun.security.util java.base/sun.security.util.math
 *          java.base/sun.security.util.math.intpoly
 * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+UseIntPolyIntrinsics
 *      IntegerPolynomial25519Fuzz
 * @summary Unit test IntegerPolynomial25519Fuzz with intrinsic enabled
 */

// This test case is NOT entirely deterministic, it uses a random seed for
// pseudo-random number generator
// If a failure occurs, hardcode the seed to make the test case deterministic
public class IntegerPolynomial25519Fuzz {
    public static void main(String[] args) throws Exception {
        // Note: it might be useful to increase this number during development
        final int repeat = 1000000;
        for (int i = 0; i < repeat; i++) {
            run();
        }
        System.out.println("Fuzz Success");
    }

    private static void check(String opMsg, BigInteger reference,
            ImmutableIntegerModuloP testValue, long seed) {
        BigInteger test = testValue.asBigInteger();
        if (!reference.equals(test)) {
            String msg = "Error while " + opMsg + System.lineSeparator()
                + reference.toString(16) + " != " + test.toString(16)
                + System.lineSeparator()+ "To reproduce, set SEED to ["
                + seed + "L]: ";
            throw new RuntimeException(msg);
        }
    }

    public static void run() throws Exception {
        Random rnd = new Random();
        long seed = rnd.nextLong();
        rnd.setSeed(seed);

        IntegerPolynomial25519 field = IntegerPolynomial25519.ONE;
        BigInteger P = evaluateModulus();
        BigInteger aRef = new BigInteger(255, rnd).mod(P);
        BigInteger bRef = new BigInteger(255, rnd).mod(P);

        String msg = "multiplying "+bRef.toString(16)+" with 
"+aRef.toString(16);
        ImmutableIntegerModuloP a = field.getElement(aRef);
        ImmutableIntegerModuloP b = field.getElement(bRef);

        for (int i = 0; i<200; i++) {
            if (rnd.nextBoolean()) {
                msg = "multiplying "+bRef.toString(16)+" with 
"+aRef.toString(16);
                aRef = aRef.multiply(bRef).mod(P);
                a = a.multiply(b);
                check(msg, aRef, a, seed);
            }

            if (rnd.nextBoolean()) {
                msg = "multiplying "+bRef.toString(16)+" with 
"+aRef.toString(16);
                bRef = bRef.multiply(aRef).mod(P);
                b = b.multiply(a);
                check(msg, bRef, b, seed);
            }

            if (rnd.nextBoolean()) {
                msg = "squaring "+aRef.toString(16);
                aRef = aRef.multiply(aRef).mod(P);
                a = a.square();
                check(msg, aRef, a, seed);
            }

            if (rnd.nextBoolean()) {
                msg = "squaring "+bRef.toString(16);
                bRef = bRef.multiply(bRef).mod(P);
                b = b.square();
                check(msg, bRef, b, seed);
            }
        }
    }

    private static BigInteger evaluateModulus() {
        BigInteger result = BigInteger.valueOf(2).pow(255);
        result = result.subtract(BigInteger.valueOf(19));

        return result;
    }
}

//make test 
TEST="test/jdk/com/sun/security/util/math/intpoly/IntegerPolynomial25519Fuzz.java"

src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 32:

> 30: 
> 31: void multiply_25519_scalar(const Register aLimbs, const Register bLimbs, 
> const Register rLimbs, MacroAssembler* _masm) {
> 32:   Register c0    = r9;

typically, when I create a separate a helper like this, I like to separate math 
from registers.. i.e. deal with stack and register names in the parent and only 
pass logical names, (aLimbs, bLimbs, rLimbs, c[], tmp1, tmp2, tmp3, etc)  into 
here.. 

Up to you, dont think its a hard rule.

src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 37:

> 35:   Register c3    = r12;
> 36:   Register c4    = r13;
> 37:   Register c[]   = {c0, c1, c2, c3, c4};

Any reason to declare this separately, instead of `{r9,r10, r11, r12, r13}`

src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 44:

> 42: 
> 43:   __ push(rbp);
> 44:   __ push(rbx);

push_ppx/pop_ppx

src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 51:

> 49:   __ push(rdx);
> 50: 
> 51:   __ xorq(c0, c0);

for (int i=0; i<5; i++) __ xorq(c[i], c[i]);

src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 82:

> 80:   }
> 81: 
> 82:   // Carry-add with reduction from high limb

I tinkered with the asm a bit, but I cant make it work quickly; dont think its 
possible, but maybe you know something and can make the idea work in the 
future.. 

Usually these multiplications are giving you a result in the range of 2P, you 
dont actually need a full reduction; Subtract P, then use the carry out from 
the subtraction as (i.e. 
[here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/security/util/math/intpoly/MontgomeryIntegerPolynomialP256.java#L393-L417))

But that becomes a very close register-allocation puzzle (5 for c[] 5 for c'[], 
2xcarry, rLimbs, mask.. ~=at least 14 regs needed). You save some multiplies by 
add/subtract.. the biasing with CARRY_ADD masks any benefit even more. so this 
is likely minimal..

src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 85:

> 83:   Register carry = bArg;
> 84:   // Limb 3
> 85:   __ mov64(carry, 0x4000000000000);

mask register is no longer needed; you can load to `0x4000000000000` to mask 
then just movq to carry (save slightly on code size?)

-------------

PR Review: https://git.openjdk.org/jdk/pull/31087#pullrequestreview-4342334946
PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285726706
PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285727279
PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285739053
PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285727975
PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285745221
PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285729354

Reply via email to