On Sun, 17 May 2026 22:58:49 GMT, Shawn Emery <[email protected]> wrote:
>> Curve25519 polynomial arithmetic is performed with intrinsincs implemented >> in GPR related instructions for multiplication and squaring operations >> (methods mult() and square()). Benchmark improvements include: >> >> - X25519 encapsulation: +19% >> - X25519 decapsulation: +19% >> - X25519-MLKEM encapsulation: +12% >> - X25519-MLKEM decapsulation: +15% >> - X22519 key agreement: +19% >> - X25519 key-pair generation: +19% >> - X25519-MLKEM key-pair generation: +13% >> - EdDSA key-pair generation: +20% >> - EdDSA signing: +19% >> >> --------- >> - [x] I confirm that I make this contribution in accordance with the >> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai). > > Shawn Emery has updated the pull request incrementally with one additional > commit since the last revision: > > Add square() intrinsics It all looks correct to me. I wrote a Fuzz test (at the end of this comment) to make sure as well. Most of my comments are about style. Did think about a faster reduce(), but thats not easy to implement quickly. import java.util.Random; import java.math.BigInteger; import sun.security.util.math.*; import sun.security.util.math.intpoly.*; /* * @test * @key randomness * @modules java.base/sun.security.util java.base/sun.security.util.math * java.base/sun.security.util.math.intpoly * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:+UseIntPolyIntrinsics * IntegerPolynomial25519Fuzz * @summary Unit test IntegerPolynomial25519Fuzz with intrinsic enabled */ // This test case is NOT entirely deterministic, it uses a random seed for // pseudo-random number generator // If a failure occurs, hardcode the seed to make the test case deterministic public class IntegerPolynomial25519Fuzz { public static void main(String[] args) throws Exception { // Note: it might be useful to increase this number during development final int repeat = 1000000; for (int i = 0; i < repeat; i++) { run(); } System.out.println("Fuzz Success"); } private static void check(String opMsg, BigInteger reference, ImmutableIntegerModuloP testValue, long seed) { BigInteger test = testValue.asBigInteger(); if (!reference.equals(test)) { String msg = "Error while " + opMsg + System.lineSeparator() + reference.toString(16) + " != " + test.toString(16) + System.lineSeparator()+ "To reproduce, set SEED to [" + seed + "L]: "; throw new RuntimeException(msg); } } public static void run() throws Exception { Random rnd = new Random(); long seed = rnd.nextLong(); rnd.setSeed(seed); IntegerPolynomial25519 field = IntegerPolynomial25519.ONE; BigInteger P = evaluateModulus(); BigInteger aRef = new BigInteger(255, rnd).mod(P); BigInteger bRef = new BigInteger(255, rnd).mod(P); String msg = "multiplying "+bRef.toString(16)+" with "+aRef.toString(16); ImmutableIntegerModuloP a = field.getElement(aRef); ImmutableIntegerModuloP b = field.getElement(bRef); for (int i = 0; i<200; i++) { if (rnd.nextBoolean()) { msg = "multiplying "+bRef.toString(16)+" with "+aRef.toString(16); aRef = aRef.multiply(bRef).mod(P); a = a.multiply(b); check(msg, aRef, a, seed); } if (rnd.nextBoolean()) { msg = "multiplying "+bRef.toString(16)+" with "+aRef.toString(16); bRef = bRef.multiply(aRef).mod(P); b = b.multiply(a); check(msg, bRef, b, seed); } if (rnd.nextBoolean()) { msg = "squaring "+aRef.toString(16); aRef = aRef.multiply(aRef).mod(P); a = a.square(); check(msg, aRef, a, seed); } if (rnd.nextBoolean()) { msg = "squaring "+bRef.toString(16); bRef = bRef.multiply(bRef).mod(P); b = b.square(); check(msg, bRef, b, seed); } } } private static BigInteger evaluateModulus() { BigInteger result = BigInteger.valueOf(2).pow(255); result = result.subtract(BigInteger.valueOf(19)); return result; } } //make test TEST="test/jdk/com/sun/security/util/math/intpoly/IntegerPolynomial25519Fuzz.java" src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 32: > 30: > 31: void multiply_25519_scalar(const Register aLimbs, const Register bLimbs, > const Register rLimbs, MacroAssembler* _masm) { > 32: Register c0 = r9; typically, when I create a separate a helper like this, I like to separate math from registers.. i.e. deal with stack and register names in the parent and only pass logical names, (aLimbs, bLimbs, rLimbs, c[], tmp1, tmp2, tmp3, etc) into here.. Up to you, dont think its a hard rule. src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 37: > 35: Register c3 = r12; > 36: Register c4 = r13; > 37: Register c[] = {c0, c1, c2, c3, c4}; Any reason to declare this separately, instead of `{r9,r10, r11, r12, r13}` src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 44: > 42: > 43: __ push(rbp); > 44: __ push(rbx); push_ppx/pop_ppx src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 51: > 49: __ push(rdx); > 50: > 51: __ xorq(c0, c0); for (int i=0; i<5; i++) __ xorq(c[i], c[i]); src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 82: > 80: } > 81: > 82: // Carry-add with reduction from high limb I tinkered with the asm a bit, but I cant make it work quickly; dont think its possible, but maybe you know something and can make the idea work in the future.. Usually these multiplications are giving you a result in the range of 2P, you dont actually need a full reduction; Subtract P, then use the carry out from the subtraction as (i.e. [here](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/security/util/math/intpoly/MontgomeryIntegerPolynomialP256.java#L393-L417)) But that becomes a very close register-allocation puzzle (5 for c[] 5 for c'[], 2xcarry, rLimbs, mask.. ~=at least 14 regs needed). You save some multiplies by add/subtract.. the biasing with CARRY_ADD masks any benefit even more. so this is likely minimal.. src/hotspot/cpu/x86/stubGenerator_x86_64_poly25519.cpp line 85: > 83: Register carry = bArg; > 84: // Limb 3 > 85: __ mov64(carry, 0x4000000000000); mask register is no longer needed; you can load to `0x4000000000000` to mask then just movq to carry (save slightly on code size?) ------------- PR Review: https://git.openjdk.org/jdk/pull/31087#pullrequestreview-4342334946 PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285726706 PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285727279 PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285739053 PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285727975 PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285745221 PR Review Comment: https://git.openjdk.org/jdk/pull/31087#discussion_r3285729354
