On Wed, 5 Oct 2022 17:37:25 GMT, Xue-Lei Andrew Fan <[email protected]> wrote:
>> Hi,
>>
>> May I have this patch reviewed?
>>
>> This is one of a few steps to improve the EC performance. The multiplicative
>> inverse implementation could be improved for better performance.
>>
>> For secp256r1 prime p, the current multiplicative inverse impl needs 256
>> square and 128 multiplication. With the path, the operation needs 256
>> square and 37 multiplication.
>>
>> For secp256r1 order n, the current multiplicative inverse impl needs 256
>> square and 169 multiplication. With the patch, the operation needs 256
>> square and 94 multiplication.
>>
>> Here is the benchmark numbers before the patch applied:
>>
>> Benchmark (messageLength) Mode Cnt Score Error Units
>> Signatures.sign 64 thrpt 15 1412.644 ± 5.529 ops/s
>> Signatures.sign 512 thrpt 15 1407.711 ± 14.118 ops/s
>> Signatures.sign 2048 thrpt 15 1415.674 ± 6.965 ops/s
>> Signatures.sign 16384 thrpt 15 1395.582 ± 12.689 ops/s
>>
>>
>> And the following are the benchmarking after the patch applied.
>>
>> Benchmark (messageLength) Mode Cnt Score Error Units
>> Signatures.sign 64 thrpt 15 1459.527 ± 10.160 ops/s
>> Signatures.sign 512 thrpt 15 1450.707 ± 15.554 ops/s
>> Signatures.sign 2048 thrpt 15 1453.490 ± 5.362 ops/s
>> Signatures.sign 16384 thrpt 15 1437.364 ± 8.291 ops/s
>>
>>
>> It looks like the performance improvement is no significant enough for now.
>> But it may be 2+ times more in numbers when the scalar multiplication
>> implementation is improved in a follow-up enhancement in another pull
>> request.
>>
>> Enhancement for other curves will be considered in separate pull requests.
>>
>> Thanks,
>> Xuelei
>
> Xue-Lei Andrew Fan has updated the pull request incrementally with one
> additional commit since the last revision:
>
> more performance improvement
src/java.base/share/classes/sun/security/util/math/IntegerModuloP.java line 316:
> 314: // calculate imp ^ (2^16 - 1)
> 315: MutableIntegerModuloP t = imp.mutable();
> 316: for (int i = 15; i != 0; i--) {
Fun!
You can further reduce the number of multiplications:
t3 = t^2 * t
t15 = t3 ^ 4 * t3
t255 = t15^16 * t15
t65535 = t255^256 * t255
only four multiplications to get t^(2^16 - 1)
-------------
PR: https://git.openjdk.org/jdk/pull/10544