[jdk23] Integrated: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-26 Thread Volodymyr Paprotski
On Tue, 25 Jun 2024 23:50:20 GMT, Volodymyr Paprotski wrote: > Hi all, > > This pull request contains a backport of commit > [f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522) > from the [openjdk/jdk](https://git.openjdk.org/

[jdk23] RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-25 Thread Volodymyr Paprotski
Hi all, This pull request contains a backport of commit [f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. Thanks! - Commit messages: - Backport

Integrated: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-25 Thread Volodymyr Paprotski
On Fri, 14 Jun 2024 20:23:04 GMT, Volodymyr Paprotski wrote: > This fix recovers XDH performance but removes some of the P256 gains > (~-8-14%). Still faster, but not as much. > > The fix is to undo 'int' return type on mult()/square(), which allowed to > return partially redu

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-25 Thread Volodymyr Paprotski
On Tue, 25 Jun 2024 17:31:09 GMT, Ferenc Rakoczi wrote: >> Hi @vpaprotsk, >> @ferakocz is going to take a look at the change. When he says it's ok, I'll >> approve the PR. > > @ascarpino please approve this change. Thanks @ferakocz @ascarpino - PR Comment:

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-24 Thread Volodymyr Paprotski
On Mon, 24 Jun 2024 14:48:43 GMT, Ferenc Rakoczi wrote: >> @ferakocz just tagging you as reminder of (the many) items in your queue :) >> Thanks! > >> @ferakocz just tagging you as reminder of (the many) items in your queue :) >> Thanks! > > Sorry, I was out of office last week. I will take a

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-20 Thread Volodymyr Paprotski
On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote: >> This fix recovers XDH performance but removes some of the P256 gains >> (~-8-14%). Still faster, but not as much. >> >> The fix is to undo 'int' return type on mult()/square(), which allowed to >> r

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-18 Thread Volodymyr Paprotski
On Tue, 18 Jun 2024 15:10:37 GMT, Vladimir Kozlov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> comment from Sandhya > > @TobiHartmann ran our testing and it passed.

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Volodymyr Paprotski
On Mon, 17 Jun 2024 23:29:18 GMT, Vladimir Kozlov wrote: > Talking about future improvements. Is it possible to optimize reduction code > by converting it to intrinsic too? Or code generated by C2 is good enough? I had some experiments to try where I was using virtual methods to add

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Volodymyr Paprotski
On Mon, 17 Jun 2024 21:21:01 GMT, Vladimir Kozlov wrote: > Let me know that I got it right: > > * The reduction operation was optional and P256 benefitted by not executing > it. > * Previous `mult()` **Java** code always returned 0 because it executes > reduction so callers do not need to do

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Volodymyr Paprotski
On Mon, 17 Jun 2024 19:22:01 GMT, Vladimir Kozlov wrote: > Looking on `MontgomeryIntegerPolynomialP256.java` the code in `multImpl() + > reducePositive()` is similar to original `mult()` except new additional code > at the end of `multImpl()`. Yep, I split the original java mult() into

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Volodymyr Paprotski
On Mon, 17 Jun 2024 18:12:16 GMT, Vladimir Kozlov wrote: > What causes regression in P256 "(~-8-14%)"? From what I see, you re-arranged > code to not execute some code ("reducePositive()") when it is not needed. How > this affects P256? Actually, the other way around; reducePositive is now an

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v2]

2024-06-17 Thread Volodymyr Paprotski
On Fri, 14 Jun 2024 23:39:54 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Improve non-intrinsic p256 performance > > src/hotspot/share/opto/run

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Volodymyr Paprotski
6 > EC thrpt 3 1350.745 ± 28.514 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 > EC thrpt 3 1349.393 ± 32.050 ops/s > > Performance in master without mult() intrinsic > &

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v2]

2024-06-14 Thread Volodymyr Paprotski
6 > EC thrpt 3 1350.745 ± 28.514 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 > EC thrpt 3 1349.393 ± 32.050 ops/s > > Performance in master without mult() intrinsic > &

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-14 Thread Volodymyr Paprotski
On Fri, 14 Jun 2024 20:23:04 GMT, Volodymyr Paprotski wrote: > This fix recovers XDH performance but removes some of the P256 gains > (~-8-14%). Still faster, but not as much. > > The fix is to undo 'int' return type on mult()/square(), which allowed to > return partially redu

RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-14 Thread Volodymyr Paprotski
This fix recovers XDH performance but gives back some of the P256 gains (roughly 8-14%). P256 is still faster, but not by as much. The fix is to undo the 'int' return type on mult()/square(), which allowed them to return a partially reduced result (i.e. this avoided extra reductions when a mult() result is fed into an addition).
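To illustrate what "avoiding extra reductions when a mult() result is fed into addition" means, here is a toy sketch on plain `long` values. It is not the `IntegerPolynomial` code touched by this PR: the class name `LazyReductionDemo`, the modulus `M`, and all helper names are assumptions made purely for illustration.

```java
// Toy illustration of deferred ("lazy") modular reduction.
// Everything here is hypothetical; the real JDK code works on limb arrays.
class LazyReductionDemo {
    // Small illustrative modulus (assumption). Inputs must be in [0, M)
    // so that the sum of two products still fits in a long.
    static final long M = 1_000_003L;

    // Multiplication that always reduces its result.
    static long multReduced(long a, long b) {
        return (a * b) % M;
    }

    // Multiplication that returns an unreduced result;
    // the caller is responsible for reducing later.
    static long multUnreduced(long a, long b) {
        return a * b;
    }

    // Eager variant: reduce after each multiplication and after the addition.
    static long eager(long a, long b, long c, long d) {
        return (multReduced(a, b) + multReduced(c, d)) % M;
    }

    // Lazy variant: add the unreduced products, then reduce once.
    static long lazy(long a, long b, long c, long d) {
        return (multUnreduced(a, b) + multUnreduced(c, d)) % M;
    }

    public static void main(String[] args) {
        long a = 123_456L, b = 654_321L, c = 999_999L, d = 888_888L;
        System.out.println("eager = " + eager(a, b, c, d));
        System.out.println("lazy  = " + lazy(a, b, c, d));
    }
}
```

Both variants compute the same value; the lazy one performs a single reduction instead of three. The trade-off described in the message above is that a caller-visible "partially reduced" result changes the contract of mult(), which is what the fix reverts.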

Integrated: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic

2024-05-22 Thread Volodymyr Paprotski
On Tue, 2 Apr 2024 15:42:05 GMT, Volodymyr Paprotski wrote: > Performance. Before: > > Benchmark (algorithm) (dataSize) (keyLength) > (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.sign SHA256withECDSA 10

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12]

2024-05-22 Thread Volodymyr Paprotski
On Tue, 21 May 2024 17:41:46 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256with

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-21 Thread Volodymyr Paprotski
On Tue, 21 May 2024 07:21:14 GMT, Tobias Hartmann wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> shenandoah verifier > > I'm getting some conflicts when trying to apply this

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12]

2024-05-21 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574 ±

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-17 Thread Volodymyr Paprotski
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256with

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-17 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9]

2024-05-17 Thread Volodymyr Paprotski
On Thu, 16 May 2024 23:21:36 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> whitespace > > src/hotspot/cpu/x86/stubGenerator_x86_64_pol

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v10]

2024-05-17 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v8]

2024-05-09 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9]

2024-05-09 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v7]

2024-05-09 Thread Volodymyr Paprotski
On Thu, 9 May 2024 23:36:03 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> whitespace > > src/java.base/share/classes/sun/security/ec/ECOpera

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v7]

2024-05-09 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v6]

2024-05-06 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v5]

2024-04-25 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-24 Thread Volodymyr Paprotski
On Tue, 9 Apr 2024 02:01:36 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> remove use of jdk.crypto.ec > > src/java.base/share/classes/sun/security

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v3]

2024-04-24 Thread Volodymyr Paprotski
On Tue, 23 Apr 2024 19:55:57 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Comments from Jatin and Tony > > src/java.base/share/classes/sun/security

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-24 Thread Volodymyr Paprotski
On Tue, 16 Apr 2024 02:26:57 GMT, Jatin Bhateja wrote: >> Per above, this is a switch statement (`UNLIKELY`) fallback. I can still add >> alignment and loop rotation, but being a fallback figured it's more important >> to keep it small > > It's all part of intrinsic, no harm in polishing it.

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v4]

2024-04-24 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-15 Thread Volodymyr Paprotski
On Fri, 5 Apr 2024 07:19:28 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> remove use of jdk.crypto.ec > > src/hotspot/cpu/x86/stubGenerator_x86_64_p

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-15 Thread Volodymyr Paprotski
On Thu, 11 Apr 2024 17:15:21 GMT, Anthony Scarpino wrote: >>> In `ECOperations.java`, if I understand this correctly, it is to replace >>> the existing `PointMultiplier` with Montgomery-based PointMultiplier. But >>> when I look at the code, I see both are still options. If I read this >>>

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-15 Thread Volodymyr Paprotski
On Wed, 10 Apr 2024 23:56:52 GMT, Volodymyr Paprotski wrote: > Few early comments. > > Please update the copyright year of all the modified files. > > You can even consider splitting this into two patches, Java side changes in > one and x86 optimized intrinsic in ne

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v3]

2024-04-15 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-10 Thread Volodymyr Paprotski
On Fri, 5 Apr 2024 09:17:18 GMT, Jatin Bhateja wrote: > Few early comments. > > Please update the copyright year of all the modified files. > > You can even consider splitting this into two patches, Java side changes in > one and x86 optimized intrinsic in next one. Thanks Jatin, will fix!

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-10 Thread Volodymyr Paprotski
On Wed, 10 Apr 2024 17:18:55 GMT, Anthony Scarpino wrote: > In `ECOperations.java`, if I understand this correctly, it is to replace the > existing `PointMultiplier` with Montgomery-based PointMultiplier. But when I > look at the code, I see both are still options. If I read this correctly, it

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-05 Thread Volodymyr Paprotski
On Tue, 2 Apr 2024 19:19:59 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256with

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-02 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-02 Thread Volodymyr Paprotski
On Tue, 2 Apr 2024 16:29:07 GMT, Alan Bateman wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> remove use of jdk.crypto.ec > > src/java.base/share/classes/module

RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic

2024-04-02 Thread Volodymyr Paprotski
Performance. Before: Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s SignatureBench.ECDSA.sign

Integrated: 8297972: Poly1305 Endianness on ByteBuffer not enforced

2023-01-20 Thread Volodymyr Paprotski
On Thu, 1 Dec 2022 18:28:21 GMT, Volodymyr Paprotski wrote: > Per rfc7539 Section 2.5, "Read the block as a little-endian number." > > sun.security.util.math.intpoly.IntegerPolynomial1305 enforces this on input > when input is provided as `[]byte` but not when i

Re: RFR: 8297972: Poly1305 Endianness on ByteBuffer not enforced [v2]

2023-01-20 Thread Volodymyr Paprotski
On Fri, 20 Jan 2023 17:05:44 GMT, Jamil Nimeh wrote: >> @jnimeh would you mind running this through your tests? The build failures >> reported above seem unrelated.. > > @vpaprotsk regression tests completed successfully on my end. Thanks @jnimeh - PR:

Re: RFR: 8297972: Poly1305 Endianness on ByteBuffer not enforced [v2]

2023-01-19 Thread Volodymyr Paprotski
On Thu, 19 Jan 2023 18:49:30 GMT, Jamil Nimeh wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The incremental webrev excludes the unrelated >> changes brought in by the merge/rebase. The pull request contains th

Re: RFR: 8297972: Poly1305 Endianness on ByteBuffer not enforced [v2]

2023-01-19 Thread Volodymyr Paprotski
FuzzTest.java` from > https://github.com/openjdk/jdk/pull/11338 which compares Poly1305 MAC between > `ByteBuffer` and `[]byte` Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by

Integrated: 8297379: Enable the ByteBuffer path of Poly1305 optimizations

2022-12-06 Thread Volodymyr Paprotski
On Wed, 23 Nov 2022 23:33:32 GMT, Volodymyr Paprotski wrote: > There is now an intrinsic for Poly1305, which is only enabled on the > `engineUpdate([]byte)` path. This PR adds intrinsic support > `engineUpdate(ByteBuffer)` (when the bytebuffer `hasArray`). > > Fuzzing test e
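The idea of taking an array-based fast path only when the `ByteBuffer` is array-backed can be sketched as follows. This is a minimal illustration, not the `Poly1305` code from the PR: `ArrayBackedPathDemo`, `processArray`, and `process` are hypothetical names, and the byte-summing routine merely stands in for whatever array-based routine an intrinsic would accelerate.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: dispatch to an array-based routine when
// ByteBuffer.hasArray() is true, otherwise copy the bytes out first.
class ArrayBackedPathDemo {
    // Stand-in for an array-based routine (assumption): sums unsigned bytes.
    static long processArray(byte[] in, int off, int len) {
        long acc = 0;
        for (int i = off; i < off + len; i++) {
            acc += in[i] & 0xff;
        }
        return acc;
    }

    // Consumes all remaining bytes of buf through processArray.
    static long process(ByteBuffer buf) {
        int len = buf.remaining();
        if (buf.hasArray()) {
            // Array-backed and writable: operate on the backing array in place.
            int off = buf.arrayOffset() + buf.position();
            long result = processArray(buf.array(), off, len);
            buf.position(buf.position() + len); // mark the bytes as consumed
            return result;
        }
        // Direct or read-only buffer: copy out, then use the same routine.
        byte[] tmp = new byte[len];
        buf.get(tmp); // advances the position by len
        return processArray(tmp, 0, len);
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        ByteBuffer direct = ByteBuffer.allocateDirect(4).put(new byte[] {1, 2, 3, 4});
        direct.flip();
        System.out.println(process(heap));
        System.out.println(process(direct));
    }
}
```

Note that `hasArray()` returns false for read-only buffers, so calling `array()` inside the guarded branch does not throw.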

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations [v3]

2022-12-05 Thread Volodymyr Paprotski
On Thu, 1 Dec 2022 18:23:45 GMT, Volodymyr Paprotski wrote: >> There is now an intrinsic for Poly1305, which is only enabled on the >> `engineUpdate([]byte)` path. This PR adds intrinsic support >> `engineUpdate(ByteBuffer)` (when the bytebuffer `hasArray`). >>

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations [v4]

2022-12-05 Thread Volodymyr Paprotski
yteBuffer`. When that one is fixed, > `Poly1305IntrinsicFuzzTest.java` should not be setting the endianness on the > `ByteBuffer` > - Intrinsic introduced by https://github.com/openjdk/jdk/pull/10582. Volodymyr Paprotski has updated the pull request with a new target base due to a merge

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations [v3]

2022-12-01 Thread Volodymyr Paprotski
576 thrpt 3 > 14961.872 ± 38.003 ops/s > Finished running test > 'micro:org.openjdk.bench.javax.crypto.full.Poly1305DigestBench' Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: remove comme

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations [v3]

2022-12-01 Thread Volodymyr Paprotski
On Tue, 29 Nov 2022 01:16:28 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> remove comment > > src/java.base/share/classes/com/sun/crypto/p

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations [v3]

2022-12-01 Thread Volodymyr Paprotski
On Thu, 24 Nov 2022 18:42:01 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> remove comment > > src/java.base/share/classes/com/sun/crypto/provider/Pol

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations [v3]

2022-12-01 Thread Volodymyr Paprotski
On Thu, 1 Dec 2022 18:41:42 GMT, Volodymyr Paprotski wrote: >> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 268: >> >>> 266: } else { >>> 267: while (blockMultipleLength > 0) { >>> 268:

RFR: 8297972: Poly1305 Endianness on ByteBuffer not enforced

2022-12-01 Thread Volodymyr Paprotski
Per rfc7539 Section 2.5, "Read the block as a little-endian number." sun.security.util.math.intpoly.IntegerPolynomial1305 enforces this on input when input is provided as `[]byte` but not when input is in `ByteBuffer` Tested with `Poly1305IntrinsicFuzzTest.java` from
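A short sketch of why the `ByteBuffer` path needs explicit handling: a `ByteBuffer` is big-endian by default, so multi-byte reads disagree with a little-endian interpretation unless the order is set. The helper `readLongLE` and the class `LittleEndianBlockDemo` below are hypothetical and are not taken from `IntegerPolynomial1305`; they only show one way to enforce little-endian reads regardless of the caller's buffer order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: read 8 bytes as a little-endian number no matter
// which byte order the caller configured on the buffer.
class LittleEndianBlockDemo {
    // Reads 8 bytes at buf's current position as a little-endian long.
    // Uses a duplicate so the caller's position and order are left untouched.
    static long readLongLE(ByteBuffer buf) {
        return buf.duplicate().order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    public static void main(String[] args) {
        byte[] block = {1, 2, 3, 4, 5, 6, 7, 8};
        ByteBuffer buf = ByteBuffer.wrap(block); // default order is BIG_ENDIAN

        long asConfigured = buf.getLong(0);   // honours the buffer's order
        long littleEndian = readLongLE(buf);  // always little-endian

        System.out.printf("buffer order : %016x%n", asConfigured);
        System.out.printf("little-endian: %016x%n", littleEndian);
    }
}
```

With the bytes 01 through 08, the default big-endian read yields 0x0102030405060708 while the little-endian read yields 0x0807060504030201, which is the kind of mismatch a `byte[]` versus `ByteBuffer` comparison test would expose.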

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations [v2]

2022-12-01 Thread Volodymyr Paprotski
; - etc. > > Regarding testing > - Correctness of intrinsic was already tested in > https://github.com/openjdk/jdk/pull/10582 so not adding any tests there (i.e. > no KAT) > - In principle, fuzz test should also be sufficient to test bytebuffer (did > increase repetitions) Volody

RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations

2022-11-23 Thread Volodymyr Paprotski
Regarding mainline: - I decided not to 'unroll' the top while loop (i.e. `engineUpdate(byte[] input, int offset, int len)` is unrolled) - It is debatable which version is easier to understand. If this version is 'too complex', I can unroll the top while loop. - I do think this version is

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v22]

2022-11-22 Thread Volodymyr Paprotski
On Thu, 17 Nov 2022 20:42:27 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KA

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-22 Thread Volodymyr Paprotski
On Tue, 22 Nov 2022 15:21:44 GMT, Tobias Hartmann wrote: >> @iwanowww Hope the extra tests passed? (Or do you have to re-run them on the >> latest patch again?) > > I fixed the test issue with > [JDK-8297382](https://bugs.openjdk.org/browse/JDK-8297382) but this also > caused a regression

Integrated: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions

2022-11-21 Thread Volodymyr Paprotski
On Wed, 5 Oct 2022 21:28:26 GMT, Volodymyr Paprotski wrote: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-21 Thread Volodymyr Paprotski
On Thu, 17 Nov 2022 19:32:28 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> vzeroall, no spill, reg re-map > > Overall, looks good. Just one minor

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-17 Thread Volodymyr Paprotski
On Thu, 17 Nov 2022 19:30:14 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> vzeroall, no spill, reg re-map > > src/hotspot/cpu/x86/stubGenerator_x86_6

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v22]

2022-11-17 Thread Volodymyr Paprotski
770028.718 ± > 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± > 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± > 56.147 ops/s Volodymyr Paprotski has updated the pull request inc

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:41:32 GMT, Volodymyr Paprotski wrote: >> Yes, please. And for the upper half of register file, just code it as a loop >> over register range: >> >> for (int rxmm_num = 16; rxmm_num < 30; rxmm_num++) { >> XMMRegister rxmm = as_XMMRegist

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:16:14 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 756: >> >>> 754: >>> 755: // Store R^8-R for later use >>> 756: __ evmovdquq(Address(rsp, 64*0), B0, Assembler::AVX_512bit); >>

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-16 Thread Volodymyr Paprotski
On Thu, 17 Nov 2022 03:19:15 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KA

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-16 Thread Volodymyr Paprotski
770028.718 ± > 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± > 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± > 56.147 ops/s Volodymyr Paprotski has updated the pull request increme

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:39:00 GMT, Vladimir Ivanov wrote: >> ah.. I remember thinking about doing that.. `vzeroall` isn't encoded yet and >> I figured since I already have to do the xmm16-29, might as well do them >> all.. should I add that instruction too? > > Yes, please. And for the upper

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:08:16 GMT, Vladimir Ivanov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 917: >> >>> 915: // Cleanup >>> 916: __ vpxorq(xmm0, xmm0, xmm0, Assembler::AVX_512bit); >>> 917: __ vpxorq(xmm1, xmm1, xmm1, Assembler::AVX_512bit); >> >> You could use T0,

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:12:28 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> redo register alloc with explicit func params > > src/hotspot/cpu/x86/stubGene

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-16 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 01:43:46 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> live review with Sandhya > > Overall, it looks good. @iwanowww Answered you

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 19:44:16 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 25 commits: >> >> - Vladimir's review comments >> - Merge remote-

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 23:51:22 GMT, Vladimir Ivanov wrote: >> Added a comment, hopefully less confusing. > > On a second thought, passing derived pointers as arguments doesn't mix well > with safepoint awareness. > (And this stub eventually has to become safepoint aware.) > Deriving a pointer

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 19:38:56 GMT, Volodymyr Paprotski wrote: >>> On other hand, there are functions like poly1305_multiply8_avx512 and >>> poly1305_process_blocks_avx512 that use a lot of temp registers. I think it >>> makes sense to keep those as 'function-header de

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 19:30:23 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx5

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
770028.718 ± > 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± > 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± > 56.147 ops/s Volodymyr Paprotski has updated the pu

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 19:41:25 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 25 commits: >> >> - Vladimir's review comments >> - Merge remote-

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v19]

2022-11-15 Thread Volodymyr Paprotski
770028.718 ± > 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± > 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± > 56.147 ops/s Volodymyr Paprotski has updated the pull request incr

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v18]

2022-11-15 Thread Volodymyr Paprotski
770028.718 ± > 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± > 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± > 56.147 ops/s Volodymyr Paprotski has updated the pull request

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:43:16 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx5

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:16:19 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx5

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:45:54 GMT, Vladimir Ivanov wrote: >> library_call.cpp takes care of that, it passes the address of 0'th element >> to the stub. > > Ah, got it. Worth elaborating that in the comments. Otherwise, they confuse > rather than help: > > // void processBlocks(byte[] input,

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384: >> >>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const >>> Register a0, const Register a1, const Register a2, bo

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17]

2022-11-15 Thread Volodymyr Paprotski
770028.718 ± > 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± > 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± > 56.147 ops/s Volodymyr Paprotski has updated the pull request with a

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:06:40 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx5

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Volodymyr Paprotski
On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KA

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v15]

2022-11-14 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 17:56:55 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KA

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Volodymyr Paprotski
770028.718 ± 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s Volodymyr Paprotski has updated the pull request with a n

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-14 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 20:46:57 GMT, Volodymyr Paprotski wrote: >> It's not specific to `andq`: there's a huge `#ifdef` block around the >> definitions in `assembler_x86.hpp` (lines 12201 - 13773; and there's even a >> nested `#ifdef _LP64` (lines 13515-13585)!), but d

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-11 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 20:34:34 GMT, Vladimir Ivanov wrote: >> I am mystified at how it actually gets removed from the `assembler_x86.o` >> object on 32-bit.. The only reliable/portable way _would_ be with `#ifdef` >> but it's not there.. so.. code-generation? `sed`-like preprocessing? Can one >>

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-11 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 19:56:40 GMT, Vladimir Ivanov wrote: >> I believe it's needed. >> >> TLDR.. A couple of check-ins ago, I broke the 32-bit build, and that was the >> 'easy' fix.. > > Right, `addq` instructions are x64-specific. I was confused because > `assembler_x86.hpp` doesn't declare them

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v15]

2022-11-11 Thread Volodymyr Paprotski
770028.718 ± 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s Volodymyr Paprotski has updated the pull request i

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v13]

2022-11-11 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 01:25:07 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> jcheck > > src/java.base/share/classes/com/sun/crypto/provider/Pol

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-11 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 01:26:40 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> live review with Sandhya > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-10 Thread Volodymyr Paprotski
770028.718 ± 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s Volodymyr Paprotski has updated the pull request incremen

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v13]

2022-11-10 Thread Volodymyr Paprotski
770028.718 ± 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s Volodymyr Paprotski has updated the pull request incremen

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11]

2022-11-10 Thread Volodymyr Paprotski
On Thu, 10 Nov 2022 22:03:24 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix windows and 32b linux builds > > src/hotspot/share/opto/library_

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v12]

2022-11-10 Thread Volodymyr Paprotski
770028.718 ± 100847.766 ops/s > Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s > Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s Volodymyr Paprotski has updated the pull request

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11]

2022-11-09 Thread Volodymyr Paprotski
On Thu, 10 Nov 2022 01:22:04 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KA
