Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-15 Thread Jatin Bhateja
On Mon, 15 Apr 2024 22:04:14 GMT, Volodymyr Paprotski wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 394:
>>
>>> 392: __ lea(aLimbs, Address(aLimbs,8));
>>> 393: __ lea(bLimbs, Address(bLimbs,8));
>>> 394: __ jmp(L_DefaultLoop);
>>
>> Both sub and cmp are flag affec

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-05 Thread Jatin Bhateja
On Tue, 2 Apr 2024 19:19:59 GMT, Volodymyr Paprotski wrote:

>> Performance. Before:
>>
>> Benchmark                  (algorithm)      (dataSize)  (keyLength)  (provider)  Mode  Cnt  Score  Error  Units
>> SignatureBench.ECDSA.sign  SHA256withECDSA  1024        256

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations

2022-11-24 Thread Jatin Bhateja
On Wed, 23 Nov 2022 23:33:32 GMT, Volodymyr Paprotski wrote:

> Regarding mainline:
> - I decided not to 'unroll' the top while loop (i.e. `engineUpdate(byte[]
> input, int offset, int len)` is unrolled)
>   - It is debatable which version is easier to understand. If this version
> is 'too comp

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]

2022-11-09 Thread Jatin Bhateja
On Tue, 8 Nov 2022 23:21:58 GMT, Volodymyr Paprotski wrote:

>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz t

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-11-01 Thread Jatin Bhateja
On Tue, 1 Nov 2022 23:04:45 GMT, Vladimir Ivanov wrote:

>> Hmm.. interesting. Is this for loading? `evmovdquq` vs `evmovdqaq`? I was
>> actually looking at using evmovdqaq but there is no encoding for it yet (And
>> just looking now on uops.info, they seem to have identical timings? perhaps

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-10-27 Thread Jatin Bhateja
On Wed, 26 Oct 2022 21:11:33 GMT, Jamil Nimeh wrote:

>> 10% is not a negligible impact. I see your point about AVX512 reaping the
>> rewards of this change, but there are plenty of x86_64 systems without
>> AVX512 that will be impacted, not to mention other platforms like aarch64
>> which (fo

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-10-27 Thread Jatin Bhateja
On Mon, 24 Oct 2022 22:09:29 GMT, vpaprotsk wrote:

>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz test to co

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions

2022-10-19 Thread Jatin Bhateja
On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk wrote:

> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
> message blocks at a time. For more details, left a lot of comments in
> `macroAssembler_x86_poly.cpp`.
>
> - Added new KAT test for Poly1305 and a fuzz test to compare

Re: RFR: 4887998: Use Integer.rotateLeft() and rotateRight() in crypto implementations

2022-07-12 Thread Jatin Bhateja
ge from `rotateRight(a, (32 -b))` to `rotateLeft(a, b)`, I think it
> already looks much better.
> @wangweij , I agree it looks much better. Thanks for the `rotateLeft`
> suggestion.
>
> @jatin-bhateja , I ran micro performance tests on my MacBook (macosx-aarch64)
> and did

Re: RFR: 4887998: Use Integer.rotateLeft() and rotateRight() in crypto implementations

2022-07-11 Thread Jatin Bhateja
On Tue, 12 Jul 2022 00:44:37 GMT, Mark Powers wrote:

>> https://bugs.openjdk.org/browse/JDK-4887998
>
> 1. I tried to keep as much of the original parenthesis as possible. I can do
> the minimum number if that is what you prefer.
> 2. I'll change MD4, MD5, and SHA-1 to use `rotateLeft` in keepin
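
The two 4887998 entries above discuss replacing `rotateRight(a, (32 -b))` with `rotateLeft(a, b)`. The following is a minimal Java sketch of why that substitution should be behavior-preserving for 32-bit ints. The class `RotateIdentity` and the helpers `rotlManual` and `agree` are hypothetical names introduced only for illustration; they are not taken from the patch, and the shift-and-or form is only an assumed example of a hand-written rotate.

```java
public class RotateIdentity {
    // Hypothetical helper (not from the patch): an assumed example of a
    // hand-written rotate-left. Java masks int shift distances to 5 bits,
    // so b == 0 also works.
    public static int rotlManual(int a, int b) {
        return (a << b) | (a >>> (32 - b));
    }

    // Returns true when all three spellings give the same result for (a, b).
    public static boolean agree(int a, int b) {
        int viaLeft = Integer.rotateLeft(a, b);
        int viaRight = Integer.rotateRight(a, 32 - b);
        return viaLeft == viaRight && viaLeft == rotlManual(a, b);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x12345678, 0x80000001};
        for (int a : samples) {
            for (int b = 0; b < 32; b++) {
                if (!agree(a, b)) {
                    throw new AssertionError("mismatch for a=" + a + ", b=" + b);
                }
            }
        }
        System.out.println("all rotations agree");
    }
}
```

Because `Integer.rotateLeft` and `Integer.rotateRight` only use the low five bits of the distance, rotating right by `32 - b` and rotating left by `b` describe the same bit movement. The sketch says nothing about performance.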