On Mon, 15 Apr 2024 22:04:14 GMT, Volodymyr Paprotski wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly_mont.cpp line 394:
>>
>>> 392: __ lea(aLimbs, Address(aLimbs,8));
>>> 393: __ lea(bLimbs, Address(bLimbs,8));
>>> 394: __ jmp(L_DefaultLoop);
>>
>> Both sub and cmp are flag affec
On Tue, 2 Apr 2024 19:19:59 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark                  (algorithm)      (dataSize)  (keyLength)  (provider)  Mode  Cnt  Score  Error  Units
>> SignatureBench.ECDSA.sign  SHA256withECDSA  1024        256
>
On Wed, 23 Nov 2022 23:33:32 GMT, Volodymyr Paprotski wrote:
> Regarding mainline:
> - I decided not to 'unroll' the top while loop (i.e. `engineUpdate(byte[]
> input, int offset, int len)` is unrolled)
> - It is debatable which version is easier to understand. If this version
> is 'too comp
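The loop structure discussed here is easier to follow with the general shape of a block-buffering `update` in mind. Below is a minimal, hypothetical sketch of the usual three phases: top up a buffered partial block, process whole 16-byte blocks straight from the input, and buffer the tail. The class `BlockUpdateSketch` and the method `processBlock` are made-up names, and this is not the JDK's `Poly1305.engineUpdate`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a block-buffering update loop (not the JDK's code).
public class BlockUpdateSketch {
    static final int BLOCK = 16;                        // Poly1305 block size in bytes
    private final byte[] partial = new byte[BLOCK];     // bytes left over from earlier calls
    private int partialLen = 0;
    final List<byte[]> processed = new ArrayList<>();   // records blocks in place of real work

    // Stand-in for the per-block work (e.g. one Poly1305 block step).
    private void processBlock(byte[] b, int off) {
        processed.add(Arrays.copyOfRange(b, off, off + BLOCK));
    }

    void update(byte[] input, int offset, int len) {
        // 1. Top up a previously buffered partial block first.
        if (partialLen > 0) {
            int n = Math.min(len, BLOCK - partialLen);
            System.arraycopy(input, offset, partial, partialLen, n);
            partialLen += n;
            offset += n;
            len -= n;
            if (partialLen < BLOCK) {
                return;                                 // input exhausted, block still incomplete
            }
            processBlock(partial, 0);
            partialLen = 0;
        }
        // 2. Process whole blocks directly from the input (the bulk path).
        while (len >= BLOCK) {
            processBlock(input, offset);
            offset += BLOCK;
            len -= BLOCK;
        }
        // 3. Buffer the remaining tail (< BLOCK bytes) for the next call.
        System.arraycopy(input, offset, partial, 0, len);
        partialLen = len;
    }
}
```

Feeding the same bytes in one call or in many small chunks yields the same sequence of processed blocks, which is the property any restructuring of this loop has to preserve.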
On Tue, 8 Nov 2022 23:21:58 GMT, Volodymyr Paprotski wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz t
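Processing 16 message blocks at a time rests on a standard identity: unrolling the Horner step h = (h + m_i) * r mod (2^130 - 5) over k blocks gives (h + m_1) * r^k + m_2 * r^(k-1) + ... + m_k * r, whose products are independent of each other and can use precomputed powers of r. The minimal `BigInteger` sketch below checks this identity. The class and method names are hypothetical, and it assumes (without reading `macroAssembler_x86_poly.cpp`) that the intrinsic relies on the same identity. It ignores the limb representation and the clamping of r.

```java
import java.math.BigInteger;
import java.util.Random;

// Sketch only: checks the algebra behind multi-block Poly1305 with BigInteger,
// not the limb-based arithmetic an assembly intrinsic would actually use.
public class PolyMultiBlock {
    // Poly1305 prime p = 2^130 - 5
    static final BigInteger P = BigInteger.ONE.shiftLeft(130).subtract(BigInteger.valueOf(5));

    // One block at a time (Horner form): h = (h + m_i) * r mod p
    static BigInteger sequential(BigInteger h, BigInteger[] m, BigInteger r) {
        for (BigInteger mi : m) {
            h = h.add(mi).multiply(r).mod(P);
        }
        return h;
    }

    // k blocks at a time using precomputed powers r^1..r^k:
    // h' = (h + m_1) r^k + m_2 r^(k-1) + ... + m_k r  (mod p)
    static BigInteger batched(BigInteger h, BigInteger[] m, BigInteger r, int k) {
        BigInteger[] pow = new BigInteger[k + 1];
        pow[0] = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            pow[i] = pow[i - 1].multiply(r).mod(P);
        }
        int i = 0;
        for (; i + k <= m.length; i += k) {
            BigInteger acc = h.add(m[i]).multiply(pow[k]);
            for (int j = 1; j < k; j++) {
                // each product m_j * r^(k-j) is independent, so they can be computed in parallel
                acc = acc.add(m[i + j].multiply(pow[k - j]));
            }
            h = acc.mod(P);
        }
        // Leftover blocks (fewer than k) fall back to the one-block form
        for (; i < m.length; i++) {
            h = h.add(m[i]).multiply(r).mod(P);
        }
        return h;
    }

    // Random 16-byte blocks with the 2^128 pad bit set, as Poly1305 does for full blocks
    static BigInteger[] randomBlocks(Random rnd, int n) {
        BigInteger[] m = new BigInteger[n];
        for (int i = 0; i < n; i++) {
            m[i] = new BigInteger(128, rnd).setBit(128);
        }
        return m;
    }
}
```

With 37 blocks and k = 16, two batched groups are followed by five single-block steps, and the result matches the purely sequential computation.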
On Tue, 1 Nov 2022 23:04:45 GMT, Vladimir Ivanov wrote:
>> Hmm... interesting. Is this for loading? `evmovdquq` vs `evmovdqaq`? I was
>> actually looking at using `evmovdqaq`, but there is no encoding for it yet (and,
>> looking at uops.info just now, they seem to have identical timings? perhaps
>>
On Wed, 26 Oct 2022 21:11:33 GMT, Jamil Nimeh wrote:
>> 10% is not a negligible impact. I see your point about AVX512 reaping the
>> rewards of this change, but there are plenty of x86_64 systems without
>> AVX512 that will be impacted, not to mention other platforms like aarch64
>> which (fo
On Mon, 24 Oct 2022 22:09:29 GMT, vpaprotsk wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz test to co
On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk wrote:
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
> message blocks at a time. For more details, left a lot of comments in
> `macroAssembler_x86_poly.cpp`.
>
> - Added new KAT test for Poly1305 and a fuzz test to compare
ge from `rotateRight(a, (32 -b))` to `rotateLeft(a, b)`, I think it
> already looks much better.
> @wangweij, I agree it looks much better. Thanks for the `rotateLeft`
> suggestion.
>
> @jatin-bhateja, I ran micro performance tests on my MacBook (macosx-aarch64)
> and did
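For reference, rotating a 32-bit value right by `32 - b` gives the same result as rotating it left by `b`, because rotation distances are taken modulo 32. A small sketch (the class name `RotateDemo` is hypothetical) comparing `Integer.rotateRight`, `Integer.rotateLeft`, and the hand-written shift form:

```java
// Hypothetical demo class comparing three equivalent ways to rotate an int left by b.
public class RotateDemo {
    // Form being replaced: rotate right by (32 - b)
    static int viaRotateRight(int a, int b) {
        return Integer.rotateRight(a, 32 - b);
    }

    // Suggested form: rotate left by b
    static int viaRotateLeft(int a, int b) {
        return Integer.rotateLeft(a, b);
    }

    // Hand-written equivalent; Java masks int shift distances to 5 bits,
    // so b == 0 still works (a >>> 32 behaves like a >>> 0)
    static int viaShifts(int a, int b) {
        return (a << b) | (a >>> (32 - b));
    }

    public static void main(String[] args) {
        int a = 0x80000001;
        System.out.printf("%08x%n", viaRotateLeft(a, 1)); // prints 00000003
    }
}
```

All three agree for every `b` in 0..31; the `rotateLeft` form simply states the intent directly.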
On Tue, 12 Jul 2022 00:44:37 GMT, Mark Powers wrote:
>> https://bugs.openjdk.org/browse/JDK-4887998
>
> 1. I tried to keep as many of the original parentheses as possible. I can use
> the minimum number if that is what you prefer.
> 2. I'll change MD4, MD5, and SHA-1 to use `rotateLeft` in keepin
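To illustrate both points, here is a sketch of one MD5 round-1 step as defined in RFC 1321, written once with `Integer.rotateLeft` and once with hand-written shifts. The class and method names are hypothetical, and this is not the JDK's MD5 code. It shows which parentheses are actually required in the shift form and which are only kept for readability.

```java
// Hypothetical sketch of one MD5 round-1 step (RFC 1321), written two ways.
public class Md5StepSketch {
    // Round-1 auxiliary function: F(x, y, z) = (x AND y) OR (NOT x AND z)
    static int f(int x, int y, int z) {
        return (x & y) | (~x & z);
    }

    // a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s), using Integer.rotateLeft
    static int stepRotateLeft(int a, int b, int c, int d, int xk, int ti, int s) {
        return b + Integer.rotateLeft(a + f(b, c, d) + xk + ti, s);
    }

    // Same step with shifts. The outer parentheses around the OR are required,
    // because '+' binds tighter than '|'; the inner ones around each shift and
    // around (32 - s) are redundant and kept only for readability.
    static int stepShifts(int a, int b, int c, int d, int xk, int ti, int s) {
        int t = a + f(b, c, d) + xk + ti;
        return b + ((t << s) | (t >>> (32 - s)));
    }
}
```

Dropping the outer parentheses would parse as `(b + (t << s)) | (t >>> (32 - s))`, a different value, which is one reason the `rotateLeft` form is harder to get wrong.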