On Tue, 25 Jun 2024 23:50:20 GMT, Volodymyr Paprotski wrote:
> Hi all,
>
> This pull request contains a backport of commit
> [f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522)
> from the [openjdk/jdk](https://git.openjdk.org/jdk) repos
Hi all,
This pull request contains a backport of commit
[f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522)
from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
Thanks!
-
Commit messages:
- Backport f101e153cee68750fcf1f12da10e298
On Fri, 14 Jun 2024 20:23:04 GMT, Volodymyr Paprotski wrote:
> This fix recovers XDH performance but removes some of the P256 gains
> (~-8-14%). Still faster, but not as much.
>
> The fix is to undo the 'int' return type on mult()/square(), which allowed them to
> return a part
On Tue, 25 Jun 2024 17:31:09 GMT, Ferenc Rakoczi wrote:
>> Hi @vpaprotsk,
>> @ferakocz is going to take a look at the change. When he says it's ok, I'll
>> approve the PR.
>
> @ascarpino please approve this change.
Thanks @ferakocz @ascarpino
-
PR Comment: https://git.openjdk.or
On Mon, 24 Jun 2024 14:48:43 GMT, Ferenc Rakoczi wrote:
>> @ferakocz just tagging you as reminder of (the many) items in your queue :)
>> Thanks!
>
> Sorry, I was out of office last week. I will take a
On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote:
>> This fix recovers XDH performance but removes some of the P256 gains
>> (~-8-14%). Still faster, but not as much.
>>
>> The fix is to undo the 'int' return type on mult()/square(), which allowed them to
On Tue, 18 Jun 2024 15:10:37 GMT, Vladimir Kozlov wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> comment from Sandhya
>
> @TobiHartmann ran our testing and it passed.
On Mon, 17 Jun 2024 23:29:18 GMT, Vladimir Kozlov wrote:
> Talking about future improvements. Is it possible to optimize reduction code
> by converting it to intrinsic too? Or code generated by C2 is good enough?
I had some experiments to try where I was using virtual methods to add
optimizati
On Mon, 17 Jun 2024 21:21:01 GMT, Vladimir Kozlov wrote:
> Let me know that I got it right:
>
> * The reduction operation was optional and P256 benefitted by not executing
> it.
> * Previous `mult()` **Java** code always returned 0 because it executes
> reduction so callers do not need to do it
On Mon, 17 Jun 2024 19:22:01 GMT, Vladimir Kozlov wrote:
> Looking at `MontgomeryIntegerPolynomialP256.java`, the code in `multImpl() +
> reducePositive()` is similar to original `mult()` except new additional code
> at the end of `multImpl()`.
Yep, I split the original java mult() into multIm
On Mon, 17 Jun 2024 18:12:16 GMT, Vladimir Kozlov wrote:
> What causes the regression in P256 "(~-8-14%)"? From what I see, you re-arranged
> code to not execute some code ("reducePositive()") when it is not needed. How
> does this affect P256?
Actually, the other way around; reducePositive is now an
On Fri, 14 Jun 2024 23:39:54 GMT, Sandhya Viswanathan
wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Improve non-intrinsic p256 performance
>
> src/hotspot/share/opto/run
256 EC thrpt 3 1350.745 ± 28.514 ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 EC thrpt 3 1349.393 ± 32.050 ops/s
>
> Performance in master without mult() intrins
On Fri, 14 Jun 2024 20:23:04 GMT, Volodymyr Paprotski wrote:
> This fix recovers XDH performance but removes some of the P256 gains
> (~-8-14%). Still faster, but not as much.
>
> The fix is to undo the 'int' return type on mult()/square(), which allowed them to
> return a part
This fix recovers XDH performance but removes some of the P256 gains (~-8-14%).
Still faster, but not as much.
The fix is to undo the 'int' return type on mult()/square(), which allowed them to
return a partially reduced result (i.e. this avoids extra reductions when the mult()
result is fed into addition). T
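To illustrate the general idea of a "partially reduced" product that is only fully reduced once, after an addition, here is a toy sketch. It is not the JDK code: the modulus, the single-`long` representation, and all names (`LazyReductionSketch`, `multPartial`, and so on) are assumptions made for illustration, and the real P256 code works on limb arrays.

```java
public class LazyReductionSketch {
    // Toy modulus (hypothetical): the Mersenne prime 2^31 - 1, chosen only
    // because 2^31 is congruent to 1 modulo P, which makes folding trivial.
    static final long P = (1L << 31) - 1;

    // Full reduction into the canonical range [0, P).
    static long reduce(long x) {
        return Math.floorMod(x, P);
    }

    // Multiply with a single folding step. The result is congruent to a*b
    // modulo P but may be >= P, i.e. it is only partially reduced.
    // Inputs must be in [0, 2^31) so that the product fits in a long.
    static long multPartial(long a, long b) {
        long t = a * b;               // < 2^62, no overflow
        return (t & P) + (t >>> 31);  // low 31 bits + high bits; result < 2^32
    }

    // a*b + c*d with one final reduction after the addition.
    static long sumOfProductsLazy(long a, long b, long c, long d) {
        return reduce(multPartial(a, b) + multPartial(c, d));
    }

    // The same value, fully reducing after every multiply.
    static long sumOfProductsEager(long a, long b, long c, long d) {
        return reduce(reduce(a * b) + reduce(c * d));
    }

    public static void main(String[] args) {
        long a = 2_000_000_011L, b = 1_999_999_973L;
        long c = 123_456_789L, d = 2_147_483_646L;
        System.out.println(sumOfProductsLazy(a, b, c, d)
                == sumOfProductsEager(a, b, c, d));
    }
}
```

The lazy variant trades two full reductions for one, at the cost of callers having to know that the intermediate values are not canonical.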
On Tue, 2 Apr 2024 15:42:05 GMT, Volodymyr Paprotski wrote:
> Performance. Before:
>
> Benchmark (algorithm) (dataSize) (keyLength)
> (provider) Mode Cnt Score Error Units
> SignatureBench.ECDSA.signSHA256withECDSA10
On Tue, 21 May 2024 17:41:46 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark (algorithm) (dataSize) (keyLength)
>> (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.signSHA256with
On Tue, 21 May 2024 07:21:14 GMT, Tobias Hartmann wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> shenandoah verifier
>
> I'm getting some conflicts when trying to apply
ntBench.EC.generateSecret ECDH 256 EC thrpt 3 1346.523 ± 28.722 ops/s
> Benchmark (isMontBench) Mode Cnt Score Error Units
> PolynomialP256Bench.benchMultiply true thrpt 3 1919.574 ±
>
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark (algorithm) (dataSize) (keyLength)
>> (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.signSHA256with
On Thu, 16 May 2024 23:21:36 GMT, Sandhya Viswanathan
wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> whitespace
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_pol
On Thu, 9 May 2024 23:36:03 GMT, Anthony Scarpino wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> whitespace
>
> src/java.base/share/classes/sun/security/ec/ECOpera
On Tue, 9 Apr 2024 02:01:36 GMT, Anthony Scarpino wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> remove use of jdk.crypto.ec
>
> src/java.base/share/classes/sun/security
On Tue, 23 Apr 2024 19:55:57 GMT, Anthony Scarpino
wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Comments from Jatin and Tony
>
> src/java.base/share/classes/sun/security
On Tue, 16 Apr 2024 02:26:57 GMT, Jatin Bhateja wrote:
>> Per above, this is a switch statement (`UNLIKELY`) fallback. I can still add
>> alignment and loop rotation, but since it is a fallback, I figured it's more
>> important to keep it small and readable...
>
> It's all part of intrinsic, no harm in polis
On Fri, 5 Apr 2024 07:19:28 GMT, Jatin Bhateja wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> remove use of jdk.crypto.ec
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_p
On Thu, 11 Apr 2024 17:15:21 GMT, Anthony Scarpino
wrote:
>>> In `ECOperations.java`, if I understand this correctly, it is to replace
>>> the existing `PointMultiplier` with a Montgomery-based `PointMultiplier`. But
>>> when I look at the code, I see both are still options. If I read this
>>> cor
On Wed, 10 Apr 2024 23:56:52 GMT, Volodymyr Paprotski wrote:
> Few early comments.
>
> Please update the copyright year of all the modified files.
>
> You can even consider splitting this into two patches, Java side changes in
> one and x86 optimized intrinsic in ne
On Fri, 5 Apr 2024 09:17:18 GMT, Jatin Bhateja wrote:
> Few early comments.
>
> Please update the copyright year of all the modified files.
>
> You can even consider splitting this into two patches, Java side changes in
> one and x86 optimized intrinsic in next one.
Thanks Jatin, will fix!
-
On Wed, 10 Apr 2024 17:18:55 GMT, Anthony Scarpino
wrote:
> In `ECOperations.java`, if I understand this correctly, it is to replace the
> existing `PointMultiplier` with a Montgomery-based `PointMultiplier`. But when I
> look at the code, I see both are still options. If I read this correctly, it
On Tue, 2 Apr 2024 19:19:59 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark (algorithm) (dataSize) (keyLength)
>> (provider) Mode Cnt Score Error Units
>> SignatureBench.ECDSA.signSHA256with
On Tue, 2 Apr 2024 16:29:07 GMT, Alan Bateman wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> remove use of jdk.crypto.ec
>
> src/java.base/share/classes/module
Performance. Before:
Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units
SignatureBench.ECDSA.sign SHA256withECDSA 1024 256 thrpt 3 6443.934 ± 6.491 ops/s
SignatureBench.ECDSA.signSHA256
On Thu, 1 Dec 2022 18:28:21 GMT, Volodymyr Paprotski wrote:
> Per RFC 7539, Section 2.5, "Read the block as a little-endian number."
>
> sun.security.util.math.intpoly.IntegerPolynomial1305 enforces this on input
> when input is provided as `byte[]` but not when i
On Fri, 20 Jan 2023 17:05:44 GMT, Jamil Nimeh wrote:
>> @jnimeh would you mind running this through your tests? The build failures
>> reported above seem unrelated..
>
> @vpaprotsk regression tests completed successfully on my end.
Thanks @jnimeh
-
PR: https://git.openjdk.org/jdk/
On Thu, 19 Jan 2023 18:49:30 GMT, Jamil Nimeh wrote:
>> Volodymyr Paprotski has updated the pull request with a new target base due
>> to a merge or a rebase. The incremental webrev excludes the unrelated
>> changes brought in by the merge/rebase. The pull request contains th
FuzzTest.java` from
> https://github.com/openjdk/jdk/pull/11338 which compares Poly1305 MAC between
> `ByteBuffer` and `byte[]`
Volodymyr Paprotski has updated the pull request with a new target base due to
a merge or a rebase. The incremental webrev excludes the unrelated changes
brought in by
On Wed, 23 Nov 2022 23:33:32 GMT, Volodymyr Paprotski wrote:
> There is now an intrinsic for Poly1305, which is only enabled on the
> `engineUpdate(byte[])` path. This PR adds intrinsic support for
> `engineUpdate(ByteBuffer)` (when the bytebuffer `hasArray`).
>
> Fuzzing test e
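As a rough picture of what "when the bytebuffer `hasArray`" means for dispatch, the sketch below routes array-backed buffers to an array-based worker without copying and falls back to a copy otherwise. It is an assumption-laden illustration, not the `Poly1305.java` code: `BufferDispatchSketch`, `processBlocks`, and the counters are hypothetical, and the "work" is just a byte sum standing in for the real block processing.

```java
import java.nio.ByteBuffer;

public class BufferDispatchSketch {
    long bytesViaArray = 0;  // bytes handled through the backing array
    long bytesViaCopy = 0;   // bytes handled through a temporary copy
    int checksum = 0;        // stand-in for the real accumulator state

    // Hypothetical array-based worker (the kind of method an intrinsic
    // could replace). Here it only sums the bytes.
    void processBlocks(byte[] in, int off, int len) {
        for (int i = off; i < off + len; i++) {
            checksum += in[i] & 0xff;
        }
    }

    void update(ByteBuffer buf) {
        int len = buf.remaining();
        if (buf.hasArray()) {
            // Array-backed and writable: use the backing array directly.
            processBlocks(buf.array(), buf.arrayOffset() + buf.position(), len);
            buf.position(buf.limit());  // mark the data as consumed
            bytesViaArray += len;
        } else {
            // Direct or read-only buffer: copy out, then use the same worker.
            byte[] tmp = new byte[len];
            buf.get(tmp);
            processBlocks(tmp, 0, len);
            bytesViaCopy += len;
        }
    }

    public static void main(String[] args) {
        BufferDispatchSketch s = new BufferDispatchSketch();
        s.update(ByteBuffer.wrap(new byte[] {1, 2, 3}));
        s.update(ByteBuffer.wrap(new byte[] {1, 2, 3}).asReadOnlyBuffer());
        System.out.println(s.bytesViaArray + " " + s.bytesViaCopy + " " + s.checksum);
    }
}
```

Note that `hasArray()` is false for read-only views even when they wrap a heap array, so those take the copy path here.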
On Thu, 1 Dec 2022 18:23:45 GMT, Volodymyr Paprotski wrote:
>> There is now an intrinsic for Poly1305, which is only enabled on the
>> `engineUpdate(byte[])` path. This PR adds intrinsic support for
>> `engineUpdate(ByteBuffer)` (when the bytebuffer `hasArray`).
>>
>
and `ByteBuffer`. When that one is fixed,
> `Poly1305IntrinsicFuzzTest.java` should not be setting the endianness on the
> `ByteBuffer`
> - Intrinsic introduced by https://github.com/openjdk/jdk/pull/10582.
Volodymyr Paprotski has updated the pull request with a new target base due t
576 thrpt 3 14961.872 ± 38.003 ops/s
> Finished running test
> 'micro:org.openjdk.bench.javax.crypto.full.Poly1305DigestBench'
Volodymyr Paprotski has updated the pull request incrementally with one
additional commit since the last revision:
re
On Tue, 29 Nov 2022 01:16:28 GMT, Sandhya Viswanathan
wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> remove comment
>
> src/java.base/share/classes/com/sun/crypto/p
On Thu, 24 Nov 2022 18:42:01 GMT, Jatin Bhateja wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> remove comment
>
> src/java.base/share/classes/com/sun/crypto/provider/Pol
On Thu, 1 Dec 2022 18:41:42 GMT, Volodymyr Paprotski wrote:
>> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 268:
>>
>>> 266: } else {
>>> 267: while (blockMultipleLength > 0) {
>>> 268: p
Per RFC 7539, Section 2.5, "Read the block as a little-endian number."
sun.security.util.math.intpoly.IntegerPolynomial1305 enforces this on input
when input is provided as `byte[]` but not when input is in `ByteBuffer`.
Tested with `Poly1305IntrinsicFuzzTest.java` from
https://github.com/openjdk/
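For readers unfamiliar with the pitfall, here is a small sketch of reading a 16-byte block as a little-endian number from both a `byte[]` and a `ByteBuffer`. It only demonstrates the byte-order rule quoted from the RFC; the class and method names are hypothetical and this is not how `IntegerPolynomial1305` is implemented. The key point is that a `ByteBuffer` defaults to big-endian, so the order has to be set explicitly.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianBlock {
    // Interpret a block as an unsigned little-endian integer.
    static BigInteger fromBytesLE(byte[] block) {
        byte[] bigEndian = new byte[block.length];
        for (int i = 0; i < block.length; i++) {
            bigEndian[i] = block[block.length - 1 - i];
        }
        return new BigInteger(1, bigEndian);  // signum 1: always non-negative
    }

    // Read 16 bytes from the buffer's current position as little-endian.
    // duplicate() leaves the caller's position and byte order untouched.
    static BigInteger fromBufferLE(ByteBuffer buf) {
        ByteBuffer le = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        long lo = le.getLong();
        long hi = le.getLong();
        return toUnsigned(hi).shiftLeft(64).or(toUnsigned(lo));
    }

    static BigInteger toUnsigned(long v) {
        return new BigInteger(Long.toUnsignedString(v));
    }

    public static void main(String[] args) {
        byte[] block = new byte[16];
        block[0] = 1;  // little-endian encoding of the number 1
        System.out.println(fromBytesLE(block));
        System.out.println(fromBufferLE(ByteBuffer.wrap(block)));
        // Default (big-endian) read of the same bytes gives a different value.
        System.out.println(Long.toHexString(ByteBuffer.wrap(block).getLong()));
    }
}
```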
flow again..
> - etc.
>
> Regarding testing
> - Correctness of intrinsic was already tested in
> https://github.com/openjdk/jdk/pull/10582 so not adding any tests there (i.e.
> no KAT)
> - In principle, fuzz test should also be sufficient to test bytebuffer (did
> increase
Regarding mainline:
- I decided not to 'unroll' the top while loop (i.e. `engineUpdate(byte[]
input, int offset, int len)` is unrolled)
- It is debatable which version is easier to understand. If this version is
'too complex', I can unroll the top while loop.
- I do think this version is incr
On Thu, 17 Nov 2022 20:42:27 GMT, Volodymyr Paprotski wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KA
On Tue, 22 Nov 2022 15:21:44 GMT, Tobias Hartmann wrote:
>> @iwanowww Hope the extra tests passed? (Or do you have to re-run them on the
>> latest patch again?)
>
> I fixed the test issue with
> [JDK-8297382](https://bugs.openjdk.org/browse/JDK-8297382) but this also
> caused a regression with
On Wed, 5 Oct 2022 21:28:26 GMT, Volodymyr Paprotski wrote:
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
> message blocks at a time. For more details, left a lot of comments in
> `macroAssembler_x86_poly.cpp`.
>
> - Added new KAT test for Poly1305
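Why processing several blocks at once is possible at all can be seen from the algebra alone. The sketch below, using `BigInteger`, checks that the block-by-block recurrence `h = (h + m) * r mod p` gives the same result as a single batched step that uses powers of `r`. This is only the identity, under my own naming (`PolyBlocksSketch`, `sequential`, `batched`); it omits key clamping, block padding, and the final addition of `s`, and says nothing about how the AVX-512 stub arranges its registers.

```java
import java.math.BigInteger;

public class PolyBlocksSketch {
    // The Poly1305 prime, 2^130 - 5.
    static final BigInteger P =
            BigInteger.TWO.pow(130).subtract(BigInteger.valueOf(5));

    // One block at a time: h = (h + m) * r mod p.
    static BigInteger sequential(BigInteger h, BigInteger r, BigInteger[] blocks) {
        for (BigInteger m : blocks) {
            h = h.add(m).multiply(r).mod(P);
        }
        return h;
    }

    // All k blocks in one step:
    // h' = (h + m1)*r^k + m2*r^(k-1) + ... + mk*r  (mod p)
    static BigInteger batched(BigInteger h, BigInteger r, BigInteger[] blocks) {
        int k = blocks.length;
        BigInteger acc = BigInteger.ZERO;
        for (int i = 0; i < k; i++) {
            BigInteger term = (i == 0) ? h.add(blocks[0]) : blocks[i];
            BigInteger power = r.modPow(BigInteger.valueOf(k - i), P);
            acc = acc.add(term.multiply(power));
        }
        return acc.mod(P);
    }

    public static void main(String[] args) {
        BigInteger r = new BigInteger("0806d5400e52447c036d555408bed685", 16);
        BigInteger[] blocks = new BigInteger[16];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = BigInteger.valueOf(1_000_003L * (i + 1));
        }
        BigInteger h0 = BigInteger.valueOf(42);
        System.out.println(sequential(h0, r, blocks).equals(batched(h0, r, blocks)));
    }
}
```

Because the terms of the batched form are independent of each other, they can be computed in parallel lanes once the needed powers of `r` are available.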
On Thu, 17 Nov 2022 19:32:28 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> vzeroall, no spill, reg re-map
>
> Overall, looks good. Just one minor
On Thu, 17 Nov 2022 19:30:14 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> vzeroall, no spill, reg re-map
>
> src/hotspot/cpu/x86/stubGenerator_x86_6
pt 8 1770028.718 ± 100847.766 ops/s
> Poly1305DigestBench.digest 16384 thrpt 8 765547.287 ± 25883.825 ops/s
> Poly1305DigestBench.digest 1048576 thrpt 8 14508.458 ± 56.147 ops/s
Volodymyr Paprotski has updated the pull r
On Wed, 16 Nov 2022 23:41:32 GMT, Volodymyr Paprotski wrote:
>> Yes, please. And for the upper half of register file, just code it as a loop
>> over register range:
>>
>> for (int rxmm_num = 16; rxmm_num < 30; rxmm_num++) {
>> XMMRegister rxmm = as_XMMRegist
On Wed, 16 Nov 2022 23:16:14 GMT, Volodymyr Paprotski wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 756:
>>
>>> 754:
>>> 755: // Store R^8-R for later use
>>> 756: __ evmovdquq(Address(rsp, 64*0), B0, Assembler::AVX_512bit);
>>
On Thu, 17 Nov 2022 03:19:15 GMT, Volodymyr Paprotski wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KA
On Wed, 16 Nov 2022 23:39:00 GMT, Vladimir Ivanov wrote:
>> ah.. I remember thinking about doing that.. `vzeroall` isn't encoded yet and
>> I figured since I already have to do the xmm16-29, might as well do them
>> all.. should I add that instruction too?
>
> Yes, please. And for the upper half
On Wed, 16 Nov 2022 23:08:16 GMT, Vladimir Ivanov wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 917:
>>
>>> 915: // Cleanup
>>> 916: __ vpxorq(xmm0, xmm0, xmm0, Assembler::AVX_512bit);
>>> 917: __ vpxorq(xmm1, xmm1, xmm1, Assembler::AVX_512bit);
>>
>> You could use T0,
On Wed, 16 Nov 2022 23:12:28 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> redo register alloc with explicit func params
>
> src/hotspot/cpu/x86/stubGene
On Fri, 11 Nov 2022 01:43:46 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> live review with Sandhya
>
> Overall, it looks good.
@iwanowww Answered your
On Tue, 15 Nov 2022 19:44:16 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request with a new target base due
>> to a merge or a rebase. The pull request now contains 25 commits:
>>
>> - Vladimir's review comments
>> - Merge re
On Tue, 15 Nov 2022 23:51:22 GMT, Vladimir Ivanov wrote:
>> Added a comment, hopefully less confusing.
>
> On a second thought, passing derived pointers as arguments doesn't mix well
> with safepoint awareness.
> (And this stub eventually has to become safepoint aware.)
> Deriving a pointer insi
On Tue, 15 Nov 2022 19:38:56 GMT, Volodymyr Paprotski wrote:
>>> On other hand, there are functions like poly1305_multiply8_avx512 and
>>> poly1305_process_blocks_avx512 that use a lot of temp registers. I think it
>>> makes sense to keep those as 'function-head
On Tue, 15 Nov 2022 19:30:23 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request with a new target base due
>> to a merge or a rebase. The pull request now contains 23 commits:
>>
>> - Merge remote-tracking branch 'origin/master'
On Mon, 14 Nov 2022 16:37:41 GMT, Xubo Zhang wrote:
>> NativePRNG SecureRandom doesn’t scale with number of threads. The
>> performance starts dropping as we increase the number of threads. Even going
>> from 1 thread to 2 threads shows significant drop. The bottleneck is the
>> singleton Rand
On Tue, 15 Nov 2022 19:41:25 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request with a new target base due
>> to a merge or a rebase. The pull request now contains 25 commits:
>>
>> - Vladimir's review comments
>> - Merge re
On Tue, 15 Nov 2022 00:43:16 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request with a new target base due
>> to a merge or a rebase. The pull request now contains 23 commits:
>>
>> - Merge remote-tracking branch 'origin/master'
On Tue, 15 Nov 2022 00:16:19 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request with a new target base due
>> to a merge or a rebase. The pull request now contains 23 commits:
>>
>> - Merge remote-tracking branch 'origin/master'
On Tue, 15 Nov 2022 00:45:54 GMT, Vladimir Ivanov wrote:
>> library_call.cpp takes care of that, it passes the address of 0'th element
>> to the stub.
>
> Ah, got it. Worth elaborating that in the comments. Otherwise, they confuse
> rather than help:
>
> // void processBlocks(byte[] input, i
On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384:
>>
>>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const
>>> Register a0, const Register a1, const Register a2, bo
On Tue, 15 Nov 2022 00:06:40 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request with a new target base due
>> to a merge or a rebase. The pull request now contains 23 commits:
>>
>> - Merge remote-tracking branch 'origin/master'
On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KA
On Fri, 11 Nov 2022 17:56:55 GMT, Volodymyr Paprotski wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KA
On Fri, 11 Nov 2022 20:46:57 GMT, Volodymyr Paprotski wrote:
>> It's not specific to `andq`: there's a huge `#ifdef` block around the
>> definitions in `assembler_x86.hpp` (lines 12201 - 13773; and there's even a
>> nested `#ifdef _LP64` (lines 13515-1
On Fri, 11 Nov 2022 20:34:34 GMT, Vladimir Ivanov wrote:
>> I am mystified at how it actually gets removed from the `assembler_x86.o`
>> object on 32-bit.. The only reliable/portable way _would_ be with `#ifdef`
>> but it's not there.. so.. code-generation? `sed`-like preprocessing? Can one
>>
On Fri, 11 Nov 2022 19:56:40 GMT, Vladimir Ivanov wrote:
>> I believe it's needed.
>>
>> TLDR.. A couple of check-ins ago, I broke the 32-bit build, and that was the
>> 'easy' fix..
>
> Right, `addq` instructions are x64-specific. I was confused because
> `assembler_x86.hpp` doesn't declare them
On Fri, 11 Nov 2022 01:25:07 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> jcheck
>
> src/java.base/share/classes/com/sun/crypto/provider/Pol
On Fri, 11 Nov 2022 01:26:40 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> live review with Sandhya
>
> src/hotspot/cpu/x86/macroAssembler_x86.hpp line
On Thu, 10 Nov 2022 22:03:24 GMT, Sandhya Viswanathan
wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> fix windows and 32b linux builds
>
> src/hotspot/share/opto/library_