This PR:
- changes the existing AVX512 SHA3 intrinsic to be more parallel
- adds an AVX2 SHA3 intrinsic
- changes `SHA3Parallel.java` to NR=4, to exploit the AVX512 parallelism while keeping `doubleKeccak` for platforms where two-way parallelism is preferable. I also experimented with NR=8, which also gains a few percent, but I think NR=4 is a sufficient tradeoff.
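To make the NR=4 change concrete, here is a scalar Java reference sketch of what the different entry points compute. It is not the JDK code: `QuadKeccakSketch`, `keccakF1600`, `absorbBlock`, `sha3_256` and `quadKeccak` are hypothetical names, and the real `SHA3`/`SHA3Parallel` signatures may differ. `absorbBlock` models the single-block step (XOR one rate-sized block of message data into the state, then apply Keccak-f[1600]), the block loop in `sha3_256` models the multi-block path, and `quadKeccak` applies the permutation to four independent states with no message data. An intrinsic is free to do the same work with the four states held in SIMD lanes instead of a loop.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

// Hypothetical scalar model for illustration only; not the JDK's SHA3/SHA3Parallel code.
class QuadKeccakSketch {
    private static final long[] RC = {
        0x0000000000000001L, 0x0000000000008082L, 0x800000000000808aL, 0x8000000080008000L,
        0x000000000000808bL, 0x0000000080000001L, 0x8000000080008081L, 0x8000000000008009L,
        0x000000000000008aL, 0x0000000000000088L, 0x0000000080008009L, 0x000000008000000aL,
        0x000000008000808bL, 0x800000000000008bL, 0x8000000000008089L, 0x8000000000008003L,
        0x8000000000008002L, 0x8000000000000080L, 0x000000000000800aL, 0x800000008000000aL,
        0x8000000080008081L, 0x8000000000008080L, 0x0000000080000001L, 0x8000000080008008L };
    private static final int[] ROTC = { 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14,
        27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44 };
    private static final int[] PILN = { 10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4,
        15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1 };

    // Keccak-f[1600]: 24 rounds over 25 64-bit lanes (lane index = x + 5*y).
    static void keccakF1600(long[] st) {
        long[] bc = new long[5];
        for (int r = 0; r < 24; r++) {
            // Theta: fold each column's parity into the neighbouring columns.
            for (int i = 0; i < 5; i++) bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
            for (int i = 0; i < 5; i++) {
                long d = bc[(i + 4) % 5] ^ Long.rotateLeft(bc[(i + 1) % 5], 1);
                for (int j = 0; j < 25; j += 5) st[j + i] ^= d;
            }
            // Rho + Pi: rotate each lane and move it to its new position.
            long t = st[1];
            for (int i = 0; i < 24; i++) {
                int j = PILN[i];
                long next = st[j];
                st[j] = Long.rotateLeft(t, ROTC[i]);
                t = next;
            }
            // Chi: non-linear step within each row of five lanes.
            for (int j = 0; j < 25; j += 5) {
                for (int i = 0; i < 5; i++) bc[i] = st[j + i];
                for (int i = 0; i < 5; i++) st[j + i] ^= ~bc[(i + 1) % 5] & bc[(i + 2) % 5];
            }
            // Iota: add the round constant.
            st[0] ^= RC[r];
        }
    }

    // Single-block step (the role the PR text gives SHA3.implCompress): XOR one
    // rate-sized block of message bytes into the state, then permute.
    static void absorbBlock(long[] st, byte[] buf, int ofs, int rateBytes) {
        ByteBuffer bb = ByteBuffer.wrap(buf, ofs, rateBytes).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < rateBytes / 8; i++) st[i] ^= bb.getLong();
        keccakF1600(st);
    }

    // Four-way step (the role of the NR=4 quadKeccak): no message data, four
    // independent states; a SIMD version may process them side by side.
    static void quadKeccak(long[] s0, long[] s1, long[] s2, long[] s3) {
        for (long[] s : new long[][] { s0, s1, s2, s3 }) keccakF1600(s);
    }

    // SHA3-256 (rate 136 bytes, padding 0x06 ... 0x80); the block loop plays the
    // multi-block role. Used here only to check the model against the JDK.
    static byte[] sha3_256(byte[] msg) {
        final int rate = 136;
        long[] st = new long[25];
        int full = msg.length / rate;
        for (int b = 0; b < full; b++) absorbBlock(st, msg, b * rate, rate);
        byte[] last = new byte[rate];
        int rem = msg.length - full * rate;
        System.arraycopy(msg, full * rate, last, 0, rem);
        last[rem] ^= 0x06;
        last[rate - 1] ^= (byte) 0x80;
        absorbBlock(st, last, 0, rate);
        ByteBuffer out = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 4; i++) out.putLong(st[i]);
        return out.array();
    }

    public static void main(String[] args) throws Exception {
        byte[] msg = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] ref = MessageDigest.getInstance("SHA3-256").digest(msg);
        System.out.println("SHA3-256 matches JDK: " + Arrays.equals(sha3_256(msg), ref));
        // quadKeccak must give the same result as four separate permutations.
        long[][] quad = new long[4][25];
        long[][] single = new long[4][25];
        for (int k = 0; k < 4; k++) {
            quad[k][0] = single[k][0] = k + 1; // distinct starting states
            keccakF1600(single[k]);
        }
        quadKeccak(quad[0], quad[1], quad[2], quad[3]);
        System.out.println("quad matches singles: " + Arrays.deepEquals(quad, single));
    }
}
```

Because the four states share no data, a four-way (or, as with `doubleKeccak`, two-way) call is purely a throughput optimization: each output state must be identical to what a single-state permutation produces, which is the property the check in `main` compares.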
Performance gains:
- `MessageDigestBench.digest`:
  - AVX2: **16%-39%**
  - AVX512: **24%-33%**
- `SignatureBench.MLDSA.sign`:
  - AVX2: **6%-12%**
  - AVX512: **11%-18%**
- `SignatureBench.MLDSA.verify`:
  - AVX2: **2%-14%**
  - AVX512: **31%-40%**
- `KEMBench.MLKEM`:
  - AVX2: **~5%**
  - AVX512: **14%-23%**
- `KEMBench.JSSE_*`:
  - appears unaffected

Note on intrinsics: as noted in the code, there are multiple entry points wrapping the same intrinsic:
- `SHA3.implCompress`: a single `blockSize` of user data XORed into the Keccak state
- `DigestBase.implCompressMultiBlock`: loops over the user data, XORing each block into the Keccak state
- `SHA3Parallel.doubleKeccak`: (still used for AVX2) no message data, just two state vectors
- `SHA3Parallel.quadKeccak`: (benefits AVX512) no message data, four state vectors

Note 1: the benchmarks above can be run with `make test TEST="micro:org.openjdk.bench.javax.crypto.full.MessageDigestBench micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA micro:org.openjdk.bench.javax.crypto.full.KEMBench"`

Note 2: I have left more targeted fuzzing and benchmarks out of this PR, but they are preserved [on my branch](https://github.com/vpaprotsk/jdk/compare/sha3-avx-quad...vpaprotsk:jdk:sha3-avx-quad-extras?expand=1). If there is something you would rather see pulled in, let me know (otherwise, I can include a diff in JBS for future reference).

---------

- [X] I confirm that I make this contribution in accordance with the [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).

-------------

Commit messages:
 - rebase/rewrite on master after review

Changes: https://git.openjdk.org/jdk/pull/31125/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=31125&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8384353
Stats: 1374 lines in 20 files changed: 951 ins; 119 del; 304 mod
Patch: https://git.openjdk.org/jdk/pull/31125.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/31125/head:pull/31125

PR: https://git.openjdk.org/jdk/pull/31125
