On Tue, 12 May 2026 08:22:36 GMT, Andrew Dinn <[email protected]> wrote:

>> This PR:
>> - changes existing AVX512 SHA3 intrinsic to be more parallel
>> - adds an AVX2 SHA3 intrinsic
>> - change `SHA3Parallel.java` to NR=4 (to be able to exploit the AVX512 
>> parallelism while keeping doubleKeccak for platforms where double 
>> parallelism is preferable. I experimented with NR=8 as well, does also gain 
>> a few percent, but I think NR=4 is sufficient tradeoff)
>> 
>> Performance gains:
>> - `MessageDigestBench.digest`:
>>   - AVX2: **16%-39%**
>>   - AVX512: **24%-33%**
>> - `SignatureBench.MLDSA.sign`
>>   - AVX2: **6-12%**
>>   - AVX512: **11%-18%**
>> - `SignatureBench.MLDSA.verify`
>>   - AVX2: **2%-14%**
>>   - AVX512: **31%-40%**
>> - `KEMBench.MLKEM`
>>   - AVX2: **~5%**
>>   - AVX512: **14%-23%**
>> - `KEMBench.JSSE_*`
>>   - appears unaffected
>> 
>> Note on intrinsics. (As noted in the code..) there are multiple entrypoints 
>> wrapping the same intrinsic..
>> - `SHA3.implCompress`: single blockSize of user data xored with keccak
>> - `DigestBase.implCompressMultiBlock`: loop over user data and xor with 
>> keccak
>> - `SHA3Parallel.doubleKeccak`: (still used for AVX2) no message data, just 
>> two state vectors
>> - `SHA3Parallel.quadKeccak`: (AVX512 benefit) no message data, four state 
>> vectors
>> 
>> Note 1: `make test 
>> TEST="micro:org.openjdk.bench.javax.crypto.full.MessageDigestBench 
>> micro:org.openjdk.bench.javax.crypto.full.SignatureBench.MLDSA 
>> micro:org.openjdk.bench.javax.crypto.full.KEMBench"`
>> Note 2: I have left more targeted fuzzing and benchmarks out of this PR, but 
>> they are preserved at [on my 
>> branch](https://github.com/vpaprotsk/jdk/compare/sha3-avx-quad...vpaprotsk:jdk:sha3-avx-quad-extras?expand=1).
>>  If there is something you rather see pulled in.. (otherwise, can include a 
>> diff in JBS for 'future reference')
>> 
>> ---------
>> - [X] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 96:
> 
>> 94:                                           StubGenerator *stubgen,
>> 95:                                           MacroAssembler *_masm) {
>> 96:   bool multiBlock;
> 
> You have moved the switch statement which verifies the stub id is in the 
> expected range after the call to `load_archive_data`. Please return it to the 
> start of this method to ensure we perform validation before trying to 
> retrieve the stub from the AOT cache.
> 
> Note that you need to follow the original and provide a case switch for all 
> legitimate cases and a default case which calls `ShouldNotReachHere()`.

I remember removing this because "this function is a local (static) function, 
essentially a private helper in this file.. it is only ever called here.. no 
need to tripple-check every parameter.. and it will just add complexity.. " 
(changed my mind!)
```C++
void StubGenerator::generate_sha3_stubs() {
  bool avx512Available = VM_Version::supports_evex() && 
VM_Version::supports_avx512bw();
  if (UseSHA3Intrinsics) {
    if (avx512Available) {
      StubRoutines::_sha3_implCompress =
        generate_sha3_implCompress_avx512(StubId::stubgen_sha3_implCompress_id, 
this, _masm);
      StubRoutines::_sha3_implCompressMB =
        
generate_sha3_implCompress_avx512(StubId::stubgen_sha3_implCompressMB_id, this, 
_masm);
      StubRoutines::_double_keccak =
        generate_sha3_implCompress_avx512(StubId::stubgen_double_keccak_id, 
this, _masm);
      StubRoutines::_quad_keccak =
        generate_sha3_implCompress_avx512(StubId::stubgen_quad_keccak_id, this, 
_masm);
    } else {
      StubRoutines::_sha3_implCompress =
        generate_sha3_implCompress_avx2(StubId::stubgen_sha3_implCompress_id, 
this, _masm);
      StubRoutines::_double_keccak =
        generate_sha3_implCompress_avx2(StubId::stubgen_double_keccak_id, this, 
_masm);
      StubRoutines::_sha3_implCompressMB =
        generate_sha3_implCompress_avx2(StubId::stubgen_sha3_implCompressMB_id, 
this, _masm);
    }
  }
}


but.. I compacted your version to just:

```C++
  switch(stub_id) {
  case StubId::stubgen_sha3_implCompress_id:
  case StubId::stubgen_sha3_implCompressMB_id:
  case StubId::stubgen_double_keccak_id:
  case StubId::stubgen_quad_keccak_id:
    break;
  default:
    ShouldNotReachHere();
  }


In fact.. its 'self-documentation' on ways to call this helper.. works for me, 
thanks!

> src/hotspot/cpu/x86/stubGenerator_x86_64_sha3.cpp line 624:
> 
>> 622:                                           StubGenerator *stubgen,
>> 623:                                           MacroAssembler *_masm) {
>> 624:   bool multiBlock = stub_id == StubId::stubgen_sha3_implCompressMB_id;
> 
> Please insert a switch statement here like the one employed in the previous 
> generator to validate the stub id before we call `load_archive_data`.

Done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/31125#discussion_r3229832536
PR Review Comment: https://git.openjdk.org/jdk/pull/31125#discussion_r3229837964

Reply via email to