On Thu, 25 Jun 2026 18:07:11 GMT, Volodymyr Paprotski <[email protected]> 
wrote:

> > > I ran ML-KEM and ML-DSA benchmarks on a Macbook Pro M3 36GB Memory, see 
> > > attached. Everything is improved with the set of changes except for 
> > > ML-KEM-512 (encapsulation:-0.92%/decapsulation:-2.25%) and ML-DSA-44 (key 
> > > generation:-1.03%). 
> > > [8384353BenchmarksM3Pro.pdf](https://github.com/user-attachments/files/29319713/8384353BenchmarksM3Pro.pdf)
> > 
> > 
> > Reran the specific benchmarks that showed regression on a quieter system 
> > (MacbookAir M3 8GB Memory) and averaged the results, which showed 
> > negligible differences of the afflicted benchmarks as follows: ML-KEM-512 
> > (encapsulation:-0.38%/decapsulation:-0.21%) and ML-DSA-44 (key generation: 
> > -0.18%). In other words, the current fix resolves the regression in 
> > performance by either negligible differences or better performance, where 
> > the high observed +6.55% (ML-KEM-768, which would be one of the algorithms 
> > that would gain the most from this delta).
> 
> Thanks for the measurements, and the extra verification.. the first set of 
> results really threw me for a loop, I dont see why it would be slower.
> 
> The second rerun.. seems we are safe, so the PR performance is acceptable? 
> (My theory is that GC is introducing a lot of variability to the benchmarks?)

Yes, it's acceptable to me.  I reran the benchmarks because of the variability 
from the first AArch64 system tested.  The difference in SE alone between the 
two systems would cause a 0.68% variability alone for ML-KEM-512 decapsulation, 
as an example.

> We should now match _exactly_ the same number of `doubleKeccak` as before 
> (admittedly, in possibly different order..)
> 
> ```
> int[][][] a = new int[mlDsa_k][mlDsa_l][];
> ML_DSA_44  << here
>                 mlDsa_k = 4;
>                 mlDsa_l = 4;
> ML_DSA_65
>                 mlDsa_k = 6;
>                 mlDsa_l = 5;
> ML_DSA_87
>                 mlDsa_k = 8;
>                 mlDsa_l = 7;
> 
> short[][][] a = new short[mlKem_k][mlKem_k][];
> ML-KEM-512 << here
>                 mlKem_k = 2;
> ML-KEM-768
>                 mlKem_k = 3;
> ML-KEM-1024
>                 mlKem_k = 4;
> ```
> 
> For the first batch, the call sequence..
> 
> ```
> ML_DSA_44
> Before: 2 + 2; 2 + 2; 2 + 2; 2 + 2
> Now: 4; 4; 4; 4 (but without quadKeccak, 4 becomes 2 + 2)
> 
> ML-KEM-512
> Before: 2 ; 2
> Now: 2 ; 2 (no change, each row has to be fully processed before moving to 
> next)
> ```

-------------

PR Comment: https://git.openjdk.org/jdk/pull/31648#issuecomment-4803842971

Reply via email to