On Tue, 7 Sep 2021 22:31:30 GMT, Smita Kamath <[email protected]> wrote:

> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not 
> support the new intrinsic. Tests run were crypto.full.AESGCMBench and 
> crypto.full.AESGCMByteBuffer from the jmh micro benchmarks.
> 
> The problem is that each instance of GHASH allocates 96 extra longs for the 
> AVX512+VAES intrinsic regardless of whether the intrinsic is used. This extra 
> table space should be allocated differently so that non-supporting CPUs do not 
> suffer this penalty. The issue also affects non-Intel CPUs.

It seems to me there's a serious problem here. When the 
galoisCounterMode_AESCrypt() intrinsic executes, I don't think there's any limit 
on the number of blocks it encrypts in one call. With the older intrinsic things 
aren't so bad because the incoming data is split into 6 segments, but this 
intrinsic has no safepoint check in its inner loop, which can lead to a very 
long time to safepoint and stalls every other thread.
If you split the incoming data into chunks of about a megabyte you'd lose no 
measurable performance, but you'd dramatically improve the behaviour of 
everything else, especially with a concurrent GC.
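The splitting idea can be illustrated from the API side. This is only a hedged sketch of the principle (processing GCM data in bounded segments so no single native call runs unbounded), not the actual com.sun.crypto.provider code; the class name `ChunkedGcm`, the `CHUNK` size, and `encryptChunked` are all hypothetical:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.ByteArrayOutputStream;
import java.security.SecureRandom;
import java.util.Arrays;

public class ChunkedGcm {
    // Hypothetical segment size, roughly the ~1 MB suggested above.
    static final int CHUNK = 1 << 20;

    // Feed the input to the cipher in bounded chunks rather than one
    // giant buffer, so each underlying call processes a limited amount
    // of data before returning to Java code.
    public static byte[] encryptChunked(Cipher cipher, byte[] input) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < input.length; off += CHUNK) {
            int len = Math.min(CHUNK, input.length - off);
            byte[] part = cipher.update(input, off, len);
            if (part != null) out.write(part);
        }
        out.write(cipher.doFinal()); // finishes GCM and appends the tag
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        GCMParameterSpec spec = new GCMParameterSpec(128, iv);

        byte[] data = new byte[3 * (1 << 20) + 17]; // a few MB, not chunk-aligned
        new SecureRandom().nextBytes(data);

        // Chunked encryption must produce the same ciphertext as one-shot.
        Cipher enc1 = Cipher.getInstance("AES/GCM/NoPadding");
        enc1.init(Cipher.ENCRYPT_MODE, key, spec);
        byte[] ctChunked = encryptChunked(enc1, data);

        Cipher enc2 = Cipher.getInstance("AES/GCM/NoPadding");
        enc2.init(Cipher.ENCRYPT_MODE, key, spec);
        byte[] ctOneShot = enc2.doFinal(data);

        System.out.println(Arrays.equals(ctChunked, ctOneShot));
    }
}
```

Of course the real fix belongs inside the provider (or the stub), where the segment length can be capped before the intrinsic is invoked, but the chunking loop above is the same shape.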

-------------

PR: https://git.openjdk.java.net/jdk/pull/5402
