On Tue, 14 Sep 2021 13:31:19 GMT, Andrew Haley <[email protected]> wrote:

>> Performance dropped by up to 10% for 1k data after 8267125 on CPUs that do not 
>> support the new intrinsic. The tests run were crypto.full.AESGCMBench and 
>> crypto.full.AESGCMByteBuffer from the JMH microbenchmarks.
>> 
>> The problem is that each instance of GHASH allocates 96 extra longs for the 
>> AVX512+VAES intrinsic regardless of whether the intrinsic is used. This extra 
>> table space should be allocated differently so that non-supporting CPUs do not 
>> suffer this penalty. The issue affects non-Intel CPUs as well.
>
> It seems to me there's a serious problem here. When you execute the 
> galoisCounterMode_AESCrypt() intrinsic, I don't think there's a limit on the 
> number of blocks to be encrypted. With the older intrinsic things are not so 
> very bad because the incoming data is split into 6 segments. But if we use 
> this intrinsic, there is no safepoint check in the inner loop, which can lead 
> to a long time to safepoint, and this causes stalls on the other threads.
> If you split the incoming data into blocks of about a megabyte you'd lose no 
> measurable performance but you'd dramatically improve the performance of 
> everything else, especially with a concurrent GC.

@theRealAph I have implemented the changes as per your suggestions. Could you 
review them and let me know your thoughts?
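For reference, the table-allocation side of the issue quoted at the top comes down to not reserving the intrinsic-only space unconditionally. A minimal sketch of that idea follows; the class and field names are hypothetical, not the real `com.sun.crypto.provider.GHASH` fields or the change in this PR:

```java
// Hypothetical sketch only: size the per-instance table based on whether the wide
// intrinsic can actually run, instead of always reserving the 96 extra longs.
final class GhashTableSketch {
    private static final int BASE_TABLE_LONGS = 2;        // subkey H used by every path
    private static final int INTRINSIC_EXTRA_LONGS = 96;  // extra space only the AVX512+VAES stub uses

    private final long[] subkeyHtbl;

    GhashTableSketch(boolean wideIntrinsicSupported) {
        // Only CPUs that will actually run the wide intrinsic pay for the larger
        // table; everyone else allocates just the base entries.
        subkeyHtbl = new long[BASE_TABLE_LONGS
                + (wideIntrinsicSupported ? INTRINSIC_EXTRA_LONGS : 0)];
    }
}
```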

-------------

PR: https://git.openjdk.java.net/jdk/pull/5402
