On Tue, 7 Sep 2021 22:31:30 GMT, Smita Kamath <[email protected]> wrote:
> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not
> support the new intrinsic. Tests run were crypto.full.AESGCMBench and
> crypto.full.AESGCMByteBuffer from the jmh micro benchmarks.
>
> The problem is each instance of GHASH allocates 96 extra longs for the
> AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table
> space should be allocated differently so that non-supporting CPUs do not
> suffer this penalty. This issue also affects non-Intel CPUs too.

It seems to me there's a serious problem here. When the galoisCounterMode_AESCrypt() intrinsic executes, I don't think there is any limit on the number of blocks it encrypts in one call. With the older intrinsic things are not so bad, because the incoming data is split into six segments. But this intrinsic has no safepoint check in its inner loop, which can lead to a long time to safepoint, and that stalls the other threads. If you split the incoming data into chunks of about a megabyte you'd lose no measurable performance, but you'd dramatically improve the performance of everything else, especially with a concurrent GC.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5402
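For illustration, the chunking idea can be sketched from the library side. The splitting the comment proposes would really belong inside the intrinsic/stub itself, but feeding the cipher bounded slices via Cipher.update() shows the same effect: no single intrinsic invocation ever sees more than about a megabyte, so time to safepoint stays bounded. The class name ChunkedGcm and the CHUNK constant here are hypothetical, not part of the JDK; this is a minimal sketch, not the proposed fix.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.SecureRandom;

public class ChunkedGcm {
    // Hypothetical chunk size (~1 MB), chosen so each intrinsic call
    // processes a bounded amount of data between safepoint checks.
    static final int CHUNK = 1 << 20;

    static byte[] encryptChunked(Cipher cipher, byte[] plaintext) throws Exception {
        // For AES/GCM/NoPadding encryption, output is plaintext + 16-byte tag.
        byte[] out = new byte[plaintext.length + 16];
        int outOff = 0;
        for (int off = 0; off < plaintext.length; off += CHUNK) {
            int len = Math.min(CHUNK, plaintext.length - off);
            // Each update() call hands the intrinsic at most CHUNK bytes.
            outOff += cipher.update(plaintext, off, len, out, outOff);
        }
        outOff += cipher.doFinal(out, outOff); // flushes remainder and appends the tag
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];
        byte[] iv = new byte[12];
        SecureRandom rnd = new SecureRandom();
        rnd.nextBytes(key);
        rnd.nextBytes(iv);

        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
               new GCMParameterSpec(128, iv));

        int msgLen = 3 * (1 << 20) + 123; // a few MB, so several chunks
        byte[] ct = encryptChunked(c, new byte[msgLen]);
        System.out.println(ct.length == msgLen + 16);
    }
}
```

The same bounding could equally be done inside the HotSpot stub by capping the block count per invocation and re-entering the loop after a safepoint poll; the caller-side version above is just the easiest way to demonstrate the behavior.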
