On Tue, 14 Sep 2021 13:31:19 GMT, Andrew Haley <[email protected]> wrote:
>> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do
>> not support the new intrinsic. Tests run were crypto.full.AESGCMBench and
>> crypto.full.AESGCMByteBuffer from the jmh micro benchmarks.
>>
>> The problem is each instance of GHASH allocates 96 extra longs for the
>> AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table
>> space should be allocated differently so that non-supporting CPUs do not
>> suffer this penalty. This issue also affects non-Intel CPUs too.
>
> It seems to me there's a serious problem here. When you execute the
> galoisCounterMode_AESCrypt() intrinsic, I don't think there's a limit on the
> number of blocks to be encrypted. With the older intrinsic things are not so
> very bad because the incoming data is split into 6 segments. But if we use
> this intrinsic, there is no safepoint check in the inner loop, which can lead
> to a long time to safepoint, and this causes stalls on the other threads.
> If you split the incoming data into blocks of about a megabyte you'd lose no
> measurable performance but you'd dramatically improve the performance of
> everything else, especially with a concurrent GC.

@theRealAph I have implemented changes as per your suggestions. Could you review the changes and let me know your thoughts?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5402
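For readers following along, the splitting Andrew describes can be sketched roughly as below. This is a minimal illustration, not the actual JDK change: the class, the `SPLIT_LEN` constant, and `chunkLengths` are hypothetical names chosen for this example. The idea is that the Java caller bounds how much data each intrinsic invocation sees, so control returns to interpreted/compiled Java code (and its safepoint poll) between calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: bound the work handed to an intrinsified crypto loop
// so the thread can reach a safepoint between invocations.
public class ChunkedGcmSketch {

    // Per-call cap, ~1 MiB as suggested in the review (illustrative value).
    static final int SPLIT_LEN = 1 << 20;

    // Returns the byte lengths of the chunks a buffer of `total` bytes
    // would be split into; each chunk would be one intrinsic call.
    static List<Integer> chunkLengths(int total) {
        List<Integer> lens = new ArrayList<>();
        for (int off = 0; off < total; off += SPLIT_LEN) {
            lens.add(Math.min(SPLIT_LEN, total - off));
        }
        return lens;
    }

    public static void main(String[] args) {
        // A 2 MiB + 512 B input becomes three calls: 1 MiB, 1 MiB, 512 B.
        // A safepoint can be taken between any two of them.
        System.out.println(chunkLengths(2 * (1 << 20) + 512));
    }
}
```

Because each chunk is still large, the per-call overhead is negligible relative to the encryption work, which is why the split costs no measurable throughput while bounding time-to-safepoint.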
