Thanks a lot for the tests; they are much appreciated.

I ran that on my laptop (11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz)
which quite surprisingly has all these CPU features. Mostly idle,
dynamic CPU governor but no thermal throttling at all (and if there
were, it would probably slow down the AVX-512 code anyway), and tests
are long enough for CPU governors to not matter much.
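
For anyone wanting to check their own machine first, here is a minimal
sketch that looks for the relevant flags in /proc/cpuinfo-style text. The
lowercase flag spellings are my assumption of how Linux reports these
features, and the function names are made up for illustration.

```python
# Flag names as Linux is assumed to spell them in /proc/cpuinfo.
REQUIRED = {
    "AES-GCM": {"vaes", "vpclmulqdq"},
    "RSA": {"avx512f", "avx512vl", "avx512dq", "avx512ifma"},
}


def parse_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(flags):
    """Map each algorithm to the sorted list of required flags not present."""
    return {name: sorted(req - flags) for name, req in REQUIRED.items()}


def read_local_flags(path="/proc/cpuinfo"):
    """Read and parse the flags of the local machine (Linux only)."""
    with open(path) as f:
        return parse_flags(f.read())
```

Calling missing_features(read_local_flags()) on a Linux machine should
return empty lists for both entries if all the features are reported.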

* AES-128-GCM | AES-256-GCM
 - Baseline - Requires VAES and VPCLMULQDQ features present on ICX or newer
platform. This should be the most performant flow.
AES-128-GCM     855360.29k  3158479.88k  6093932.91k  8905067.37k 13336828.91k 13788498.58k

 - Individual VAES Disabled and VPCLMULQDQ Disabled should fall back to AVX
AESNI flow and should have equivalent performance
AES-128-GCM     785422.85k  1936140.78k  4404423.77k  6481577.18k  7732716.48k  7873213.39k
AES-128-GCM     790775.41k  1942054.64k  4404868.20k  6484287.87k  7711803.10k  7778795.52k

 - AESNI and VAESNI Disabled should fall back to 'C code' performance
AES-128-GCM     150183.11k   167807.25k   598198.71k   662922.19k   681574.40k   678182.91k

* RSA 2K/3K/4K Sign Performance
 - Baseline - Requires AVX512F, AVX512VL, AVX512DQ, and AVX512IFMA features on 
ICX or newer platform. This should be the most performant flow.
rsa 2048 bits 0.000246s 0.000015s   4057.2  65278.3
rsa 3072 bits 0.000701s 0.000032s   1426.4  31247.7
rsa 4096 bits 0.001434s 0.000055s    697.4  18052.7

 - Individual AVX512F, AVX512VL, and AVX512IFMA features should yield 
equivalent performance. This flow will use the ADOX/ADCX/MULX RSA flow.
rsa 2048 bits 0.000523s 0.000015s   1910.4  65748.2
rsa 3072 bits 0.001579s 0.000032s    633.3  31158.1
rsa 4096 bits 0.003529s 0.000055s    283.4  18093.6

rsa 2048 bits 0.000524s 0.000015s   1909.0  66310.8
rsa 3072 bits 0.001577s 0.000032s    634.1  31309.7
rsa 4096 bits 0.003568s 0.000055s    280.2  18120.4

rsa 2048 bits 0.000523s 0.000015s   1913.3  65234.3
rsa 3072 bits 0.001583s 0.000032s    631.7  31094.6
rsa 4096 bits 0.003607s 0.000055s    277.3  18076.8

rsa 2048 bits 0.000524s 0.000015s   1907.6  66299.6
rsa 3072 bits 0.001577s 0.000032s    634.1  31214.4
rsa 4096 bits 0.003586s 0.000055s    278.9  18096.1

We see the expected behavior (AFAIU, all features must be available at
the same time for the changes to have effect).
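
To spell out what I understand the expected behavior to be, here is a toy
model of the flow selection as described in the test plan. It is not
OpenSSL's actual dispatch code; the feature names are plain strings, the
function names are hypothetical, and I assume ADX/BMI2 are always present
for the RSA fallback.

```python
RSA_IFMA_REQUIRED = frozenset({"AVX512F", "AVX512VL", "AVX512DQ", "AVX512IFMA"})
AES_VECTOR_REQUIRED = frozenset({"VAES", "VPCLMULQDQ"})


def aes_gcm_flow(features):
    """Pick the AES-GCM flow the test plan expects for a set of enabled features."""
    if AES_VECTOR_REQUIRED <= features:
        return "VAES/VPCLMULQDQ"      # baseline, all required features present
    if "AESNI" in features:
        return "AVX AESNI"            # one of VAES/VPCLMULQDQ is disabled
    return "C code"                   # AESNI and VAES both disabled


def rsa_flow(features):
    """Pick the RSA flow the test plan expects for a set of enabled features."""
    if RSA_IFMA_REQUIRED <= features:
        return "AVX512 IFMA"          # baseline, all four features present
    return "ADOX/ADCX/MULX"           # any single feature missing


ALL = set(RSA_IFMA_REQUIRED | AES_VECTOR_REQUIRED | {"AESNI"})
print(aes_gcm_flow(ALL), "|", rsa_flow(ALL))
print(aes_gcm_flow(ALL - {"VAES"}), "|", rsa_flow(ALL - {"AVX512VL"}))
```

Disabling any one required feature drops to the fallback flow, which is
what the equal numbers across the "individual" runs suggest.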

I'm not comparing everything number by number because I don't think
we're looking for specific percentages of improvements.

Overall we see up to a ~2.4x performance improvement, and we always see
large improvements (double-digit percentages).
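
For reference, the ratios behind that statement can be recomputed from the
figures above with a few lines of Python. The numbers are copied from the
mantic results (baseline versus the first fallback run); the helper names
are only for illustration.

```python
# (baseline, fallback) pairs copied from the mantic results above.
RSA_SIGN_PER_SEC = {
    2048: (4057.2, 1910.4),
    3072: (1426.4, 633.3),
    4096: (697.4, 283.4),
}
# Last column of the AES-128-GCM lines (baseline, VAES disabled).
AES_128_GCM_LAST_COLUMN = (13788498.58, 7873213.39)


def speedup(fast, slow):
    """Return how many times faster 'fast' is than 'slow'."""
    return fast / slow


def report():
    """Build one human-readable line per comparison."""
    lines = []
    for bits, (fast, slow) in sorted(RSA_SIGN_PER_SEC.items()):
        lines.append(f"rsa {bits} sign/s: {speedup(fast, slow):.2f}x")
    ratio = speedup(*AES_128_GCM_LAST_COLUMN)
    lines.append(f"AES-128-GCM, last column: {ratio:.2f}x")
    return lines


print("\n".join(report()))
```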


As a control I also ran that on lunar, therefore without the patches (I
acknowledge this is not the same openssl version and there are also
other changes but I do not think this matters here).

# AES-128-GCM | AES-256-GCM
 - Baseline - Requires VAES and VPCLMULQDQ features present on ICX or newer
platform. This should be the most performant flow.
AES-128-GCM     782474.44k  1938211.66k  4430867.84k  6402298.54k  7685819.33k  7840186.37k

 - Individual VAES Disabled and VPCLMULQDQ Disabled should fall back to AVX
AESNI flow and should have equivalent performance
AES-128-GCM     750028.44k  1926234.78k  4365867.67k  6383893.16k  7742842.78k  7843146.41k
AES-128-GCM     786910.34k  1934779.33k  4421411.45k  6389114.88k  7650086.87k  7797479.86k

 - AESNI and VAESNI Disabled should fall back to 'C code' performance
AES-128-GCM     147889.72k   167843.85k   599710.04k   663642.45k   679072.96k   680631.91k

# RSA 2K/3K/4K Sign Performance
 - Baseline - Requires AVX512F, AVX512VL, AVX512DQ, and AVX512IFMA features on 
ICX or newer platform. This should be the most performant flow.
rsa 2048 bits 0.000247s 0.000015s   4050.8  66072.6
rsa 3072 bits 0.001596s 0.000032s    626.5  31144.2
rsa 4096 bits 0.003534s 0.000056s    282.9  18003.6

 - Individual AVX512F, AVX512VL, and AVX512IFMA features should yield 
equivalent performance. This flow will use the ADOX/ADCX/MULX RSA flow.
rsa 2048 bits 0.000528s 0.000015s   1892.3  66008.3
rsa 3072 bits 0.001573s 0.000032s    635.6  31094.2
rsa 4096 bits 0.003534s 0.000055s    282.9  18073.8

rsa 2048 bits 0.000522s 0.000015s   1914.7  65763.4
rsa 3072 bits 0.001575s 0.000032s    635.0  31237.8
rsa 4096 bits 0.003530s 0.000055s    283.2  18093.1

rsa 2048 bits 0.000522s 0.000015s   1917.4  65826.2
rsa 3072 bits 0.001575s 0.000032s    635.0  31177.2
rsa 4096 bits 0.003549s 0.000055s    281.8  18109.9

rsa 2048 bits 0.000522s 0.000015s   1915.1  65760.4
rsa 3072 bits 0.001575s 0.000032s    635.0  31180.2
rsa 4096 bits 0.003538s 0.000055s    282.6  18109.9


We can see there is no change with the CPU feature flags, except for the test
that disables AESNI, in which case the performance is the same in lunar and
mantic. That the CPU feature flags don't change the performance except in the
one aforementioned case indicates that these patches are responsible for the
large performance increase we have seen. We can also see that they don't
otherwise degrade performance on this machine.
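
If someone wants to repeat the lunar/mantic comparison without eyeballing
the numbers, a rough sketch that parses the rsa result lines and compares
sign/s follows. It assumes the column order shown above (sign time, verify
time, sign/s, verify/s); the two samples are the baseline runs copied from
this message, and the helper names are hypothetical.

```python
import re

# Matches lines such as: rsa 2048 bits 0.000246s 0.000015s   4057.2  65278.3
RSA_LINE = re.compile(
    r"^rsa\s+(\d+)\s+bits\s+([\d.]+)s\s+([\d.]+)s\s+([\d.]+)\s+([\d.]+)$"
)


def parse_rsa(text):
    """Return {key_bits: signs_per_second} from rsa result lines."""
    result = {}
    for line in text.splitlines():
        m = RSA_LINE.match(line.strip())
        if m:
            result[int(m.group(1))] = float(m.group(4))
    return result


def ratios(new, old):
    """Return new/old sign/s ratios for key sizes present in both runs."""
    return {bits: new[bits] / old[bits] for bits in sorted(new) if bits in old}


MANTIC_BASELINE = """\
rsa 2048 bits 0.000246s 0.000015s   4057.2  65278.3
rsa 3072 bits 0.000701s 0.000032s   1426.4  31247.7
rsa 4096 bits 0.001434s 0.000055s    697.4  18052.7
"""
LUNAR_BASELINE = """\
rsa 2048 bits 0.000247s 0.000015s   4050.8  66072.6
rsa 3072 bits 0.001596s 0.000032s    626.5  31144.2
rsa 4096 bits 0.003534s 0.000056s    282.9  18003.6
"""

for bits, r in ratios(parse_rsa(MANTIC_BASELINE), parse_rsa(LUNAR_BASELINE)).items():
    print(f"rsa {bits}: mantic/lunar sign/s = {r:.2f}")
```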

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to openssl in Ubuntu.
https://bugs.launchpad.net/bugs/2030784

Title:
  Backport Intel's AVX512 patches on openssl 3.0

Status in openssl package in Ubuntu:
  Fix Released

Bug description:
  https://github.com/openssl/openssl/pull/14908

  https://github.com/openssl/openssl/pull/17239

  These should provide a nice performance bonus on recent CPUs, and the
  patches are fairly self-contained.

