Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-20 Thread Jason A. Donenfeld
Hi Martin, On Tue, Nov 20, 2018 at 5:29 PM Martin Willi wrote: > Thanks for the offer, no need at this time. But I certainly would > welcome if you could do some (Wireguard) benching with that code to see > if it works for you. I certainly will test it in a few different network circumstances,

[PATCH 3/3] crypto: x86/chacha20 - Add a 4-block AVX-512VL variant

2018-11-20 Thread Martin Willi
This version uses the same principle as the AVX2 version by scheduling the operations for two block pairs in parallel. It benefits from the AVX-512VL rotate instructions and the more efficient partial block handling using "vmovdqu8", resulting in a speedup of the raw block function of ~20%.

[PATCH 0/3] crypto: x86/chacha20 - AVX-512VL block functions

2018-11-20 Thread Martin Willi
In the quest for pushing the limits of chacha20 encryption for both IPsec and Wireguard, this small series adds AVX-512VL block functions. The VL variant works on 256-bit ymm registers, but compared to AVX2 can benefit from the new instructions. Compared to the AVX2 version, these block functions

[PATCH 2/3] crypto: x86/chacha20 - Add a 2-block AVX-512VL variant

2018-11-20 Thread Martin Willi
This version uses the same principle as the AVX2 version. It benefits from the AVX-512VL rotate instructions and the more efficient partial block handling using "vmovdqu8", resulting in a speedup of ~20%. Unlike the AVX2 version, it is faster than the single block SSSE3 version to process a

[PATCH 1/3] crypto: x86/chacha20 - Add a 8-block AVX-512VL variant

2018-11-20 Thread Martin Willi
This variant is similar to the AVX2 version, but benefits from the AVX-512 rotate instructions and the additional registers, so it can operate without any data on the stack. It uses ymm registers only to avoid the massive core throttling on Skylake-X platforms. Nontheless does it bring a ~30%

Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-20 Thread Martin Willi
Hi Jason, > [...] I have a massive Xeon Gold 5120 machine that I can give you > access to if you'd like to do some testing and benching. Thanks for the offer, no need at this time. But I certainly would welcome if you could do some (Wireguard) benching with that code to see if it works for you.