Something wrong with cryptodev-2.6 tree?

2018-11-11 Thread Gilad Ben-Yossef
Hi, It seems that the cryptodev-2.6 tree at https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git has somehow been rolled back to its state of 3 months ago. Not sure if it's a git.kernel.org issue or something else, but it's probably worth taking a look. Thanks, Gilad -- Gilad Ben-Yossef Chief

Re: [PATCH 03/17] hw_random: bcm2835-rng: Switch to SPDX identifier

2018-11-11 Thread Lubomir Rintel
On Sat, 2018-11-10 at 15:51 +0100, Stefan Wahren wrote: > Adopt the SPDX license identifier headers to ease license compliance > management. While we are at it, fix the comment style, too. > > Cc: Lubomir Rintel > Signed-off-by: Stefan Wahren > --- > drivers/char/hw_random/bcm2835-rng.c | 7

[PATCH 6/6] crypto: x86/chacha20 - Add a 4-block AVX2 variant

2018-11-11 Thread Martin Willi
This variant builds upon the idea of the 2-block AVX2 variant that shuffles words after each round. The shuffling has a rather high latency, so the arithmetic units are not optimally used. Given that we have plenty of registers in AVX, this version parallelizes the 2-block variant to do four

[PATCH 3/6] crypto: x86/chacha20 - Support partial lengths in 8-block AVX2 variant

2018-11-11 Thread Martin Willi
Add a length argument to the eight block function for AVX2, so the block function may XOR only a partial length of eight blocks. To avoid unnecessary operations, we integrate XORing of the first four blocks in the final lane interleaving; this also avoids some work in the partial lengths path.

[PATCH 4/6] crypto: x86/chacha20 - Use larger block functions more aggressively

2018-11-11 Thread Martin Willi
Now that all block functions support partial lengths, engage the wider block sizes more aggressively. This prevents using smaller block functions multiple times, where the next larger block function would have been faster. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20_glue.c | 39
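The selection idea described above can be sketched in plain C. This is only an illustration, not the code from the patch: the helper name `pick_block_width` and the exact thresholds are assumptions, and the sketch only considers the 1-block, 4-block and 8-block functions mentioned up to this point in the series.

```c
#include <stddef.h>

#define CHACHA_BLOCK_SIZE 64 /* one ChaCha20 block is 64 bytes */

/*
 * Hypothetical helper: pick the width (in blocks) of the block function
 * to call for the remaining `bytes`. Because every block function is
 * assumed to accept a partial length, we round up to the next wider
 * function instead of calling a narrower one several times.
 * The thresholds below are assumptions made for this sketch.
 */
static unsigned int pick_block_width(size_t bytes, int have_avx2)
{
    if (have_avx2 && bytes > 4 * CHACHA_BLOCK_SIZE)
        return 8; /* more than four blocks: one 8-block call */
    if (bytes > CHACHA_BLOCK_SIZE)
        return 4; /* more than one block: one 4-block call */
    return 1;     /* at most one block */
}
```

For example, under these assumed thresholds a 130-byte request would be handled by a single 4-block call rather than by three separate 1-block calls.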

[PATCH 1/6] crypto: x86/chacha20 - Support partial lengths in 1-block SSSE3 variant

2018-11-11 Thread Martin Willi
Add a length argument to the single block function for SSSE3, so the block function may XOR only a partial length of the full block. Given that the setup code is rather cheap, the function does not process more than one block; this allows us to keep the block function selection in the C glue code.
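As a rough scalar illustration of what "XOR only a partial length of the full block" means, consider the sketch below. It is not the SSSE3 assembly from the patch; the function name `xor_partial_block` is hypothetical, and the keystream block is assumed to have been generated already.

```c
#include <stddef.h>
#include <stdint.h>

#define CHACHA_BLOCK_SIZE 64 /* one ChaCha20 block is 64 bytes */

/*
 * Hypothetical helper: XOR at most one block of keystream into dst.
 * If len is shorter than a block, only len bytes are touched; if it is
 * longer, processing stops after one block and the caller (the glue
 * code in the patch description) decides what to call next.
 * Returns the number of bytes processed.
 */
static size_t xor_partial_block(uint8_t *dst, const uint8_t *src,
                                const uint8_t keystream[CHACHA_BLOCK_SIZE],
                                size_t len)
{
    size_t n = len < CHACHA_BLOCK_SIZE ? len : CHACHA_BLOCK_SIZE;

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] ^ keystream[i];

    return n;
}
```

Capping the work at one block mirrors the statement that the function "does not process more than one block", which leaves the choice of block function to the caller.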

[PATCH 2/6] crypto: x86/chacha20 - Support partial lengths in 4-block SSSE3 variant

2018-11-11 Thread Martin Willi
Add a length argument to the quad block function for SSSE3, so the block function may XOR only a partial length of four blocks. As we already have the stack set up, the partial XORing does not need to set it up again. This gives a slightly different function trailer, so we keep that separate from the 1-block

[PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-11 Thread Martin Willi
This patchset improves performance of the ChaCha20 SIMD implementations for x86_64. For some specific encryption lengths, performance is more than doubled. Two mechanisms are used to achieve this: * Instead of calculating the minimal number of required blocks for a given encryption length,

[PATCH 5/6] crypto: x86/chacha20 - Add a 2-block AVX2 variant

2018-11-11 Thread Martin Willi
This variant uses the same principle as the single block SSSE3 variant by shuffling the state matrix after each round. With the wider AVX registers, we can do two blocks in parallel, though. This function can increase performance and efficiency significantly for lengths that would otherwise