Hi,
It seems that the cryptodev-2.6 tree at
https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
has somehow rolled back by about 3 months.
Not sure if it's a git.kernel.org issue or something else, but it's
probably worth taking a look?
Thanks,
Gilad
--
Gilad Ben-Yossef
Chief
On Sat, 2018-11-10 at 15:51 +0100, Stefan Wahren wrote:
> Adopt the SPDX license identifier headers to ease license compliance
> management. While we are at it, fix the comment style, too.
>
> Cc: Lubomir Rintel
> Signed-off-by: Stefan Wahren
> ---
> drivers/char/hw_random/bcm2835-rng.c | 7
This variant builds upon the idea of the 2-block AVX2 variant that
shuffles words after each round. The shuffling has a rather high latency,
so the arithmetic units are not optimally used.
Given that we have plenty of registers in AVX2, this version parallelizes
the 2-block variant to do four blocks.
Add a length argument to the eight block function for AVX2, so the
block function may XOR only a partial length of eight blocks.
To avoid unnecessary operations, we integrate XORing of the first four
blocks in the final lane interleaving; this also avoids some work in
the partial lengths path.
Now that all block functions support partial lengths, engage the wider
block sizes more aggressively. This prevents using smaller block
functions multiple times, where the next larger block function would
have been faster.
Signed-off-by: Martin Willi
---
arch/x86/crypto/chacha20_glue.c | 39
Add a length argument to the single block function for SSSE3, so the
block function may XOR only a partial length of the full block. Given
that the setup code is rather cheap, the function does not process more
than one block; this allows us to keep the block function selection in
the C glue code.
Add a length argument to the quad block function for SSSE3, so the
block function may XOR only a partial length of four blocks.
As we already have the stack set up, the partial XORing does not need
to. This gives a slightly different function trailer, so we keep that
separate from the 1-block function.
This patchset improves performance of the ChaCha20 SIMD implementations
for x86_64. For some specific encryption lengths, performance is more
than doubled. Two mechanisms are used to achieve this:
* Instead of calculating the minimal number of required blocks for a
given encryption length, the wider block functions are engaged more
aggressively, replacing multiple calls to smaller block functions with
a single call to a larger one.
This variant uses the same principle as the single block SSSE3 variant
by shuffling the state matrix after each round. With the wider AVX
registers, we can do two blocks in parallel, though.
This can increase performance and efficiency significantly for
lengths that would otherwise require additional calls to smaller
block functions.