Re: [PATCH] crypto: x86/chacha-sse3 - use unaligned loads for state array

2020-07-08 Thread Martin Willi
> algorithm to use aligned loads. > > Given that the performance benefit of using aligned loads appears > to be limited (~0.25% for 1k blocks using tcrypt on a Core i7-8650U), > and the fact that this hack has leaked into generic ChaCha code, > let's just remove it. Reviewed-by: Martin Willi Thanks, Martin

Re: [v3 PATCH] crypto: chacha - Add DEFINE_CHACHA_STATE macro

2020-07-08 Thread Martin Willi
> > Also, I wonder if we shouldn't simply change the chacha code to use > > unaligned loads for the state array, as it likely makes very little > > difference in practice (the state is not accessed from inside the > > round processing loop) > > I am seeing a 0.25% slowdown on 1k blocks in the SS

Re: [PATCH v3 02/29] crypto: x86/chacha - depend on generic chacha library instead of crypto driver

2019-10-15 Thread Martin Willi
Hi Ard, > Since turning the FPU on and off is cheap these days, simplify the > SIMD routine by dropping the per-page yield, which makes for a > cleaner switch to the library API as well. In my measurements that lazy FPU restore works as intended, and I could not identify any slowdown by this chan

Re: [RFC/RFT PATCH 06/18] crypto: chacha20poly1305 - set cra_name correctly

2019-04-01 Thread Martin Willi
> If the rfc7539 template is instantiated with specific > implementations, e.g. "rfc7539(chacha20-generic,poly1305-generic)" > rather than "rfc7539(chacha20,poly1305)", then the implementation > names end up included in the instance's cra_name. This is i

Re: [RFC/RFT PATCH 01/18] crypto: x86/poly1305 - fix overflow during partial reduction

2019-04-01 Thread Martin Willi
> [...] This bug was originally detected by my patches that improve > testmgr to fuzz algorithms against their generic implementation. Thanks Eric. This shows how valuable your continued work on the crypto testing code is, and how useful such a (common) testing infrastructure can be. Reviewed-by: Martin Willi

Re: [PATCH v2 3/6] crypto: x86/chacha20 - limit the preemption-disabled section

2018-12-02 Thread Martin Willi
> To improve responsiveness, disable preemption for each step of the > walk (which is at most PAGE_SIZE) rather than for the entire > encryption/decryption operation. It seems that it is not that uncommon for IPsec to get small inputs scattered over multiple blocks. Doing FPU context saving for

Re: [PATCH v2 6/6] crypto: x86/chacha - add XChaCha12 support

2018-12-01 Thread Martin Willi
y > Adiantum. > > Signed-off-by: Eric Biggers Reviewed-by: Martin Willi

Re: [PATCH v2 5/6] crypto: x86/chacha20 - refactor to allow varying number of rounds

2018-12-01 Thread Martin Willi
> In preparation for adding XChaCha12 support, rename/refactor the > x86_64 SIMD implementations of ChaCha20 to support different numbers > of rounds. > > Signed-off-by: Eric Biggers Reviewed-by: Martin Willi

Re: [PATCH v2 4/6] crypto: x86/chacha20 - add XChaCha20 support

2018-12-01 Thread Martin Willi
ermute AFAIK, the general convention is to create proper stack frames using FRAME_BEGIN/END for non-leaf functions. Should chacha20_permute() callers do so? For the other parts: Reviewed-by: Martin Willi

[PATCH 3/3] crypto: x86/chacha20 - Add a 4-block AVX-512VL variant

2018-11-20 Thread Martin Willi
ion of ~20%. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20-avx512vl-x86_64.S | 272 + arch/x86/crypto/chacha20_glue.c| 7 + 2 files changed, 279 insertions(+) diff --git a/arch/x86/crypto/chacha20-avx512vl-x86_64.S b/arch/x86/crypto/chacha20-avx512vl-x86_

[PATCH 0/3] crypto: x86/chacha20 - AVX-512VL block functions

2018-11-20 Thread Martin Willi
1453 1947 1496 1477 1963 1438 1930 Martin Willi (3): crypto: x86/chacha20 - Add a 8-block AVX-512VL variant crypto: x86/chacha20 - Add a 2-block AVX-512VL variant crypto: x86/chacha20 - Add a 4-block AVX-512VL variant arch/x86/crypto/Makefile | 5 + arch

[PATCH 2/3] crypto: x86/chacha20 - Add a 2-block AVX-512VL variant

2018-11-20 Thread Martin Willi
process a single block. Hence we engage that function for (partial) single block lengths as well. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20-avx512vl-x86_64.S | 171 + arch/x86/crypto/chacha20_glue.c| 7 + 2 files changed, 178 insertions(+) diff --git

[PATCH 1/3] crypto: x86/chacha20 - Add a 8-block AVX-512VL variant

2018-11-20 Thread Martin Willi
namic masks is not part of the AVX-512VL instruction set, hence we depend on AVX-512BW as well. Given that the major AVX-512VL architectures provide AVX-512BW and this extension does not affect core clocking, this seems to be no problem at least for now. Signed-off-by: Martin Willi

Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-20 Thread Martin Willi
Hi Jason, > [...] I have a massive Xeon Gold 5120 machine that I can give you > access to if you'd like to do some testing and benching. Thanks for the offer, no need at this time. But I certainly would welcome if you could do some (Wireguard) benching with that code to see if it works for you.

Re: [PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-18 Thread Martin Willi
Hi Jason, > I'd be inclined to roll with your implementation if it can eventually > become competitive with Andy Polyakov's, [...] I think for the SSSE3/AVX2 code paths it is competitive; especially for small sizes it is faster, which is not that unimportant when implementing layer 3 VPNs. > the

[PATCH 6/6] crypto: x86/chacha20 - Add a 4-block AVX2 variant

2018-11-11 Thread Martin Willi
place. The partial XORing function trailer is very similar to the AVX2 2-block variant. While it could be shared, that code segment is rather short; profiling is also easier with the trailer integrated, so we keep it per function. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20-avx2

[PATCH 3/6] crypto: x86/chacha20 - Support partial lengths in 8-block AVX2 variant

2018-11-11 Thread Martin Willi
. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20-avx2-x86_64.S | 189 + arch/x86/crypto/chacha20_glue.c| 5 +- 2 files changed, 133 insertions(+), 61 deletions(-) diff --git a/arch/x86/crypto/chacha20-avx2-x86_64.S b/arch/x86/crypto/chacha20-avx2-x86_64.S

[PATCH 4/6] crypto: x86/chacha20 - Use larger block functions more aggressively

2018-11-11 Thread Martin Willi
Now that all block functions support partial lengths, engage the wider block sizes more aggressively. This prevents using smaller block functions multiple times, where the next larger block function would have been faster. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20_glue.c | 39

[PATCH 1/6] crypto: x86/chacha20 - Support partial lengths in 1-block SSSE3 variant

2018-11-11 Thread Martin Willi
s probably not worth it. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20-ssse3-x86_64.S | 74 - arch/x86/crypto/chacha20_glue.c | 11 ++-- 2 files changed, 63 insertions(+), 22 deletions(-) diff --git a/arch/x86/crypto/chacha20-ssse3-x86_64.S b/arch/

[PATCH 2/6] crypto: x86/chacha20 - Support partial lengths in 4-block SSSE3 variant

2018-11-11 Thread Martin Willi
function. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20-ssse3-x86_64.S | 163 ++-- arch/x86/crypto/chacha20_glue.c | 5 +- 2 files changed, 128 insertions(+), 40 deletions(-) diff --git a/arch/x86/crypto/chacha20-ssse3-x86_64.S b/arch/x86/crypto/chacha20

[PATCH 0/6] crypto: x86/chacha20 - SIMD performance improvements

2018-11-11 Thread Martin Willi
1027 1522 1537 1440 1027 1564 1523 1448 1026 1507 1512 1456 1025 1515 1491 1464 1023 1522 1481 1472 1037 1559 1577 1480 927 1518 1559 1488 926 1514 1548 1496 926 1513 1534 Martin Willi (6): crypto: x86/chacha20 - Support partial lengths in 1-block SSSE3 variant crypto: x86/chacha20

[PATCH 5/6] crypto: x86/chacha20 - Add a 2-block AVX2 variant

2018-11-11 Thread Martin Willi
require a 4-block function. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20-avx2-x86_64.S | 197 + arch/x86/crypto/chacha20_glue.c| 7 + 2 files changed, 204 insertions(+) diff --git a/arch/x86/crypto/chacha20-avx2-x86_64.S b/arch/x86/crypto/chacha20-avx2

Re: [RFC PATCH v3 00/15] crypto: Adiantum support

2018-11-07 Thread Martin Willi
> skcipher template. Nice work. I did a quick review only, but you may add my Acked-by: Martin Willi for patches 1-5, 10 and 11. Thanks, Martin

Re: [PATCH net-next v7 26/28] crypto: port ChaCha20 to Zinc

2018-10-06 Thread Martin Willi
Hi Jason, > Now that ChaCha20 is in Zinc, we can have the crypto API code simply > call into it. > delete mode 100644 arch/x86/crypto/chacha20-avx2-x86_64.S > delete mode 100644 arch/x86/crypto/chacha20-ssse3-x86_64.S I did some more testing with that new Zinc ChaCha20 code on x64, and I'm sti

Re: [PATCH net-next v4 18/20] crypto: port ChaCha20 to Zinc

2018-09-16 Thread Martin Willi
Hi Jason, > Now that ChaCha20 is in Zinc, we can have the crypto API code simply > call into it. > delete mode 100644 arch/x86/crypto/chacha20-avx2-x86_64.S > delete mode 100644 arch/x86/crypto/chacha20-ssse3-x86_64.S I did some trivial benchmarking with tcrypt for the ChaCha20Poly1305 AEAD as

Re: [RFC PATCH] crypto: chacha20 - add implementation using 96-bit nonce

2017-12-10 Thread Martin Willi
Hi, > Anyway, I actually thought it was intentional that the ChaCha > implementations in the Linux kernel allowed specifying the block > counter, and therefore allowed seeking to any point in the keystream, > exposing the full functionality of the cipher. If I remember correctly, it was indeed in

Re: [PATCH v4] poly1305: generic C can be faster on chips with slow unaligned access

2016-11-08 Thread Martin Willi
> By using the unaligned access helpers, we drastically improve > performance on small MIPS routers that have to go through the > exception fix-up handler for these unaligned accesses. I couldn't measure any slowdown here, so: Acked-by: Martin Willi > -   dctx->s[0]

Re: [PATCH] crypto: chacha20_4block_xor_ssse3: Align stack pointer to 64 bytes

2016-01-22 Thread Martin Willi
ersion seems to be ok, so is Poly1305. Acked-by: Martin Willi -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html

[PATCH v2 09/10] crypto: poly1305 - Add a two block SSE2 variant for x86_64

2015-07-16 Thread Martin Willi
. Signed-off-by: Martin Willi --- arch/x86/crypto/poly1305-sse2-x86_64.S | 306 + arch/x86/crypto/poly1305_glue.c| 54 +- 2 files changed, 355 insertions(+), 5 deletions(-) diff --git a/arch/x86/crypto/poly1305-sse2-x86_64.S b/arch/x86/crypto/poly1305-sse2

[PATCH v2 10/10] crypto: poly1305 - Add a four block AVX2 variant for x86_64

2015-07-16 Thread Martin Willi
): 684405 opers/sec, 2825226316 bytes/sec test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 367101 opers/sec, 3019039446 bytes/sec Benchmark results from a Core i5-4670T. Signed-off-by: Martin Willi --- arch/x86/crypto/Makefile | 1 + arch/x86/crypto/poly1305

[PATCH v2 05/10] crypto: chacha20 - Add an eight block AVX2 variant for x86_64

2015-07-16 Thread Martin Willi
operations in 10 seconds (18672197632 bytes) Benchmark results from a Core i5-4670T. Signed-off-by: Martin Willi --- arch/x86/crypto/Makefile | 1 + arch/x86/crypto/chacha20-avx2-x86_64.S | 443 + arch/x86/crypto/chacha20_glue.c| 19 ++ crypto

[PATCH v2 04/10] crypto: chacha20 - Add a four block SSSE3 variant for x86_64

2015-07-16 Thread Martin Willi
10 seconds (11846409216 bytes) test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes) Benchmark results from a Core i5-4670T. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20-ssse3-x86_64.S | 483 arch/x86/crypto

[PATCH v2 08/10] crypto: poly1305 - Add a SSE2 SIMD variant for x86_64

2015-07-16 Thread Martin Willi
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 153075 opers/sec, 1258896201 bytes/sec Benchmark results from a Core i5-4670T. Signed-off-by: Martin Willi --- arch/x86/crypto/Makefile | 2 + arch/x86/crypto/poly1305-sse2-x86_64.S | 276

[PATCH v2 07/10] crypto: poly1305 - Export common Poly1305 helpers

2015-07-16 Thread Martin Willi
As architecture specific drivers need a software fallback, export Poly1305 init/update/final functions together with some helpers in a header file. Signed-off-by: Martin Willi --- crypto/chacha20poly1305.c | 4 +-- crypto/poly1305_generic.c | 73

[PATCH v2 02/10] crypto: chacha20 - Export common ChaCha20 helpers

2015-07-16 Thread Martin Willi
As architecture specific drivers need a software fallback, export a ChaCha20 en-/decryption function together with some helpers in a header file. Signed-off-by: Martin Willi --- crypto/chacha20_generic.c | 28 crypto/chacha20poly1305.c | 3 +-- include/crypto

[PATCH v2 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

2015-07-16 Thread Martin Willi
for typical IPsec MTUs. On Ivy Bridge using SSE2/SSSE3 the numbers compared to AES-GCM are very similar due to the less efficient CLMUL instructions. Changes in v2: - No code changes - Use sec=10 for more reliable benchmark results Martin Willi (10): crypto: tcrypt - Add ChaCha20/Poly1305 speed

[PATCH v2 06/10] crypto: testmgr - Add a longer ChaCha20 test vector

2015-07-16 Thread Martin Willi
The AVX2 variant of ChaCha20 is used only for messages with >= 512 bytes length. With the existing test vectors, the implementation could not be tested. Due to the lack of such a long official test vector, this one is self-generated using chacha20-generic. Signed-off-by: Martin Willi --- cry

[PATCH v2 01/10] crypto: tcrypt - Add ChaCha20/Poly1305 speed tests

2015-07-16 Thread Martin Willi
Adds individual ChaCha20 and Poly1305 and a combined rfc7539esp AEAD speed test using mode numbers 214, 321 and 213. For Poly1305 we add a specific speed template, as it expects the key prepended to the input data. Signed-off-by: Martin Willi --- crypto/tcrypt.c | 15 +++ crypto

[PATCH v2 03/10] crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64

2015-07-16 Thread Martin Willi
): 5360533 operations in 10 seconds (5489185792 bytes) test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes) Benchmark results from a Core i5-4670T. Signed-off-by: Martin Willi --- arch/x86/crypto/Makefile| 2 + arch/x86/crypto/chacha20

Re: crypto: chacha20poly1305 - Convert to new AEAD interface

2015-07-16 Thread Martin Willi
ad, so you may add my: Tested-by: Martin Willi Regards Martin

Re: [PATCH 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

2015-07-11 Thread Martin Willi
> If you're going to use sec you need to use at least 10 in order > for it to be meaningful as shorter values often result in bogus > numbers. Ok, I'll use sec=10 in v2. There is no fundamental difference compared to sec=1 (except for very short blocks): testing speed of rfc7539esp(chacha20,poly

Re: [PATCH 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

2015-07-08 Thread Martin Willi
Herbert, > Running the speed test with sec=1 makes no sense because it's > too short. Please use sec=0 and count cycles instead. I get less constant numbers between different runs when using sec=0, hence I've used sec=1. Below are the numbers of "average" runs for the AEAD measuring cycles; I'll

[PATCH 01/10] crypto: tcrypt - Add ChaCha20/Poly1305 speed tests

2015-07-07 Thread Martin Willi
Adds individual ChaCha20 and Poly1305 and a combined rfc7539esp AEAD speed test using mode numbers 214, 321 and 213. For Poly1305 we add a specific speed template, as it expects the key prepended to the input data. Signed-off-by: Martin Willi --- crypto/tcrypt.c | 15 +++ crypto

[PATCH 02/10] crypto: chacha20 - Export common ChaCha20 helpers

2015-07-07 Thread Martin Willi
As architecture specific drivers need a software fallback, export a ChaCha20 en-/decryption function together with some helpers in a header file. Signed-off-by: Martin Willi --- crypto/chacha20_generic.c | 28 crypto/chacha20poly1305.c | 3 +-- include/crypto

[PATCH 08/10] crypto: poly1305 - Add a SSE2 SIMD variant for x86_64

2015-07-07 Thread Martin Willi
test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 153136 opers/sec, 1259390464 bytes/sec Benchmark results from a Core i5-4670T. Signed-off-by: Martin Willi --- arch/x86/crypto/Makefile | 2 + arch/x86/crypto/poly1305-sse2-x86_64.S | 276

[PATCH 00/10] crypto: x86_64 - Add SSE/AVX2 ChaCha20/Poly1305 ciphers

2015-07-07 Thread Martin Willi
CLMUL instructions. Martin Willi (10): crypto: tcrypt - Add ChaCha20/Poly1305 speed tests crypto: chacha20 - Export common ChaCha20 helpers crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64 crypto: chacha20 - Add a four block SSSE3 variant for x86_64 crypto: chacha20 - Add an eight

[PATCH 03/10] crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64

2015-07-07 Thread Martin Willi
1 seconds (532198400 bytes) test 4 (256 bit key, 8192 byte blocks): 67132 operations in 1 seconds (549945344 bytes) Benchmark results from a Core i5-4670T. Signed-off-by: Martin Willi --- arch/x86/crypto/Makefile| 2 + arch/x86/crypto/chacha20-ssse3-x86_64.S | 142

[PATCH 09/10] crypto: poly1305 - Add a two block SSE2 variant for x86_64

2015-07-07 Thread Martin Willi
. Signed-off-by: Martin Willi --- arch/x86/crypto/poly1305-sse2-x86_64.S | 306 + arch/x86/crypto/poly1305_glue.c| 54 +- 2 files changed, 355 insertions(+), 5 deletions(-) diff --git a/arch/x86/crypto/poly1305-sse2-x86_64.S b/arch/x86/crypto/poly1305-sse2

[PATCH 04/10] crypto: chacha20 - Add a four block SSSE3 variant for x86_64

2015-07-07 Thread Martin Willi
bytes) test 4 (256 bit key, 8192 byte blocks): 140107 operations in 1 seconds (1147756544 bytes) Benchmark results from a Core i5-4670T. Signed-off-by: Martin Willi --- arch/x86/crypto/chacha20-ssse3-x86_64.S | 483 arch/x86/crypto/chacha20_glue.c | 8

[PATCH 10/10] crypto: poly1305 - Add a four block AVX2 variant for x86_64

2015-07-07 Thread Martin Willi
): 677578 opers/sec, 2797041984 bytes/sec test 11 ( 8224 byte blocks, 8224 bytes per update, 1 updates): 364094 opers/sec, 2994309056 bytes/sec Benchmark results from a Core i5-4670T. Signed-off-by: Martin Willi --- arch/x86/crypto/Makefile | 1 + arch/x86/crypto/poly1305

[PATCH 06/10] crypto: testmgr - Add a longer ChaCha20 test vector

2015-07-07 Thread Martin Willi
The AVX2 variant of ChaCha20 is used only for messages with >= 512 bytes length. With the existing test vectors, the implementation could not be tested. Due to the lack of such a long official test vector, this one is self-generated using chacha20-generic. Signed-off-by: Martin Willi --- cry

[PATCH 07/10] crypto: poly1305 - Export common Poly1305 helpers

2015-07-07 Thread Martin Willi
As architecture specific drivers need a software fallback, export Poly1305 init/update/final functions together with some helpers in a header file. Signed-off-by: Martin Willi --- crypto/chacha20poly1305.c | 4 +-- crypto/poly1305_generic.c | 73

[PATCH 05/10] crypto: chacha20 - Add an eight block AVX2 variant for x86_64

2015-07-07 Thread Martin Willi
bytes) Benchmark results from a Core i5-4670T. Signed-off-by: Martin Willi --- arch/x86/crypto/Makefile | 1 + arch/x86/crypto/chacha20-avx2-x86_64.S | 443 + arch/x86/crypto/chacha20_glue.c| 19 ++ crypto/Kconfig

[PATCH] crypto: poly1305 - Pass key as first two message blocks to each desc_ctx

2015-06-16 Thread Martin Willi
The Poly1305 authenticator requires a unique key for each generated tag. This implies that we can't set the key per tfm, as multiple users set individual keys. Instead we pass a desc specific key as the first two blocks of the message to authenticate in update(). Signed-off-by: Martin
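The idea in this patch description can be illustrated with a small sketch: a per-request context that treats the first 32 bytes (two 16-byte blocks) passed to update() as the key, even if they arrive split over several calls. The struct and function names below are hypothetical and are not taken from the patch; this is only a simplified illustration, not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POLY1305_KEY_SIZE 32 /* two 16-byte blocks */

/* Hypothetical per-request context: the key lives here, not in the tfm. */
struct desc_ctx_sketch {
	uint8_t key[POLY1305_KEY_SIZE];
	size_t key_have; /* number of key bytes collected so far */
};

/*
 * Take key bytes from the front of the data handed to update().
 * Returns how many bytes were consumed as key material; the caller
 * would authenticate only the remaining len - returned bytes.
 */
static size_t take_key(struct desc_ctx_sketch *ctx,
		       const uint8_t *data, size_t len)
{
	size_t need = POLY1305_KEY_SIZE - ctx->key_have;
	size_t n = len < need ? len : need;

	memcpy(ctx->key + ctx->key_have, data, n);
	ctx->key_have += n;
	return n;
}
```

Because the key is stored in the request-specific context, two users of the same tfm can authenticate with different keys without interfering with each other.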

Re: [PATCH 3/9] crypto: Add a generic Poly1305 authenticator implementation

2015-06-04 Thread Martin Willi
Herbert, > I just realised that this doesn't quite work. The key is shared > by all users of the tfm, yet in your case you need it to be local I agree, as Poly1305 uses a different key for each tag the current approach doesn't work. > I think the simplest solution is to make the key the beginni

[PATCH 1/9] crypto: Add a generic ChaCha20 stream cipher implementation

2015-06-01 Thread Martin Willi
. It uses a 16-byte IV, which includes the 12-byte ChaCha20 nonce prepended by the initial block counter. Some algorithms require an explicit counter value, for example the mentioned AEAD construction. Signed-off-by: Martin Willi --- crypto/Kconfig| 13 +++ crypto/Makefile
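A minimal sketch of the IV layout described here, assuming the counter occupies the 4 bytes in front of the 12-byte nonce and is stored little-endian (as the test vector patch in this series states). The helper name is made up for illustration and is not part of the patch.

```c
#include <stdint.h>
#include <string.h>

#define CHACHA20_NONCE_SIZE 12
#define CHACHA20_IV_SIZE    16

/*
 * Build the 16-byte IV: a 4-byte little-endian initial block counter
 * followed by the 12-byte ChaCha20 nonce. (Hypothetical helper.)
 */
static void chacha20_build_iv(uint8_t iv[CHACHA20_IV_SIZE], uint32_t counter,
			      const uint8_t nonce[CHACHA20_NONCE_SIZE])
{
	iv[0] = (uint8_t)(counter);
	iv[1] = (uint8_t)(counter >> 8);
	iv[2] = (uint8_t)(counter >> 16);
	iv[3] = (uint8_t)(counter >> 24);
	memcpy(iv + 4, nonce, CHACHA20_NONCE_SIZE);
}
```

An AEAD construction that reserves block 0 for the Poly1305 key would, under this layout, pass counter 1 for the payload.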

[PATCH 2/9] crypto: testmgr - Add ChaCha20 test vectors from RFC7539

2015-06-01 Thread Martin Willi
We explicitly set the Initial block Counter by prepending it to the nonce in Little Endian. The same test vector is used for both encryption and decryption, ChaCha20 is a cipher XORing a keystream. Signed-off-by: Martin Willi --- crypto/testmgr.c | 15 + crypto/testmgr.h | 177

[PATCH 9/9] xfrm: Define ChaCha20-Poly1305 AEAD XFRM algo for IPsec users

2015-06-01 Thread Martin Willi
Signed-off-by: Martin Willi --- net/xfrm/xfrm_algo.c | 12 1 file changed, 12 insertions(+) diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c index 67266b7..42f7c76 100644 --- a/net/xfrm/xfrm_algo.c +++ b/net/xfrm/xfrm_algo.c @@ -159,6 +159,18 @@ static struct xfrm_algo_desc

[PATCH 4/9] crypto: testmgr - Add Poly1305 test vectors from RFC7539

2015-06-01 Thread Martin Willi
Signed-off-by: Martin Willi --- crypto/testmgr.c | 9 ++ crypto/testmgr.h | 259 +++ 2 files changed, 268 insertions(+) diff --git a/crypto/testmgr.c b/crypto/testmgr.c index abd09c2..faf93a6 100644 --- a/crypto/testmgr.c +++ b/crypto

[PATCH 8/9] crypto: testmgr - Add draft-ietf-ipsecme-chacha20-poly1305 test vector

2015-06-01 Thread Martin Willi
Signed-off-by: Martin Willi --- crypto/testmgr.c | 15 + crypto/testmgr.h | 179 +++ 2 files changed, 194 insertions(+) diff --git a/crypto/testmgr.c b/crypto/testmgr.c index 915a9ef..ccd19cf 100644 --- a/crypto/testmgr.c +++ b/crypto

[PATCH 7/9] crypto: chacha20poly1305 - Add an IPsec variant for RFC7539 AEAD

2015-06-01 Thread Martin Willi
draft-ietf-ipsecme-chacha20-poly1305 defines the use of ChaCha20/Poly1305 in ESP. It uses additional four byte key material as a salt, which is then used with an 8 byte IV to form the ChaCha20 nonce as defined in the RFC7539. Signed-off-by: Martin Willi --- crypto/chacha20poly1305.c | 26
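As a rough illustration of the nonce construction, the sketch below concatenates the 4-byte salt and the 8-byte per-packet IV into a 12-byte nonce. The salt-first ordering is an assumption based on my reading of the ESP draft; the snippet above does not spell it out, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ESP_SALT_SIZE      4
#define ESP_IV_SIZE        8
#define CHACHA20_NONCE_LEN 12

/*
 * Form the 12-byte ChaCha20 nonce for ESP:
 * 4-byte salt (from the key material) followed by the 8-byte packet IV.
 * Ordering is assumed, see the note above.
 */
static void esp_chacha20_nonce(uint8_t nonce[CHACHA20_NONCE_LEN],
			       const uint8_t salt[ESP_SALT_SIZE],
			       const uint8_t iv[ESP_IV_SIZE])
{
	memcpy(nonce, salt, ESP_SALT_SIZE);
	memcpy(nonce + ESP_SALT_SIZE, iv, ESP_IV_SIZE);
}
```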

[PATCH 6/9] crypto: testmgr - Add ChaCha20-Poly1305 test vectors from RFC7539

2015-06-01 Thread Martin Willi
Signed-off-by: Martin Willi --- crypto/testmgr.c | 15 crypto/testmgr.h | 269 +++ 2 files changed, 284 insertions(+) diff --git a/crypto/testmgr.c b/crypto/testmgr.c index faf93a6..915a9ef 100644 --- a/crypto/testmgr.c +++ b/crypto

[PATCH 5/9] crypto: Add a ChaCha20-Poly1305 AEAD construction, RFC7539

2015-06-01 Thread Martin Willi
This AEAD uses a chacha20 ablkcipher and a poly1305 ahash to construct the ChaCha20-Poly1305 AEAD as defined in RFC7539. It supports both synchronous and asynchronous operations, even if we currently have no async chacha20 or poly1305 drivers. Signed-off-by: Martin Willi --- crypto/Kconfig

[PATCH 3/9] crypto: Add a generic Poly1305 authenticator implementation

2015-06-01 Thread Martin Willi
public domain code by Daniel J. Bernstein and Andrew Moon. Signed-off-by: Martin Willi --- crypto/Kconfig| 9 ++ crypto/Makefile | 1 + crypto/poly1305_generic.c | 300 ++ 3 files changed, 310 insertions(+) create mode 100644

[PATCH 0/9] crypto: Add ChaCha20-Poly1305 AEAD support for IPsec

2015-06-01 Thread Martin Willi
test setup the IPsec throughput is ~700Mbits/s with these portable drivers. Architecture specific drivers subject to a future patchset can improve performance, for example with SSE doubling performance is feasible. Martin Willi (9): crypto: Add a generic ChaCha20 stream cipher implementation

Re: CCM/GCM implementation defect

2015-04-23 Thread Martin Willi
Hi Steffen, > > It looks like our IPsec implementations of CCM and GCM are buggy > > in that they don't include the IV in the authentication calculation. > > Seems like crypto_rfc4106_crypt() passes the associated data it > got from ESP directly to gcm, without chaining with the IV. Do you have

Re: CCM/GCM implementation defect

2015-04-23 Thread Martin Willi
Hi Herbert, > > Does this mean that even the test vectors (crypto/testmgr.h) are broken? > > Indeed. The test vectors appear to be generated either through > our implementation or by one that is identical to us. I'm not sure about that. RFC4106 refers to [1] for test vectors, which is still ava

[PATCH 2/3] xfrm: Traffic Flow Confidentiality for IPv4 ESP

2010-12-08 Thread Martin Willi
Add TFC padding to all packets smaller than the boundary configured on the xfrm state. If the boundary is larger than the PMTU, limit padding to the PMTU. Signed-off-by: Martin Willi --- net/ipv4/esp4.c | 32 1 files changed, 24 insertions(+), 8 deletions
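The padding rule in this description can be sketched as a small length calculation. The function and parameter names are illustrative only, and the real code works on skbs and the cached path MTU rather than plain integers.

```c
#include <stddef.h>

/*
 * Number of TFC padding bytes to add to a packet of `len` bytes.
 * `tfcpad` is the boundary configured on the xfrm state; if it is
 * larger than the PMTU, the PMTU is used as the padding target instead.
 * Packets already at or above the target get no TFC padding.
 */
static size_t tfc_pad_len(size_t len, size_t tfcpad, size_t pmtu)
{
	size_t target = tfcpad < pmtu ? tfcpad : pmtu;

	return len < target ? target - len : 0;
}
```

For example, with a boundary of 512 and a PMTU of 1400, a 100-byte packet would get 412 bytes of padding, while a 600-byte packet would be sent unpadded.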

[PATCH 0/3] xfrm: ESP Traffic Flow Confidentiality padding (v3)

2010-12-08 Thread Martin Willi
. Changes from v2: - Remove unused flag field in attribute, use a plain u32 as attribute payload - Reject installation of TFC padding on non-tunnel SAs Martin Willi (3): xfrm: Add Traffic Flow Confidentiality padding XFRM attribute xfrm: Traffic Flow Confidentiality for IPv4 ESP

[PATCH 1/3] xfrm: Add Traffic Flow Confidentiality padding XFRM attribute

2010-12-08 Thread Martin Willi
The XFRMA_TFCPAD attribute for XFRM state installation configures Traffic Flow Confidentiality by padding ESP packets to a specified length. Signed-off-by: Martin Willi --- include/linux/xfrm.h |1 + include/net/xfrm.h |1 + net/xfrm/xfrm_user.c | 19 +-- 3 files

[PATCH 3/3] xfrm: Traffic Flow Confidentiality for IPv6 ESP

2010-12-08 Thread Martin Willi
Add TFC padding to all packets smaller than the boundary configured on the xfrm state. If the boundary is larger than the PMTU, limit padding to the PMTU. Signed-off-by: Martin Willi --- net/ipv6/esp6.c | 32 1 files changed, 24 insertions(+), 8 deletions

Re: [PATCH 2/3] xfrm: Traffic Flow Confidentiality for IPv4 ESP

2010-12-08 Thread Martin Willi
> In particular, why would we need a boundary at all? Setting it to > anything other than the PMTU would seem to defeat the purpose of > TFC for packets between the boundary and the PMTU. I don't agree, this highly depends on the traffic on the SA. For a general purpose tunnel with TCP flows, PMT

[PATCH 0/3] xfrm: ESP Traffic Flow Confidentiality padding (v2)

2010-12-07 Thread Martin Willi
e kept the currently unused flags in the XFRM attribute to implement ESPv2 fallback or other extensions in the future without changing the ABI. Martin Willi (3): xfrm: Add Traffic Flow Confidentiality padding XFRM attribute xfrm: Traffic Flow Confidentiality for IPv4 ESP xfrm: Tr

[PATCH 1/3] xfrm: Add Traffic Flow Confidentiality padding XFRM attribute

2010-12-07 Thread Martin Willi
The XFRMA_TFC attribute for XFRM state installation configures Traffic Flow Confidentiality by padding ESP packets to a specified length. Signed-off-by: Martin Willi --- include/linux/xfrm.h |6 ++ include/net/xfrm.h |1 + net/xfrm/xfrm_user.c | 16 ++-- 3 files

[PATCH 2/3] xfrm: Traffic Flow Confidentiality for IPv4 ESP

2010-12-07 Thread Martin Willi
Add TFC padding to all packets smaller than the boundary configured on the xfrm state. If the boundary is larger than the PMTU, limit padding to the PMTU. Signed-off-by: Martin Willi --- net/ipv4/esp4.c | 33 + 1 files changed, 25 insertions(+), 8 deletions

[PATCH 3/3] xfrm: Traffic Flow Confidentiality for IPv6 ESP

2010-12-07 Thread Martin Willi
Add TFC padding to all packets smaller than the boundary configured on the xfrm state. If the boundary is larger than the PMTU, limit padding to the PMTU. Signed-off-by: Martin Willi --- net/ipv6/esp6.c | 33 + 1 files changed, 25 insertions(+), 8 deletions

Re: [PATCH 3/5] xfrm: Traffic Flow Confidentiality for IPv4 ESP

2010-12-06 Thread Martin Willi
Hi Herbert, > I know why you want to do this, what I'm asking is do you have any > research behind this with regards to security > > Has this scheme been discussed on a public forum somewhere? No, sorry, I haven't found much valuable discussion about TFC padding. Nothing at all how to overcome

Re: [PATCH 3/5] xfrm: Traffic Flow Confidentiality for IPv4 ESP

2010-12-03 Thread Martin Willi
> What is the basis of this random length padding? Let's assume a peer does not support ESPv3 padding, but we have to pad a small packet with more than 255 bytes. We can't, the ESP padding length field is limited to 255. We could add 255 fixed bytes, but an eavesdropper could just subtract the 255

[PATCH 0/5] xfrm: ESP Traffic Flow Confidentiality padding

2010-11-30 Thread Martin Willi
, but I'm not sure if my PMTU lookup works in all cases (nested transforms?). Any pointer would be appreciated. Martin Willi (5): xfrm: Add Traffic Flow Confidentiality padding XFRM attribute xfrm: Remove unused ESP padlen field xfrm: Traffic Flow Confidentiality for IPv4 ESP

[PATCH 5/5] xfrm: Add TFC padding option to automatically pad to PMTU

2010-11-30 Thread Martin Willi
Traffic Flow Confidentiality padding is most effective if all packets have exactly the same size. For SAs with mixed traffic, the largest packet size is usually the PMTU. Instead of calculating the PMTU manually, the XFRM_TFC_PMTU flag automatically pads to the PMTU. Signed-off-by: Martin Willi

[PATCH 1/5] xfrm: Add Traffic Flow Confidentiality padding XFRM attribute

2010-11-30 Thread Martin Willi
The XFRMA_TFCPAD attribute for XFRM state installation configures Traffic Flow Confidentiality by padding ESP packets to a specified length. To use RFC4303 TFC padding and overcome the 255 byte ESP padding field limit, the XFRM_TFC_ESPV3 flag must be set. Signed-off-by: Martin Willi --- include

[PATCH 2/5] xfrm: Remove unused ESP padlen field

2010-11-30 Thread Martin Willi
The padlen field in IPv4/6 ESP is used to align the ESP padding length to a value larger than the aead block size. There is however no option to set this field, hence it is removed. Signed-off-by: Martin Willi --- include/net/esp.h |3 --- net/ipv4/esp4.c | 11 ++- net/ipv6/esp6

[PATCH 4/5] xfrm: Traffic Flow Confidentiality for IPv6 ESP

2010-11-30 Thread Martin Willi
If configured on xfrm state, increase the length of all packets to a given boundary using TFC padding as specified in RFC4303. For transport mode, or if the XFRM_TFC_ESPV3 is not set, grow the ESP padding field instead. Signed-off-by: Martin Willi --- net/ipv6/esp6.c | 42

[PATCH 3/5] xfrm: Traffic Flow Confidentiality for IPv4 ESP

2010-11-30 Thread Martin Willi
If configured on xfrm state, increase the length of all packets to a given boundary using TFC padding as specified in RFC4303. For transport mode, or if the XFRM_TFC_ESPV3 is not set, grow the ESP padding field instead. Signed-off-by: Martin Willi --- net/ipv4/esp4.c | 42

Re: [PATCH 3/4] crypto: algif_hash - User-space interface for hash operations

2010-11-15 Thread Martin Willi
> This patch adds the af_alg plugin for hash, corresponding to > the ahash kernel operation type. Tested-by: Martin Willi

Re: [PATCH 2/4] crypto: af_alg - User-space interface for Crypto API

2010-11-15 Thread Martin Willi
> This patch creates the backbone of the user-space interface for > the Crypto API, through a new socket family AF_ALG. Tested-by: Martin Willi

Re: [PATCH 4/4] crypto: algif_skcipher - User-space interface for skcipher operations

2010-11-15 Thread Martin Willi
> This patch adds the af_alg plugin for symmetric key ciphers, > corresponding to the ablkcipher kernel operation type. I can confirm that the newest patch fixes the page leak. Tested-by: Martin Willi

Re: [PATCH 4/4] crypto: algif_skcipher - User-space interface for skcipher operations

2010-11-08 Thread Martin Willi
> Hmm, can you show me your test program and how you determined > that it was leaking pages? The test program below runs 1000 encryptions: # grep nr_free /proc/vmstat nr_free_pages 11031 # ./test ... # grep nr_free /proc/vmstat nr_free_pages 10026 # ./test ... # grep nr_free /proc/vmstat nr_f

Re: [PATCH 4/4] crypto: algif_skcipher - User-space interface for skcipher operations

2010-11-06 Thread Martin Willi
Hi Herbert, I did a proof-of-concept implementation for our crypto library; the interface looks good so far. All our hash, hmac, xcbc and cipher test vectors matched. > + sg_assign_page(sg + i, alloc_page(GFP_KERNEL)); Every skcipher operation leaks memory on my box (this pag

[PATCH] xfrm: Fix truncation length of authentication algorithms installed via PF_KEY

2009-12-09 Thread Martin Willi
-off-by: Martin Willi --- net/key/af_key.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/net/key/af_key.c b/net/key/af_key.c index 84209fb..76fa6fe 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -1193,6 +1193,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state

[PATCH] xfrm: Add SHA384 and SHA512 HMAC authentication algorithms to XFRM

2009-11-25 Thread Martin Willi
These algorithms use a truncation of 192/256 bits, as specified in RFC4868. Signed-off-by: Martin Willi --- net/xfrm/xfrm_algo.c | 34 ++ 1 files changed, 34 insertions(+), 0 deletions(-) diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c index faf54c6

[PATCH 1/3] xfrm: Define new XFRM netlink auth attribute with specified truncation bits

2009-11-25 Thread Martin Willi
The new XFRMA_ALG_AUTH_TRUNC attribute, taking an xfrm_algo_auth as argument, allows the installation of authentication algorithms with a truncation length specified in userspace, e.g. SHA256 with 128 bit instead of 96 bit truncation. Signed-off-by: Martin Willi --- include/linux/xfrm.h |8
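To illustrate what a user-specified truncation length amounts to, here is a small stand-alone sketch. struct demo_algo_auth is a local stand-in, not the kernel definition; its field names are assumptions modelled on xfrm_algo_auth, and demo_set_auth() and demo_icv_bytes() are hypothetical helpers. Lengths are given in bits, as in the patch description.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in for illustration; the real struct lives in <linux/xfrm.h>. */
struct demo_algo_auth {
	char alg_name[64];
	unsigned int alg_key_len;	/* key length in bits */
	unsigned int alg_trunc_len;	/* ICV truncation length in bits */
};

/*
 * Fill in an auth description with an explicit truncation length.
 * Returns 0 on success, -1 if the name is too long or the truncation
 * length is zero or not a whole number of bytes.
 */
int demo_set_auth(struct demo_algo_auth *a, const char *name,
		  unsigned int key_bits, unsigned int trunc_bits)
{
	assert(a != NULL && name != NULL);

	if (strlen(name) >= sizeof(a->alg_name))
		return -1;
	if (trunc_bits == 0 || trunc_bits % 8 != 0)
		return -1;

	memset(a, 0, sizeof(*a));
	strcpy(a->alg_name, name);
	a->alg_key_len = key_bits;
	a->alg_trunc_len = trunc_bits;
	return 0;
}

/* Number of ICV bytes that end up on the wire in ESP/AH. */
unsigned int demo_icv_bytes(const struct demo_algo_auth *a)
{
	return a->alg_trunc_len / 8;
}
```

For hmac(sha256), a 128 bit truncation gives a 16-byte ICV, whereas the 96 bit truncation mentioned above gives 12 bytes.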

[PATCH 2/3] xfrm: Store aalg in xfrm_state with a user specified truncation length

2009-11-25 Thread Martin Willi
specified, or the authentication algorithm is specified using xfrm_algo, the truncation length from the algorithm description in the kernel is used. Signed-off-by: Martin Willi --- include/net/xfrm.h| 12 - net/xfrm/xfrm_state.c |2 +- net/xfrm/xfrm_user.c | 129

[PATCH 0/3] xfrm: Custom truncation lengths for authentication algorithms

2009-11-25 Thread Martin Willi
The following patchset adds support for defining truncation lengths for authentication algorithms in userspace. The main purpose for this is to support SHA256 in IPsec using the standardized 128 bit instead of the currently used 96 bit truncation. Martin Willi (3): xfrm: Define new XFRM netlink

[PATCH 3/3] xfrm: Use the user specified truncation length in ESP and AH

2009-11-25 Thread Martin Willi
Instead of using the hardcoded truncation for authentication algorithms, use the truncation length specified on xfrm_state. Signed-off-by: Martin Willi --- net/ipv4/ah4.c |2 +- net/ipv4/esp4.c |2 +- net/ipv6/ah6.c |2 +- net/ipv6/esp6.c |2 +- 4 files changed, 4 insertions

Re: HMAC regression

2009-05-31 Thread Martin Willi
> You must be getting an sg entry that crosses a page boundary, rather than > two sg entries that both stay within a page. Yes. > These things are very rare, and usually occur as > a result of SLAB debugging causing kmalloc to return memory that > crosses page boundaries. Indeed, SLAB_DEBUG was en

Re: HMAC regression

2009-05-29 Thread Martin Willi
> > Switching the hash implementations to the new shash API introduced a > > regression. HMACs are created incorrectly if the data is scattered over > > multiple pages, resulting in very unreliable IPsec tunnels. > > What are the symptoms? After doing further tests, it seems that this is addition

HMAC regression

2009-05-28 Thread Martin Willi
Hi, Switching the hash implementations to the new shash API introduced a regression. HMACs are created incorrectly if the data is scattered over multiple pages, resulting in very unreliable IPsec tunnels. The appended patch adds a silly hmac(sha1) test vector larger than a 4KB page and fails on c