On 27.12.2020 21.05, Linus Torvalds wrote:
On Sun, Dec 27, 2020 at 10:39 AM Jussi Kivilinna wrote:
5.10.3 with patch compiles fine, but does not solve the issue.
Duh. Adding the read_iter only fixes kernel_read(). For splice, it also needs a
.splice_read = generic_file_splice_read
On 27.12.2020 19.20, Linus Torvalds wrote:
On Sun, Dec 27, 2020 at 8:32 AM Jussi Kivilinna wrote:
Has this been fixed in 5.11-rc? Is there any patch that I could backport and
test with 5.10?
Here's a patch to test. Entirely untested by me. I'm surprised at how
people use sendfile
Hello,
Now that the 5.9 series is EOL, I tried to move to 5.10.3. I ran into a regression
where LXC containers do not start with the newer kernel. I found that the issue had
been reported (bisected + with reduced test case) in bugzilla at:
https://bugzilla.kernel.org/show_bug.cgi?id=209971
Has this been fixed in 5.11-rc? Is there any patch that I could backport and
test with 5.10?
Hello,
On 12.8.2019 20.14, Nathan Chancellor wrote:
> On Mon, Aug 12, 2019 at 10:35:53AM +0300, Jussi Kivilinna wrote:
>> Hello,
>>
>> On 12.8.2019 6.31, Nathan Chancellor wrote:
>>> From: Vladimir Serbinenko
>>>
>>> clang doesn't recognise =l
> Link:
> https://github.com/gpg/libgcrypt/commit/1ecbd0bca31d462719a2a6590c1d03244e76ef89
> Signed-off-by: Vladimir Serbinenko
> [jk: add changelog, rebase on libgcrypt repository, reformat changed
> line so it does not go over 80 characters]
> Signed-off-by: Jussi Kivilinna
Hello,
07.12.2016, 14:43, Longpeng (Mike) wrote:
> Hi Jussi and Herbert,
>
> I saw several des3-ede testcases (in crypto/testmgr.h) that have a 16-byte IV,
> and libgcrypt/nettle/RFC1851 say the IV length is 8 bytes.
>
> Would you please tell me why these testcases have a 16-byte IV?
Because
On 2014-08-20 21:14, Milan Broz wrote:
> On 08/20/2014 03:25 PM, Jussi Kivilinna wrote:
>>> One to four GB per second for XTS? 12 GB per second for AES CBC? Somehow
>>> that does not sound right.
>>
>> Agreed, those do not look correct... I wonder what happened there. On
>> new run, I got more sane results
Hello,
On 2014-08-19 21:23, Stephan Mueller wrote:
> On Tuesday, 19 August 2014, 10:17:36, Jussi Kivilinna wrote:
>
> Hi Jussi,
>
>> Hello,
>>
>> On 2014-08-17 18:55, Stephan Mueller wrote:
>>> Hi,
>>>
>>> during playing around with the kernel crypto API, I implemented a
>>> performance measurement tool kit
Hello,
On 2014-08-17 18:55, Stephan Mueller wrote:
> Hi,
>
> during playing around with the kernel crypto API, I implemented a performance
> measurement tool kit for the various kernel crypto API cipher types. The
> cryptoperf tool kit is provided in [1].
>
> Comments are welcome.
Your results are
On 02.10.2013 21:12, Rob Landley wrote:
> On 10/02/2013 11:10:37 AM, Kevin Mulvey wrote:
>> change kerneli to kernel as well as kerneli.org to kernel.org
>>
>> Signed-off-by: Kevin Mulvey ke...@kevinmulvey.net
>
> There's a bug number for this?
>
> Acked, queued. (Although I'm not sure the value of pointing to
Hello,
Appears to be caused by some memory corruption. Changing from SLOB allocator to
SLUB made this crash disappear.
Some different crashes with same config:
[0.246152] cryptomgr_test (23) used greatest stack depth: 6400 bytes left
[0.246929] cryptomgr_test (24) used greatest stack
crypto: aesni_intel - fix accessing of unaligned memory
From: Jussi Kivilinna jussi.kivili...@iki.fi
The new XTS code for aesni_intel uses input buffers directly as memory
...@redhat.com
Tested-by: Dave Jones da...@redhat.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/aesni-intel_asm.S | 48 +
 1 file changed, 32 insertions(+), 16 deletions(-)
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 62fe22c..477e9d7
    movdqa %xmm3,%xmm0
> 2b:* 66 0f ef 02          pxor   (%rdx),%xmm0   <-- trapping instruction
> 2f:  f3 0f 7f 1e          movdqu %xmm3,(%rsi)
> 33:  66 44 0f 70 db 13    pshufd $0x13,%xmm3,%xmm11
> 39:  66 0f d4 db          paddq  %xmm3,%xmm3
> 3d:  66                   data16
> 3e:  41                   rex.B
> 3f:
On 16.04.2013 19:20, Tim Chen wrote:
> These are simple tests to do sanity check of CRC T10 DIF hash. The
> correctness of the transform can be checked with the command
> modprobe tcrypt mode=47
> The speed of the transform can be evaluated with the command
> modprobe tcrypt mode=320
On 16.04.2013 19:20, Tim Chen wrote:
> This is the x86_64 CRC T10 DIF transform accelerated with the PCLMULQDQ
> instructions. Details discussing the implementation can be found in the
> paper:
>
> "Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction"
> URL:
Signed-off-by: Jussi Kivilinna
---
crypto/tcrypt.c | 15 +++
1 file changed, 15 insertions(+)
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 24ea7df..66d254c 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1768,6 +1768,21 @@ static int do_test(int m
times faster than
the AVX implementation.
Signed-off-by: Jussi Kivilinna
---
arch/x86/crypto/Makefile |2
arch/x86/crypto/serpent-avx2-asm_64.S | 800 +
arch/x86/crypto/serpent_avx2_glue.c | 562
arch/x86/crypto
registers,
which should give additional speed up compared to the AVX implementation.
Signed-off-by: Jussi Kivilinna
---
arch/x86/crypto/Makefile |2
arch/x86/crypto/glue_helper-asm-avx2.S | 180 ++
arch/x86/crypto/twofish-avx2-asm_64.S | 600
Signed-off-by: Jussi Kivilinna
---
crypto/testmgr.h | 1100 --
1 file changed, 1062 insertions(+), 38 deletions(-)
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index d503660..dc2c054 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
Patch adds AVX2/x86-64 implementation of Blowfish cipher, requiring 32 parallel
blocks for input (256 bytes). Table look-ups are performed using vpgatherdd
instruction directly from vector registers and thus should be faster than
earlier implementations.
Signed-off-by: Jussi Kivilinna
---
arch
test these on real hardware and
maybe give acked-by in case these look ok(?). If such is not possible, I'll
do the testing myself when those Haswell processors become available where I
live.
-Jussi
---
Jussi Kivilinna (6):
crypto: testmgr - extend camellia test-vectors for camellia-aesni/avx2
On 22.03.2013 23:29, Tim Chen wrote:
> We added glue code and config options to create crypto
> module that uses SSE/AVX/AVX2 optimized SHA256 x86_64 assembly routines.
>
> Signed-off-by: Tim Chen
..snip..
> diff --git a/arch/x86/crypto/sha256_ssse3_glue.c
>
On 22.03.2013 23:29, Tim Chen wrote:
> Provides SHA256 x86_64 assembly routine optimized with SSSE3 instructions.
> Speedup of 40% or more has been measured over the generic implementation.
>
> Signed-off-by: Tim Chen
> ---
> arch/x86/crypto/sha256-ssse3-asm.S | 504
>
On 22.03.2013 23:29, Tim Chen wrote:
> We added glue code and config options to create crypto
> module that uses SSE/AVX/AVX2 optimized SHA512 x86_64 assembly routines.
>
> Signed-off-by: Tim Chen
> ---
> arch/x86/crypto/Makefile| 2 +
> arch/x86/crypto/sha512_ssse3_glue.c | 276
On 22.03.2013 23:29, Tim Chen wrote:
> We added glue code and config options to create crypto
> module that uses SSE/AVX/AVX2 optimized SHA256 x86_64 assembly routines.
>
> Signed-off-by: Tim Chen
> ---
I could not apply this patch cleanly on top of cryptodev-2.6 tree:
Applying: Create module
Quoting Steffen Klassert :
On Thu, Jan 24, 2013 at 01:25:46PM +0200, Jussi Kivilinna wrote:
Maybe it would be cleaner to not mess with pfkeyv2.h at all, but
instead mark algorithms that do not support pfkey with flag. See
patch below.
As nobody seems to have another opinion, we could
Quoting Steffen Klassert :
> On Wed, Jan 23, 2013 at 05:35:10PM +0200, Jussi Kivilinna wrote:
>>
>> Problem seems to be that PFKEYv2 does not quite work with IKEv2, and
>> XFRM API should be used instead. There is new numbers assigned for
>> IKEv2:
>> https://www.iana.org
Quoting YOSHIFUJI Hideaki :
YOSHIFUJI Hideaki wrote:
Jussi Kivilinna wrote:
diff --git a/include/uapi/linux/pfkeyv2.h
b/include/uapi/linux/pfkeyv2.h
index 0b80c80..d61898e 100644
--- a/include/uapi/linux/pfkeyv2.h
+++ b/include/uapi/linux/pfkeyv2.h
@@ -296,6 +296,7 @@ struct sadb_x_kmaddress
Quoting Tom St Denis :
- Original Message -
From: "Jussi Kivilinna"
To: "Tom St Denis"
Cc: linux-kernel@vger.kernel.org, "Herbert Xu"
, "David Miller" ,
linux-cry...@vger.kernel.org, "Steffen Klassert"
, net...@vger.kernel.org
Se
Quoting Tom St Denis :
Hey all,
Here's an updated patch which addresses a couple of build issues and
coding style complaints.
I still can't get it to run via testmgr I get
[ 162.407807] alg: No test for cmac(aes) (cmac(aes-generic))
Despite the fact I have an entry for cmac(aes) (much
Quoting Jussi Kivilinna :
Quoting Matt Sealey :
This question is to the implementor/committer (Dave McCullough), how
exactly did you measure the benchmark and can we reproduce it on some
other ARM box?
If it's long and laborious and not so important to test the IPsec
tunnel use-case, what
Quoting Matt Sealey :
This question is to the implementor/committer (Dave McCullough), how
exactly did you measure the benchmark and can we reproduce it on some
other ARM box?
If it's long and laborious and not so important to test the IPsec
tunnel use-case, what would be the simplest possible
Quoting Behan Webster :
From: Jan-Simon Möller
The use of variable length arrays in structs (VLAIS) in the Linux Kernel code
precludes the use of compilers which don't implement VLAIS (for instance the
Clang compiler). This patch instead allocates the appropriate amount of memory
using an
Quoting Herbert Xu :
On Sun, Sep 09, 2012 at 08:35:56AM -0700, Linus Torvalds wrote:
On Sun, Sep 9, 2012 at 5:54 AM, Jussi Kivilinna
wrote:
>
> Does reverting e46e9a46386bca8e80a6467b5c643dc494861896 help?
>
> That commit added crypto selftest for authenc(hmac(sha1),cbc(
Quoting Herbert Xu :
Can you try blacklisting/not loading sha1_ssse3 and aesni_intel
to see which one of them is causing this crash? Of course if you
can still reproduce this without loading either of them that would
also be interesting to know.
This triggers with aes-x86_64 and sha1_generic
Quoting Romain Francoise :
Still seeing this BUG with -rc5, that I originally reported here:
http://marc.info/?l=linux-crypto-vger&m=134653220530264&w=2
Does reverting e46e9a46386bca8e80a6467b5c643dc494861896 help?
That commit added crypto selftest for authenc(hmac(sha1),cbc(aes)) in
3.6, and
Quoting Borislav Petkov :
On Wed, Aug 22, 2012 at 10:20:03PM +0300, Jussi Kivilinna wrote:
Actually it does look better, at least for encryption. Decryption
had different
ordering for test, which appears to be bad on bulldozer as it is on
sandy-bridge.
So, yet another patch then :)
Here
Quoting Jason Garrett-Glaser :
On Wed, Aug 22, 2012 at 12:20 PM, Jussi Kivilinna
wrote:
Quoting Borislav Petkov :
On Wed, Aug 22, 2012 at 07:35:12AM +0300, Jussi Kivilinna wrote:
Looks that encryption lost ~0.4% while decryption gained ~1.8%.
For 256 byte test, it's still slightly slower
Quoting Borislav Petkov :
> On Wed, Aug 22, 2012 at 07:35:12AM +0300, Jussi Kivilinna wrote:
>> Looks that encryption lost ~0.4% while decryption gained ~1.8%.
>>
>> For 256 byte test, it's still slightly slower than twofish-3way
>> (~3%). For 1k
>> and 8k tests, it's ~5% faster.
86/crypto/twofish-avx-x86_64-asm_64.S
b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
index 35f4557..693963a 100644
--- a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
@@ -4,6 +4,8 @@
 * Copyright (C) 2012 Johannes Goetzfried
 *     ...@informatik.stud.uni-erlangen.de
 *
+ * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License
Quoting David Daney :
On 08/16/2012 02:20 PM, Kasatkin, Dmitry wrote:
Hello,
Some places in the code uses variable-size allocation on stack..
For example from hmac_setkey():
struct {
struct shash_desc shash;
char ctx[crypto_shash_descsize(hash)];
/twofish-avx-x86_64-asm_64.S
index 35f4557..6638a87 100644
--- a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
@@ -4,6 +4,8 @@
 * Copyright (C) 2012 Johannes Goetzfried
 *
 *
+ * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
@@ -47,16 +49,21 @@
#define RC2 %xmm6
Quoting Borislav Petkov b...@alien8.de:
On Wed, Aug 15, 2012 at 08:34:25PM +0300, Jussi Kivilinna wrote:
About ~5% slower, probably because I was tuning for sandy-bridge and
introduced more FPU<=>CPU register moves.
Here's new version of patch, with FPU<=>CPU moves from original
implementation
Quoting Borislav Petkov :
> On Wed, Aug 15, 2012 at 05:22:03PM +0300, Jussi Kivilinna wrote:
>
>> Patch replaces 'movb' instructions with 'movzbl' to break false
>> register dependencies and interleaves instructions better for
>> out-of-order scheduling.
>>
>> Also move common round code
> On Wed, Aug 15, 2012 at 04:48:54PM +0300, Jussi Kivilinna wrote:
> > I posted patch that optimize twofish-avx few weeks ago:
> > http://marc.info/?l=linux-crypto-vger&m=134364845024825&w=2
> >
> > I'd be interested to know if this patch helps on Bulldozer.
> >
>
> Sure, can you inline it here too please
Quoting Borislav Petkov :
Ok, here we go. Raw data below.
Thanks a lot!
Twofish-avx appears somewhat slower than 3way, ~9% slower with 256byte
blocks to ~3% slower with 8kb blocks.
Let me know if you need more tests.
I posted patch that optimize twofish-avx few weeks ago:
Quoting Borislav Petkov :
On Wed, Aug 15, 2012 at 11:42:16AM +0300, Jussi Kivilinna wrote:
I started thinking about the performance on AMD Bulldozer.
vmovq/vmovd/vpextr*/vpinsr* between FPU and general purpose registers
on AMD CPU is alot slower (latencies from 8 to 12 cycles) than on
Intel
Quoting Johannes Goetzfried
:
This patch adds a x86_64/avx assembler implementation of the Twofish block
cipher. The implementation processes eight blocks in parallel (two 4 block
chunk AVX operations). The table-lookups are done in general-purpose
registers.
For small blocksizes the
Quoting Alexey Khoroshilov :
Do not leak memory by updating pointer with potentially NULL realloc
return value.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov
Thanks!
Acked-by: Jussi Kivilinna
---
drivers/net/wireless/rndis_wlan.c
DIV_ROUND_UP(NSEC_PER_SEC, dev->pk_clk_freq), which
should be changed to DIV_ROUND_UP_ULL now that NSEC_PER_SEC is 64bit
on 32bit archs. Patch to fix hifn_795x is attached (only compile
tested).
-Jussi
crypto: hifn_795x - fix 64bit division and undefined __divdi3 on 32bit archs
From: Jussi Kivilinna