Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-14 Thread Mathias Krause
On Thu, Aug 11, 2011 at 4:50 PM, Andy Lutomirski wrote: > I have vague plans to clean up extended state handling and make > kernel_fpu_begin work efficiently from any context.  (i.e. the first > kernel_fpu_begin after a context switch could take up to ~60 ns on Sandy > Bridge, but further calls to

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-14 Thread Mathias Krause
Hi Max, 2011/8/8 Locktyukhin, Maxim : > I'd like to note that at Intel we very much appreciate Mathias effort to > port/integrate this implementation into Linux kernel! > > > $0.02 re tcrypt perf numbers below: I believe something must be terribly > broken with the tcrypt measurements > > 20 (an

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-11 Thread Andrew Lutomirski
On Thu, Aug 11, 2011 at 11:08 AM, Herbert Xu wrote: > On Thu, Aug 11, 2011 at 10:50:49AM -0400, Andy Lutomirski wrote: >> >>> This is pretty similar to the situation with the Intel AES code. >>> Over there they solved it by using the asynchronous interface and >>> deferring the processing to a wor

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-11 Thread Herbert Xu
On Thu, Aug 11, 2011 at 10:50:49AM -0400, Andy Lutomirski wrote: > >> This is pretty similar to the situation with the Intel AES code. >> Over there they solved it by using the asynchronous interface and >> deferring the processing to a work queue. > > I have vague plans to clean up extended state
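Herbert describes how the Intel AES code handles this: it uses the asynchronous interface and defers the processing to a work queue when the SIMD unit cannot be used in the current context. That approach can be modelled in userspace. The following is a minimal Python sketch under stated assumptions, not the kernel mechanism. `simd_usable` is a hypothetical predicate standing in for the kernel's context check. A `queue.Queue` drained by a worker thread stands in for the work queue, and `hashlib.sha1` stands in for the SSSE3 transform.

```python
import hashlib
import queue
import threading
from typing import Callable


class DeferringSha1:
    """Hash synchronously when SIMD is usable, otherwise defer to a worker.

    Userspace model only: `simd_usable` is a hypothetical predicate standing
    in for the kernel's per-context check; the worker thread plays the role
    of a work queue that runs later in a context where SIMD is allowed.
    """

    def __init__(self, simd_usable: Callable[[], bool]) -> None:
        self._simd_usable = simd_usable
        self._jobs = queue.Queue()  # items: (data, callback) or None to stop
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, data: bytes, done: Callable[[str], None]) -> bool:
        """Return True if hashed immediately, False if deferred to the worker."""
        if self._simd_usable():
            done(hashlib.sha1(data).hexdigest())
            return True
        self._jobs.put((data, done))
        return False

    def _run(self) -> None:
        # Drain deferred requests until the None sentinel arrives.
        while True:
            job = self._jobs.get()
            if job is None:
                break
            data, done = job
            done(hashlib.sha1(data).hexdigest())

    def close(self) -> None:
        """Stop the worker after it has processed all queued jobs."""
        self._jobs.put(None)
        self._worker.join()
```

Callers receive the digest through the callback rather than a return value. That is the cost of an asynchronous interface that can be used from any context.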

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-11 Thread Andy Lutomirski
On 08/04/2011 02:44 AM, Herbert Xu wrote: On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote: With this algorithm I was able to increase the throughput of a single IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using the SSSE3 variant -- a speedup of +34.8%. Were yo

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-07 Thread Sandy Harris
On Mon, Aug 8, 2011 at 1:48 PM, Locktyukhin, Maxim wrote: > 20 (and more) cycles per byte shown below are not reasonable numbers for SHA-1 > - ~6 c/b (as can be seen in some of the results for Core2) is the expected > results ... Ten years ago, on Pentium II, one benchmark showed 13 cycles/byte
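Cycles per byte and throughput are two views of the same number, so converting between them is an easy way to sanity-check benchmark output such as tcrypt's. Below is a minimal Python sketch. The 3.0 GHz clock in the usage note is a hypothetical example, not a figure from the thread.

```python
def cycles_per_byte(cycles: float, nbytes: int) -> float:
    """Cost of hashing expressed as CPU cycles per input byte."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    return cycles / nbytes


def throughput_mb_per_s(clock_hz: float, cpb: float) -> float:
    """Throughput in MB/s (10**6 bytes per second) implied by a
    cycles-per-byte figure on a core running at clock_hz."""
    if cpb <= 0:
        raise ValueError("cpb must be positive")
    return clock_hz / cpb / 1e6
```

At an assumed 3.0 GHz, `throughput_mb_per_s(3.0e9, 6)` gives 500 MB/s, while 20 c/b drops that to 150 MB/s. A gap that large is why a 20 c/b result for SHA-1 looked implausible next to the expected ~6 c/b.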

RE: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-07 Thread Locktyukhin, Maxim
On Thu, Aug 4, 2011 at 8:44 AM, Herbert Xu wrote: > On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote: >> >> With this algorithm I was able to increase the throughput of a single >> IPsec link from

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Mathias Krause
On Thu, Aug 4, 2011 at 7:05 PM, Mathias Krause wrote: > It does. Just have a look at how fpu_available() is implemented: read: irq_fpu_usable()
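The check Mathias points to determines the basic shape of the glue code. It uses the SSSE3 routine only when the FPU/SIMD registers may be used in the current context and falls back to the generic C implementation otherwise. Below is a Python model of that control flow only. `fpu_usable`, `fpu_begin`/`fpu_end` and the two update callables are hypothetical parameters standing in for `irq_fpu_usable()`, `kernel_fpu_begin()`/`kernel_fpu_end()` and the assembler and C SHA1 routines. They are not kernel APIs.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


def make_update(fpu_usable: Callable[[], bool],
                fpu_begin: Callable[[], None],
                fpu_end: Callable[[], None],
                simd_update: Callable[[bytes], None],
                generic_update: Callable[[bytes], None]) -> Callable[[bytes], str]:
    """Build an update function that picks the SIMD or generic path per call.

    All five parameters are hypothetical stand-ins (see lead-in). The
    returned function reports which path it took, for illustration.
    """

    @contextmanager
    def fpu_section() -> Iterator[None]:
        # Every begin is paired with an end, mirroring the kernel pattern.
        fpu_begin()
        try:
            yield
        finally:
            fpu_end()

    def update(data: bytes) -> str:
        if not fpu_usable():
            # SIMD registers may not be touched here: use the C fallback.
            generic_update(data)
            return "generic"
        with fpu_section():
            simd_update(data)
        return "simd"

    return update
```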

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-04 Thread Mathias Krause
On Thu, Aug 4, 2011 at 8:44 AM, Herbert Xu wrote: > On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote: >> >> With this algorithm I was able to increase the throughput of a single >> IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using >> the SSSE3 variant -- a speedup o

Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-08-03 Thread Herbert Xu
On Sun, Jul 24, 2011 at 07:53:14PM +0200, Mathias Krause wrote: > > With this algorithm I was able to increase the throughput of a single > IPsec link from 344 Mbit/s to 464 Mbit/s on a Core 2 Quad CPU using > the SSSE3 variant -- a speedup of +34.8%. Were you testing this on the transmit side or
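The quoted gain is a relative-throughput calculation. This small Python helper makes it explicit.

```python
def speedup_percent(before: float, after: float) -> float:
    """Relative throughput gain of `after` over `before`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before * 100.0
```

`speedup_percent(344, 464)` evaluates to about 34.88. The mail gives this as +34.8%, with the second decimal dropped.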

[PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

2011-07-24 Thread Mathias Krause
This is an assembler implementation of the SHA1 algorithm using the Supplemental SSE3 (SSSE3) instructions or, when available, the Advanced Vector Extensions (AVX). Testing with the tcrypt module shows the raw hash performance is up to 2.3 times faster than the C implementation, using 8k data bloc
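The patch uses AVX when it is available and SSSE3 otherwise. Below is a Python sketch of that preference order. The flag spellings follow `/proc/cpuinfo` (`ssse3`, `avx`). `os_saves_ymm` is a hypothetical input representing whether the operating system has enabled saving of the YMM register state. Without that, AVX instructions cannot be used even on a CPU that advertises them. Returning `"generic"` models falling back to the C implementation.

```python
from typing import Iterable


def select_sha1_impl(cpu_flags: Iterable[str], os_saves_ymm: bool) -> str:
    """Pick the preferred SHA1 variant for this CPU.

    `os_saves_ymm` is a hypothetical input: AVX is only usable when the OS
    has enabled YMM state saving, not merely when the CPU reports 'avx'.
    """
    flags = set(cpu_flags)
    if "avx" in flags and os_saves_ymm:
        return "avx"
    if "ssse3" in flags:
        return "ssse3"
    return "generic"
```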