Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-13 Thread linux
>> The names are the order they were written in. "One" is the lib/sha1.c
>> code (547 bytes with -Os). "Four" is a 5x unrolled C version (1106 bytes).
>
> I'd like to see your version four.
Here's the test driver wrapped around the earlier assembly code. It's an ugly mess of copy & paste code, of course.
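For readers wondering what "5x unrolled" buys: SHA-1 shuffles its five working variables every round, and unrolling by five lets that shuffle be expressed purely as a rotation of argument names, so no moves are needed. What follows is only a minimal sketch of that idea, not the "version four" from this thread (which is not shown here). The names `rol32`, `sha1_f`, `sha1_k`, `ROUND`, `sha1_block_unroll5` and `sha1_abc_selftest` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* Round function, selected per 20-round stage. */
static uint32_t sha1_f(int t, uint32_t b, uint32_t c, uint32_t d)
{
    if (t < 20)
        return (b & c) | (~b & d);              /* choose */
    if (t >= 40 && t < 60)
        return (b & c) | (b & d) | (c & d);     /* majority */
    return b ^ c ^ d;                           /* parity */
}

/* Round constant, one per 20-round stage. */
static uint32_t sha1_k(int t)
{
    static const uint32_t k[4] = {
        0x5a827999u, 0x6ed9eba1u, 0x8f1bbcdcu, 0xca62c1d6u
    };
    return k[t / 20];
}

/* One round; the a..e shuffle is done by the caller rotating the names. */
#define ROUND(a, b, c, d, e, t) do {                                        \
    (e) += rol32((a), 5) + sha1_f((t), (b), (c), (d)) + sha1_k(t) + w[(t)]; \
    (b) = rol32((b), 30);                                                   \
} while (0)

/* Hypothetical helper: compress one 64-byte block into state[]. */
static void sha1_block_unroll5(uint32_t state[5], const unsigned char block[64])
{
    uint32_t w[80], a, b, c, d, e;
    int t;

    /* Message words are big-endian. */
    for (t = 0; t < 16; t++)
        w[t] = ((uint32_t)block[4 * t] << 24) |
               ((uint32_t)block[4 * t + 1] << 16) |
               ((uint32_t)block[4 * t + 2] << 8) |
               (uint32_t)block[4 * t + 3];
    for (; t < 80; t++)
        w[t] = rol32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);

    a = state[0]; b = state[1]; c = state[2]; d = state[3]; e = state[4];

    /* After five rounds the names are back where they started. */
    for (t = 0; t < 80; t += 5) {
        ROUND(a, b, c, d, e, t);
        ROUND(e, a, b, c, d, t + 1);
        ROUND(d, e, a, b, c, t + 2);
        ROUND(c, d, e, a, b, t + 3);
        ROUND(b, c, d, e, a, t + 4);
    }

    state[0] += a; state[1] += b; state[2] += c; state[3] += d; state[4] += e;
}

/* Checks the sketch against the standard SHA-1 test vector for "abc". */
static void sha1_abc_selftest(void)
{
    unsigned char block[64] = { 'a', 'b', 'c', 0x80 };  /* message + pad bit */
    uint32_t st[5] = {
        0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u, 0xc3d2e1f0u
    };

    block[63] = 24;                         /* message length in bits */
    sha1_block_unroll5(st, block);
    assert(st[0] == 0xa9993e36u && st[1] == 0x4706816au);
    assert(st[2] == 0xba3e2571u && st[3] == 0x7850c26cu);
    assert(st[4] == 0x9cd0d89du);
}
```

Because every 20-round stage is a multiple of five rounds, a real implementation could hoist the `sha1_f`/`sha1_k` selection out of each group of five; the sketch keeps it per round for brevity.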

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-12 Thread Matt Mackall
On Tue, Jun 12, 2007 at 01:05:44AM -0400, [EMAIL PROTECTED] wrote:
> > I got this code from Nettle, originally, and I never looked at the SHA-1
> > round structure very closely. I'll give that approach a try.
>
> Attached is some (tested, working, and public domain) assembly code for
> three different

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread linux
> I got this code from Nettle, originally, and I never looked at the SHA-1
> round structure very closely. I'll give that approach a try.
Attached is some (tested, working, and public domain) assembly code for three different sha_transform implementations. Compared to C code, the timings to hash

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread Benjamin Gilbert
Benjamin Gilbert wrote:
> Jan Engelhardt wrote:
> >UTF-8 please. Hint: it should most likely be an ö.
> Whoops, I had thought I had gotten that right. I'll get updates for parts 2 and 3 sent out on Monday.
I'm sending the corrected parts 2 and 3 as replies to this email. The UTF-8 fix is the

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread Benjamin Gilbert
[EMAIL PROTECTED] wrote:
> /* Majority: (x&y)|(y&z)|(z&x) = (x & z) + ((x ^ z) & y)
> #define F3(x,y,z,dest) \
>         movl    z, TMP; \
>         andl    x, TMP; \
>         addl    TMP, dest; \
>         movl    z, TMP; \
>         xorl    x, TMP; \
>         andl
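The identity in the quoted comment is easy to check in C. The two terms on the right can never have the same bit set: where x and z agree, only (x & z) can contribute, and where they differ, only ((x ^ z) & y) can. The addition therefore never carries and behaves like an OR, which is what lets each half be added straight into the destination, as the "addl TMP, dest" line in the macro does. A minimal sketch; `maj_or`, `maj_add` and `check_maj_identity` are names made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Textbook majority function used in SHA-1 rounds 40..59. */
static uint32_t maj_or(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (y & z) | (z & x);
}

/* Form from the quoted comment: the two terms are bit-disjoint, so
 * "+" produces no carries and gives the same result as "|". */
static uint32_t maj_add(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & z) + ((x ^ z) & y);
}

/* All eight single-bit input combinations, replicated across the word,
 * plus a few mixed patterns. */
static void check_maj_identity(void)
{
    static const uint32_t mixed[4] = {
        0x12345678u, 0x9abcdef0u, 0x0f0f0f0fu, 0xdeadbeefu
    };
    unsigned bits, i, j, k;

    for (bits = 0; bits < 8; bits++) {
        uint32_t x = (bits & 1) ? 0xffffffffu : 0u;
        uint32_t y = (bits & 2) ? 0xffffffffu : 0u;
        uint32_t z = (bits & 4) ? 0xffffffffu : 0u;
        assert(maj_or(x, y, z) == maj_add(x, y, z));
    }
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            for (k = 0; k < 4; k++)
                assert(maj_or(mixed[i], mixed[j], mixed[k]) ==
                       maj_add(mixed[i], mixed[j], mixed[k]));
}
```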

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread Benjamin Gilbert
Matt Mackall wrote: In 2003, I was getting 17MB/s out of my Athlon. Now I'm getting 2.7MB/s. Were your tests with or without the latest /dev/urandom fixes? This one in particular:

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread Andi Kleen
Matt Mackall <[EMAIL PROTECTED]> writes:
>
> Have you benchmarked this against lib/sha1.c? Please post the results.
> Until then, I'm frankly skeptical that your unrolled version is faster
> because when I introduced lib/sha1.c the rolled version therein won by
> a significant margin and had 1/10th the

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-11 Thread linux
+#define F3(x,y,z) \
+        movl    x, TMP2;\
+        andl    y, TMP2;\
+        movl    x, TMP; \
+        orl     y, TMP;

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-10 Thread Matt Mackall
On Sun, Jun 10, 2007 at 12:47:19PM -0400, Benjamin Gilbert wrote:
> Matt Mackall wrote:
> >On Sat, Jun 09, 2007 at 08:33:25PM -0400, Benjamin Gilbert wrote:
> >>It's not just the loop unrolling; it's the register allocation and
> >>spilling. For comparison, I built SHATransform() from the

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-10 Thread Benjamin Gilbert
Matt Mackall wrote:
> On Sat, Jun 09, 2007 at 08:33:25PM -0400, Benjamin Gilbert wrote:
> >It's not just the loop unrolling; it's the register allocation and
> >spilling. For comparison, I built SHATransform() from the
> >drivers/char/random.c in 2.6.11, using gcc 3.3.5 with -O2 and SHA_CODE_SIZE == 3

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-10 Thread Matt Mackall
On Sat, Jun 09, 2007 at 08:33:25PM -0400, Benjamin Gilbert wrote:
> Jeff Garzik wrote:
> >Matt Mackall wrote:
> >>Have you benchmarked this against lib/sha1.c? Please post the results.
> >>Until then, I'm frankly skeptical that your unrolled version is faster
> >>because when I introduced lib/sha1.c the

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-09 Thread Benjamin Gilbert
Jan Engelhardt wrote:
> On Jun 8 2007 17:42, Benjamin Gilbert wrote:
> >@@ -0,0 +1,299 @@
> >+/*
> >+ * x86-optimized SHA1 hash algorithm (i486 and above)
> >+ *
> >+ * Originally from Nettle
> >+ * Ported from M4 to cpp by Benjamin Gilbert <[EMAIL PROTECTED]>
> >+ *
> >+ * Copyright (C) 2004, Niels M?ller
> >+ * Copyright

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-09 Thread Benjamin Gilbert
Jeff Garzik wrote:
> Matt Mackall wrote:
> >Have you benchmarked this against lib/sha1.c? Please post the results.
> >Until then, I'm frankly skeptical that your unrolled version is faster
> >because when I introduced lib/sha1.c the rolled version therein won by
> >a significant margin and had 1/10th the

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-09 Thread Matt Mackall
On Sat, Jun 09, 2007 at 04:23:27PM -0400, Jeff Garzik wrote:
> Matt Mackall wrote:
> >On Fri, Jun 08, 2007 at 05:42:53PM -0400, Benjamin Gilbert wrote:
> >>Add x86-optimized implementation of the SHA-1 hash function, taken from
> >>Nettle under the LGPL. This code will be enabled on kernels compiled for

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-09 Thread Jeff Garzik
Matt Mackall wrote:
> On Fri, Jun 08, 2007 at 05:42:53PM -0400, Benjamin Gilbert wrote:
> >Add x86-optimized implementation of the SHA-1 hash function, taken from
> >Nettle under the LGPL. This code will be enabled on kernels compiled for
> >486es or better; kernels which support 386es will use the

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-09 Thread Matt Mackall
On Fri, Jun 08, 2007 at 05:42:53PM -0400, Benjamin Gilbert wrote:
> Add x86-optimized implementation of the SHA-1 hash function, taken from
> Nettle under the LGPL. This code will be enabled on kernels compiled for
> 486es or better; kernels which support 386es will use the generic
> implementation

Re: [PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-09 Thread Jan Engelhardt
On Jun 8 2007 17:42, Benjamin Gilbert wrote:
>@@ -0,0 +1,299 @@
>+/*
>+ * x86-optimized SHA1 hash algorithm (i486 and above)
>+ *
>+ * Originally from Nettle
>+ * Ported from M4 to cpp by Benjamin Gilbert <[EMAIL PROTECTED]>
>+ *
>+ * Copyright (C) 2004, Niels M?ller
>+ * Copyright (C) 2006-2007 Carnegie

[PATCH 2/3] [CRYPTO] Add optimized SHA-1 implementation for i486+

2007-06-08 Thread Benjamin Gilbert
Add x86-optimized implementation of the SHA-1 hash function, taken from Nettle under the LGPL. This code will be enabled on kernels compiled for 486es or better; kernels which support 386es will use the generic implementation (since we need BSWAP). We disable building lib/sha1.o when an
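On the "(since we need BSWAP)" remark: SHA-1 reads its input as big-endian 32-bit words, so a little-endian x86 has to byte-swap every message word on load, and the BSWAP instruction that does this in one step first appeared on the 486. As a rough illustration of what that one instruction replaces, here is a portable C sketch. The names `swab32_portable`, `load_be32` and `check_be_load` are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word with shifts and masks;
 * this is the operation a single BSWAP performs on a 486 or later. */
static uint32_t swab32_portable(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}

/* Endian-independent big-endian load of one message word. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Both routes should agree on what the word 12 34 56 78 means. */
static void check_be_load(void)
{
    const unsigned char bytes[4] = { 0x12, 0x34, 0x56, 0x78 };

    assert(load_be32(bytes) == 0x12345678u);
    /* A little-endian CPU would read those bytes as 0x78563412. */
    assert(swab32_portable(0x78563412u) == 0x12345678u);
}
```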
