Re: RC4 optimize for em64t

2005-05-03 Thread Andy Polyakov
For *now* I'm committing only this change to CVS and will have a closer look at the unrolled loop later on... To denote its versatility, our RC4 assembler module was renamed from rc4-amd64.pl to rc4-x86_64.pl. The new RC4_CHAR code-path performs almost two times (+95%) better on EM64T than prior-April
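
For readers unfamiliar with the RC4_CHAR code-path mentioned above, here is a minimal C sketch of RC4 with a byte-wide state table. It assumes that RC4_CHAR amounts to storing each state element as an unsigned char rather than a wider integer; the names rc4_elem, rc4_state, rc4_init and rc4_crypt are hypothetical and are not OpenSSL's declarations or the assembler module discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical names for illustration only; not OpenSSL's API. */
typedef unsigned char rc4_elem;   /* assumption: byte-wide state element */

typedef struct {
    rc4_elem x, y;                /* the two running indices */
    rc4_elem s[256];              /* permutation table, 256 bytes in total */
} rc4_state;

/* Key schedule: start from the identity permutation and mix in the key.
 * keylen must be at least 1. */
static void rc4_init(rc4_state *st, const unsigned char *key, size_t keylen)
{
    unsigned int i, j = 0;

    for (i = 0; i < 256; i++)
        st->s[i] = (rc4_elem)i;
    for (i = 0; i < 256; i++) {
        rc4_elem t = st->s[i];
        j = (j + t + key[i % keylen]) & 0xff;
        st->s[i] = st->s[j];
        st->s[j] = t;
    }
    st->x = 0;
    st->y = 0;
}

/* Encrypts or decrypts len bytes; the two operations are identical in RC4. */
static void rc4_crypt(rc4_state *st, const unsigned char *in,
                      unsigned char *out, size_t len)
{
    rc4_elem x = st->x, y = st->y;
    size_t n;

    for (n = 0; n < len; n++) {
        rc4_elem tx, ty;

        x = (rc4_elem)(x + 1);    /* byte arithmetic wraps modulo 256 */
        tx = st->s[x];
        y = (rc4_elem)(y + tx);
        ty = st->s[y];
        st->s[x] = ty;            /* swap s[x] and s[y] */
        st->s[y] = tx;
        out[n] = in[n] ^ st->s[(rc4_elem)(tx + ty)];
    }
    st->x = x;
    st->y = y;
}
```

With byte-wide elements the whole table fits in 256 bytes and every index wraps naturally, at the cost of byte loads and stores in the inner loop; whether that is faster than a wider element type depends on the CPU, which is the question this thread is measuring.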

Re: RC4 optimize for em64t

2005-04-06 Thread Zou Nan hai
On Wed, 2005-04-06 at 08:08, Zou Nan hai wrote: On Tue, 2005-04-05 at 18:17, Andy Polyakov wrote: Current OpenSSL (0.9.8-dev) rc4speed throughput on a Nocona (Em64t, 64bit) 3.6GHz is 272Mb/s, while this version of RC4 code can achieve 536Mb/s in RC4Speed. Would you please

Re: RC4 optimize for em64t

2005-04-06 Thread Andy Polyakov
BTW, 272MBps at 3.6GHz? I get 262MBps out of [as just mentioned virtually identical] 32-bit code at 2.4GHz P4... In fact, your implementation on EM64T isn't that slow if we change the inc and dec to add and sub. :) With that change the throughput rises from 272Mb/s to 396Mb/s. Huh? And

Re: RC4 optimize for em64t

2005-04-06 Thread Andy Polyakov
Or how about moving movzb (%rdi,%r10),%r8d upwards as movzb (%rdi,%r10),%r14b and make inter-register move between r8 and r14 conditional? I will try it. I have tried it, no performance gain. Does it mean that it's same or does it mean that it's slower? Was it cmov or was it jump over mov

RE: RC4 optimize for em64t

2005-04-06 Thread Zou, Nanhai
-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andy Polyakov Sent: Wednesday, April 06, 2005 5:34 PM To: openssl-dev@openssl.org Subject: Re: RC4 optimize for em64t Or how about moving movzb (%rdi,%r10),%r8d upwards as movzb (%rdi,%r10),%r14b

Re: RC4 optimize for em64t

2005-04-05 Thread Marc Bevand
Zou, Nanhai wrote: | rc4-x86_84.optimized.s Hi, | Current OpenSSL (0.9.8-dev) rc4speed throughput on a Nocona | (Em64t, 64bit) 3.6GHz is 272Mb/s, while this version of RC4 code can | achieve 536Mb/s in RC4Speed. | | Would you please review it? Your RC4 implementation seems to contain

RE: RC4 optimize for em64t

2005-04-05 Thread Zou, Nanhai
in OpenSSL but also been validated on Apache 2.0 + OpenSSL with stress test. Thanks Zou Nan hai -----Original Message----- From: Marc Bevand [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 05, 2005 2:22 PM To: Zou, Nanhai Cc: openssl-dev@openssl.org; Andy Polyakov Subject: Re: RC4 optimize

Re: RC4 optimize for em64t

2005-04-05 Thread Andy Polyakov
Current OpenSSL (0.9.8-dev) rc4speed throughput on a Nocona (Em64t, 64bit) 3.6GHz is 272Mb/s, while this version of RC4 code can achieve 536Mb/s in RC4Speed. Would you please review it? Cool conditional moves in unrolled loop. Have you considered/tried cmov instead of jump over move

Re: RC4 optimize for em64t

2005-04-05 Thread Zou Nan hai
On Tue, 2005-04-05 at 18:17, Andy Polyakov wrote: Current OpenSSL (0.9.8-dev) rc4speed throughput on a Nocona (Em64t, 64bit) 3.6GHz is 272Mb/s, while this version of RC4 code can achieve 536Mb/s in RC4Speed. Would you please review it? Cool conditional moves in unrolled loop.

RC4 optimize for em64t

2005-04-04 Thread Zou, Nanhai
rc4-x86_84.optimized.s Hi, Current OpenSSL (0.9.8-dev) rc4speed throughput on a Nocona (Em64t, 64bit) 3.6GHz is 272Mb/s, while this version of RC4 code can achieve 536Mb/s in RC4Speed. Would you please review it? Thanks Zou Nan hai rc4-x86_84.optimized.s Description: