For *now* I'm committing only this change to CVS and will have a closer
look at the unrolled loop later on...
To denote its versatility, our RC4 assembler module was renamed from
rc4-amd64.pl to rc4-x86_64.pl. The new RC4_CHAR code path performs almost
two times (+95%) better on EM64T than prior-April
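RC4_CHAR refers to the OpenSSL build option under which the RC4 state table is kept as bytes rather than as wider integer words. As an illustration only, here is a minimal C sketch of RC4 over a byte-sized table. The context type `rc4_char_ctx` and the functions `rc4_char_set_key` and `rc4_char_crypt` are hypothetical names, not OpenSSL's `RC4_KEY`/`RC4_set_key`/`RC4` API, and the assembler module's own code path is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context: state table stored one byte per entry,
   as in an RC4_CHAR-style build. */
typedef struct {
    unsigned char x, y;     /* i and j; unsigned char arithmetic wraps mod 256 */
    unsigned char s[256];   /* the permutation S */
} rc4_char_ctx;

/* Key-scheduling algorithm (KSA). */
static void rc4_char_set_key(rc4_char_ctx *ctx, const unsigned char *key,
                             size_t key_len)
{
    assert(key != NULL && key_len > 0);
    for (int i = 0; i < 256; i++)
        ctx->s[i] = (unsigned char)i;
    unsigned char j = 0;
    for (int i = 0; i < 256; i++) {
        j = (unsigned char)(j + ctx->s[i] + key[i % key_len]);
        unsigned char t = ctx->s[i];      /* swap S[i] and S[j] */
        ctx->s[i] = ctx->s[j];
        ctx->s[j] = t;
    }
    ctx->x = 0;
    ctx->y = 0;
}

/* Keystream generation (PRGA), XORed into the data; out may equal in. */
static void rc4_char_crypt(rc4_char_ctx *ctx, size_t len,
                           const unsigned char *in, unsigned char *out)
{
    unsigned char x = ctx->x, y = ctx->y;
    for (size_t n = 0; n < len; n++) {
        x = (unsigned char)(x + 1);       /* i = i + 1 */
        unsigned char tx = ctx->s[x];
        y = (unsigned char)(y + tx);      /* j = j + S[i] */
        unsigned char ty = ctx->s[y];
        ctx->s[x] = ty;                   /* swap S[i] and S[j] */
        ctx->s[y] = tx;
        out[n] = in[n] ^ ctx->s[(unsigned char)(tx + ty)];
    }
    ctx->x = x;
    ctx->y = y;
}
```

Keeping the indices as `unsigned char` makes the mod-256 arithmetic free. The widely published test vector (key "Key", plaintext "Plaintext", ciphertext BBF316E8D940AF0AD3) is a quick sanity check for any variant.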
On Wed, 2005-04-06 at 08:08, Zou Nan hai wrote:
On Tue, 2005-04-05 at 18:17, Andy Polyakov wrote:
Current OpenSSL (0.9.8-dev) rc4speed throughput on a Nocona (EM64T,
64-bit) 3.6GHz is 272MB/s, while this version of the RC4 code can achieve
536MB/s in rc4speed.
Would you please review it?
BTW, 272MBps at 3.6GHz? I get 262MBps out of [as just mentioned,
virtually identical] 32-bit code on a 2.4GHz P4...
In fact, your implementation on EM64T isn't that slow if
we change the inc and dec instructions to add and sub. :)
With that change, throughput rises from 272MB/s to 396MB/s.
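The throughput figures are easier to compare per clock cycle. A rough conversion, assuming the numbers are megabytes (10^6 bytes) per second and that each run used a single core at the quoted clock rate:

$$
\begin{aligned}
\text{cycles per byte} &= \frac{\text{clock rate}}{\text{throughput}}\\[4pt]
\frac{3.6\times 10^{9}}{272\times 10^{6}} &\approx 13.2 && \text{current code, 3.6GHz Nocona}\\
\frac{3.6\times 10^{9}}{396\times 10^{6}} &\approx 9.1 && \text{inc/dec replaced by add/sub}\\
\frac{3.6\times 10^{9}}{536\times 10^{6}} &\approx 6.7 && \text{proposed version}\\
\frac{2.4\times 10^{9}}{262\times 10^{6}} &\approx 9.2 && \text{32-bit code, 2.4GHz P4}
\end{aligned}
$$

On that reading, the add/sub change alone brings the 64-bit code to roughly the per-clock rate of the 32-bit P4 code, and the proposed version needs about half the cycles per byte of the current one.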
Huh? And
Or how about moving movzb (%rdi,%r10),%r8d upwards as movzb
(%rdi,%r10),%r14b and making the inter-register move between r8 and r14
conditional?
I will try it.
I have tried it; there is no performance gain.
Does that mean it's the same, or that it's slower? Was it
cmov, or was it a jump over mov?
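The conditional move under discussion is easiest to see in C. When the loop is unrolled (or software-pipelined) so that S[i+1] is loaded before the current swap is stored, that early load is stale whenever j happens to equal i+1, so the register copy has to be fixed up conditionally; moving the movzb upwards and making the r8/r14 move conditional appears to be this same idea. The sketch below assumes a bare byte state table and uses hypothetical function names (`rc4_plain`, `rc4_preload`). It is an illustration of the hazard, not a reconstruction of the attached rc4-x86_84.optimized.s or of rc4-x86_64.pl.

```c
#include <assert.h>
#include <stddef.h>

/* Plain RC4 keystream loop over a byte state table s[256];
   *px and *py hold i and j between calls. */
static void rc4_plain(unsigned char s[256], unsigned char *px, unsigned char *py,
                      size_t len, const unsigned char *in, unsigned char *out)
{
    unsigned char x = *px, y = *py;
    for (size_t n = 0; n < len; n++) {
        x++;                              /* i = i + 1 (wraps mod 256) */
        unsigned char tx = s[x];
        y += tx;                          /* j = j + S[i] */
        unsigned char ty = s[y];
        s[x] = ty;                        /* swap S[i] and S[j] */
        s[y] = tx;
        out[n] = in[n] ^ s[(unsigned char)(tx + ty)];
    }
    *px = x;
    *py = y;
}

/* Same keystream, but S[i+1] is loaded before the current swap is stored. */
static void rc4_preload(unsigned char s[256], unsigned char *px, unsigned char *py,
                        size_t len, const unsigned char *in, unsigned char *out)
{
    assert(len == 0 || (in != NULL && out != NULL));
    if (len == 0)
        return;
    unsigned char x = (unsigned char)(*px + 1);
    unsigned char y = *py;
    unsigned char tx = s[x];              /* S[i] for the first byte */
    size_t n = 0;
    for (;;) {
        y += tx;
        unsigned char ty = s[y];
        unsigned char nx = (unsigned char)(x + 1);
        unsigned char next = s[nx];       /* early load of S[i+1] */
        s[x] = ty;
        s[y] = tx;
        /* If j == i+1, the store to s[y] just replaced S[i+1] with tx,
           so the early load is stale. A compiler may emit cmov or a
           branch ("jump over mov") for this selection. */
        next = (y == nx) ? tx : next;
        out[n] = in[n] ^ s[(unsigned char)(tx + ty)];
        if (++n == len)
            break;
        x = nx;
        tx = next;
    }
    *px = x;
    *py = y;
}
```

Writing the fix-up as a conditional expression leaves the cmov-or-branch choice to the compiler. Whichever form is used, comparing output and final state against the plain one-byte-at-a-time loop over long inputs exercises the j == i+1 case, which on average comes up about once every 256 bytes.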
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
On Behalf Of Andy Polyakov
Sent: Wednesday, April 06, 2005 5:34 PM
To: openssl-dev@openssl.org
Subject: Re: RC4 optimize for em64t
Or how about moving movzb (%rdi,%r10),%r8d upwards as movzb
(%rdi,%r10),%r14b
Zou, Nanhai wrote:
| rc4-x86_84.optimized.s
| Hi,
| Current OpenSSL (0.9.8-dev) rc4speed throughput on a Nocona
| (EM64T, 64-bit) 3.6GHz is 272MB/s, while this version of the RC4 code can
| achieve 536MB/s in rc4speed.
|
| Would you please review it?
Your RC4 implementation seems to contain
in OpenSSL but also been
validated on Apache 2.0 + OpenSSL with a stress test.
Thanks
Zou Nan hai
-----Original Message-----
From: Marc Bevand [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 05, 2005 2:22 PM
To: Zou, Nanhai
Cc: openssl-dev@openssl.org; Andy Polyakov
Subject: Re: RC4 optimize
Current OpenSSL (0.9.8-dev) rc4speed throughput on a Nocona (EM64T,
64-bit) 3.6GHz is 272MB/s, while this version of the RC4 code can achieve 536MB/s in
rc4speed.
Would you please review it?
Cool conditional moves in the unrolled loop. Have you considered/tried cmov
instead of a jump over the move?
On Tue, 2005-04-05 at 18:17, Andy Polyakov wrote:
Current OpenSSL (0.9.8-dev) rc4speed throughput on a Nocona (EM64T,
64-bit) 3.6GHz is 272MB/s, while this version of the RC4 code can achieve
536MB/s in rc4speed.
Would you please review it?
Cool conditional moves in the unrolled loop.
rc4-x86_84.optimized.s
Hi,
Current OpenSSL (0.9.8-dev) rc4speed throughput on a Nocona (EM64T,
64-bit) 3.6GHz is 272MB/s, while this version of the RC4 code can achieve 536MB/s in
rc4speed.
Would you please review it?
Thanks
Zou Nan hai
rc4-x86_84.optimized.s