there is a cpuid test in rc4_skey.c which checks the hyperthreading cpuid 
bit to choose between two implementations of rc4... unfortunately 
this fails to properly distinguish the cpus.  all dual-core cpus (intel or 
amd) report HT support even if they don't use simultaneous multithreading 
like some p4 do.

on a dual-core k8 revF i see the following performance from a 0.9.8d build 
without any changes:

% ./openssl-0.9.8d speed rc4
Doing rc4 for 3s on 16 size blocks: 51091562 rc4's in 3.01s
Doing rc4 for 3s on 64 size blocks: 15937508 rc4's in 3.00s
Doing rc4 for 3s on 256 size blocks: 4190704 rc4's in 3.00s
Doing rc4 for 3s on 1024 size blocks: 1062795 rc4's in 3.00s
Doing rc4 for 3s on 8192 size blocks: 133319 rc4's in 3.01s
OpenSSL 0.9.8d 28 Sep 2006
built on: Tue Dec 26 17:40:14 PST 2006
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) 
idea(int) blowfish(ptr2)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -static 
-m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DMD32_REG_T=int -DMD5_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             271583.05k   340000.17k   357606.74k   362767.36k   362840.28k

if i disable the cpuid test in rc4_skey.c i get these much improved
numbers (roughly 33% to 51% faster, depending on block size):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             408832.88k   463675.26k   474736.30k   481802.21k   484870.83k

i see the same difference on dual-core k8 revE as well.


it seems somewhat fortunate that core2 CPUs track the p4 behaviour
w.r.t. these two rc4 implementations.  here are the core2 results with the
stock code / HT test:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             166799.58k   180552.87k   182437.93k   183381.67k   183206.87k

and with cpuid test disabled:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             123361.30k   128102.17k   129876.57k   128787.22k   129419.95k


i understand from the comments in rc4_skey.c that you're attempting to
distinguish between {p3, k8} and {p4}... with this updated information
it seems you want to distinguish {p3, k8} and {p4, core2}.  to do this
i'd suggest decoding the cpuid vendor, family and model values... but
this becomes unmaintainable really quickly:

        if (vendor == intel && (family == 15 || (family == 6 && model >= 15))) {
                // intel p4 and core2 only (and likely follow-ons to core2)
                // XXX: need to test if core (model 14) should be here
        }
        else {
                // everyone else
        }
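
for reference, here is a minimal sketch of what decoding those values could
look like.  this is not code from rc4_skey.c: the struct and the function
names (cpu_id, decode_vendor, decode_signature, wants_p4_rc4_path) are
made up for illustration, and the functions take raw register values as
arguments instead of executing cpuid themselves.  the bit layout is the
standard one for cpuid leaf 0 (vendor string) and leaf 1 (signature in eax).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the decoded values (not an OpenSSL type). */
struct cpu_id {
    char vendor[13];   /* e.g. "GenuineIntel", NUL-terminated */
    unsigned family;   /* display family */
    unsigned model;    /* display model */
};

/* CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX order,
 * lowest byte of each register first. */
void decode_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    assert(out != NULL);
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
    out[12] = '\0';
}

/* CPUID leaf 1 returns the processor signature in EAX. */
void decode_signature(uint32_t eax, struct cpu_id *id)
{
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model  = (eax >> 4) & 0xf;
    unsigned ext_model   = (eax >> 16) & 0xf;
    unsigned ext_family  = (eax >> 20) & 0xff;

    /* The extended family only counts when the base family is 15. */
    id->family = base_family == 15 ? base_family + ext_family : base_family;

    /* The extended model counts for base family 6 or 15 (Intel's rule;
     * applying it to family 6 on other vendors is an assumption here). */
    id->model = (base_family == 6 || base_family == 15)
                    ? (ext_model << 4) | base_model
                    : base_model;
}

/* The predicate from the snippet above: p4 and core2, nothing else. */
int wants_p4_rc4_path(const struct cpu_id *id)
{
    return strcmp(id->vendor, "GenuineIntel") == 0 &&
           (id->family == 15 || (id->family == 6 && id->model >= 15));
}
```

a caller would fill in vendor from leaf 0, decode eax from leaf 1, and then
branch on wants_p4_rc4_path().  every new cpu generation would still mean
revisiting that predicate, which is the maintenance problem mentioned above.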

it seems a more sustainable solution would be some sort of
/etc/openssl.conf and an "openssl speed --generate-conf" option used at
package install time to test several implementations.
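
to make the idea concrete, here is a rough sketch of the selection step
such an option might perform.  neither the "--generate-conf" option nor any
of these names (cipher_fn, measure, best_index) exist in openssl; they are
assumptions for illustration only.  each candidate implementation is timed
with clock() and the one with the highest throughput wins.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for one candidate implementation. */
typedef void (*cipher_fn)(unsigned char *buf, size_t len);

/* Run one candidate `rounds` times over `buf` and return its throughput
 * in bytes per second of CPU time (0.0 if the run was too short to time). */
double measure(cipher_fn fn, unsigned char *buf, size_t len, int rounds)
{
    clock_t start = clock();
    for (int i = 0; i < rounds; i++)
        fn(buf, len);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    return secs > 0.0 ? (double)len * rounds / secs : 0.0;
}

/* Return the index of the highest throughput; ties keep the earlier entry. */
size_t best_index(const double *throughput, size_t n)
{
    assert(throughput != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (throughput[i] > throughput[best])
            best = i;
    return best;
}
```

fed the 8192-byte figures above in the order {stock, cpuid test disabled},
best_index() picks the second entry for the k8 numbers and the first for
the core2 numbers.  the winning index could then be written to the config
file and read back at startup instead of probing cpuid.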

for the record, core2 64-bit code is seriously underperforming the 32-bit
code...  here are the 32-bit results (with cpuid test enabled):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             254164.64k   279901.10k   279364.38k   283617.62k   276690.26k

sorry, i haven't developed patches to fix this... i just wanted to record
these results somewhere for now... i'm not even sure which approach is
the best to fix this.

-dean

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]
