there is a cpuid test in rc4_skey.c which tests the hyperthreading cpuid bit to distinguish between two implementations of rc4... unfortunately this fails to properly distinguish the cpus. all dual-core cpus (intel or amd) report HT support even if they don't use simultaneous multithreading the way some p4 do.
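for anyone wanting to reproduce the check: a minimal sketch of what testing the HT bit amounts to, assuming the register values come from cpuid leaf 1 (EDX bit 28 is the HTT flag, EBX bits 23:16 hold the logical processor count per package). the helper names are made up for illustration and are not taken from rc4_skey.c. the flag only says that a package may expose more than one logical processor, which is why a dual-core part without SMT sets it too.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX bit 28: HTT flag. It means "this package may hold
 * more than one logical processor", not "this core does SMT". */
#define CPUID1_EDX_HTT (1u << 28)

/* Nonzero if the HTT flag is set in a leaf-1 EDX value. */
int htt_flag_set(uint32_t edx)
{
    return (edx & CPUID1_EDX_HTT) != 0;
}

/* Logical processors per physical package, from leaf-1 EBX bits 23:16.
 * The field is only meaningful when the HTT flag is set. */
unsigned logical_per_package(uint32_t ebx)
{
    return (ebx >> 16) & 0xffu;
}
```

a dual-core package and a single-core SMT package can both show the flag set with a count of 2, so the flag alone cannot tell a p4 from a dual-core k8.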
on a dual-core k8 revF i see the following performance from a 0.9.8d build without any changes:

% ./openssl-0.9.8d speed rc4
Doing rc4 for 3s on 16 size blocks: 51091562 rc4's in 3.01s
Doing rc4 for 3s on 64 size blocks: 15937508 rc4's in 3.00s
Doing rc4 for 3s on 256 size blocks: 4190704 rc4's in 3.00s
Doing rc4 for 3s on 1024 size blocks: 1062795 rc4's in 3.00s
Doing rc4 for 3s on 8192 size blocks: 133319 rc4's in 3.01s
OpenSSL 0.9.8d 28 Sep 2006
built on: Tue Dec 26 17:40:14 PST 2006
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(ptr2)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -static -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DMD32_REG_T=int -DMD5_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             271583.05k   340000.17k   357606.74k   362767.36k   362840.28k

if i disable the cpuid test in rc4_skey.c i get these much improved numbers:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             408832.88k   463675.26k   474736.30k   481802.21k   484870.83k

i see the same difference on dual-core k8 revE as well.

it seems somewhat fortunate that core2 CPUs track the p4 behaviour w.r.t. these two rc4 implementations. here are the core2 results with the stock code / HT test:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             166799.58k   180552.87k   182437.93k   183381.67k   183206.87k

and with cpuid test disabled:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             123361.30k   128102.17k   129876.57k   128787.22k   129419.95k

i understand from the comments in rc4_skey.c that you're attempting to distinguish between {p3, k8} and {p4}... with this updated information it seems you want to distinguish {p3, k8} and {p4, core2}. to do this i'd suggest decoding the cpuid vendor, family and model values...
but this becomes unmaintainable really quickly:

    if (vendor == intel && (family == 15 || (family == 6 && model >= 15))) {
        // intel p4 and core2 only (and likely follow-ons to core2)
        // XXX: need to test if core (model 14) should be here
    } else {
        // everyone else
    }

it seems a more sustainable solution would be some sort of /etc/openssl.conf and an "openssl speed --generate-conf" option used at package install time to test several implementations.

for the record, the core2 64-bit code seriously underperforms the 32-bit code... here are the 32-bit results (with cpuid test enabled):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
rc4             254164.64k   279901.10k   279364.38k   283617.62k   276690.26k

sorry, i haven't developed patches to fix this... i just wanted to record these results somewhere for now... i'm not even sure which approach is the best to fix this.

-dean
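a rough sketch of the vendor/family/model decode suggested above, written as plain functions over a cpuid leaf-1 EAX value and a vendor string so it can be exercised without executing cpuid. decode_sig() and wants_p4_style_rc4() are hypothetical names, not OpenSSL functions. the extended-model handling follows intel's documented rule (AMD applies the extended model only for base family 15, which makes no difference to the cases tested here).

```c
#include <stdint.h>
#include <string.h>

struct cpu_sig {
    unsigned family;   /* display family */
    unsigned model;    /* display model */
};

/* Decode family/model from a CPUID leaf-1 EAX signature. */
struct cpu_sig decode_sig(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xfu;
    unsigned base_model  = (eax >> 4) & 0xfu;
    unsigned ext_family  = (eax >> 20) & 0xffu;
    unsigned ext_model   = (eax >> 16) & 0xfu;
    struct cpu_sig s;

    /* the extended family is only added when the base family is 15 */
    s.family = base_family;
    if (base_family == 15)
        s.family += ext_family;

    /* the extended model supplies the high nibble for families 6 and 15 */
    s.model = base_model;
    if (base_family == 6 || base_family == 15)
        s.model |= ext_model << 4;

    return s;
}

/* Hypothetical predicate mirroring the condition in the post:
 * intel family 15 (p4), or family 6 with model >= 15 (core2 and later).
 * Whether model 14 (core) belongs here is untested, as noted in the post. */
int wants_p4_style_rc4(const char *vendor, uint32_t eax)
{
    struct cpu_sig s = decode_sig(eax);

    if (strcmp(vendor, "GenuineIntel") != 0)
        return 0;   /* everyone else */
    return s.family == 15 || (s.family == 6 && s.model >= 15);
}
```

with this, wants_p4_style_rc4("GenuineIntel", 0x6f0) is true (family 6, model 15) and any "AuthenticAMD" signature is false. the list of special cases still has to be revisited for every new core, which is the maintainability problem described above.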