After updating my -current kernel from 6.99.24 to 6.99.27 so I could commit my ubsec(4) changes, I noticed that under 6.99.27 I get between 3 and 8 percent less throughput on accelerated crypto ops.
Note that I am using the exact same ubsec(4) code[1] with both kernels, so I think it is unlikely to be a problem with ubsec(4). I did not change userland.

The old kernel is 6.99.24 from Oct 27th and the new one is 6.99.27 from Nov 17th. The machine is an ML110 G6 w/32G RAM and a 4-core Intel Xeon X3430 @2.4 GHz running amd64.

There's nothing obvious to me in the dmesg diff that would give me a clue:

--- dmesg.6.99.24 2013-11-19 15:44:24.000000000 +0100
+++ dmesg.6.99.27 2013-11-19 15:39:29.000000000 +0100
@@ -4,7 +4,7 @@
 Copyright (c) 1982, 1986, 1989, 1991, 1993
 The Regents of the University of California. All rights reserved.
-NetBSD 6.99.24 (GENERIC) #4: Mon Oct 28 18:58:32 CET 2013
+NetBSD 6.99.27 (GENERIC) #9: Sun Nov 17 17:47:24 CET 2013
 bad@flexible-demeanour:/home/bad/work/nb/src/sys/arch/amd64/compile/GENERIC
 total memory = 32759 MB
 avail memory = 31792 MB
@@ -133,11 +133,14 @@
 acpicpu0: T5: FFH, lat 1 us, pow 285 mW, 38 %
 acpicpu0: T6: FFH, lat 1 us, pow 190 mW, 25 %
 acpicpu0: T7: FFH, lat 1 us, pow 95 mW, 13 %
+coretemp0 at cpu0: thermal sensor, 1 C resolution
 acpicpu1 at cpu1: ACPI CPU
+coretemp1 at cpu1: thermal sensor, 1 C resolution
 acpicpu2 at cpu2: ACPI CPU
+coretemp2 at cpu2: thermal sensor, 1 C resolution
 acpicpu3 at cpu3: ACPI CPU
+coretemp3 at cpu3: thermal sensor, 1 C resolution
 timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
-timecounter: Timecounter "TSC" frequency 2394079440 Hz quality 3000
 uhub0 at usb0: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
 uhub0: 2 ports with 2 removable, self powered
 uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
@@ -175,7 +178,9 @@
 audio0 at pad0: half duplex, playback, capture
 boot device: wd0
 root on wd0a dumps on wd0b
+/: replaying log to memory
 root file system type: ffs
+/: replaying log to disk
 ipmi0: version 2.0 interface KCS iobase 0xca2/2 spacing 1
 wsdisplay0: screen 1 added (80x25, vt100 emulation)
 wsdisplay0: screen 2 added (80x25, vt100 emulation)

Results from "openssl speed -evp des-ede3-cbc -elapsed":

NetBSD 6.99.24/bcm5862
Doing des-ede3-cbc for 3s on 16 size blocks: 115774 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 64 size blocks: 116420 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 256 size blocks: 99863 des-ede3-cbc's in 3.10s
Doing des-ede3-cbc for 3s on 1024 size blocks: 77760 des-ede3-cbc's in 3.10s
Doing des-ede3-cbc for 3s on 8192 size blocks: 29364 des-ede3-cbc's in 3.01s
OpenSSL 1.0.1c 10 May 2012
built on: NetBSD 6.1_STABLE
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,2,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc version 4.5.3 (NetBSD nb2 20111202)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des-ede3-cbc      586.20k     2357.87k     8246.75k    25685.88k    79916.91k

NetBSD 6.99.27/bcm5862
Doing des-ede3-cbc for 3s on 16 size blocks: 111538 des-ede3-cbc's in 3.16s
Doing des-ede3-cbc for 3s on 64 size blocks: 107742 des-ede3-cbc's in 3.15s
Doing des-ede3-cbc for 3s on 256 size blocks: 92502 des-ede3-cbc's in 3.09s
Doing des-ede3-cbc for 3s on 1024 size blocks: 73305 des-ede3-cbc's in 3.12s
Doing des-ede3-cbc for 3s on 8192 size blocks: 28729 des-ede3-cbc's in 3.01s
OpenSSL 1.0.1c 10 May 2012
built on: NetBSD 6.1_STABLE
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,2,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc version 4.5.3 (NetBSD nb2 20111202)
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des-ede3-cbc      564.75k     2189.04k     7663.60k    24059.08k    78188.69k

--chris

[1] The code has as-yet uncommitted changes to avoid calling bus_dmamap_destroy() from interrupt context; I'm waiting to get them reviewed before committing.
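
For anyone who wants to check the relative difference per block size, here is a minimal sketch that computes it from the two "openssl speed" summary lines quoted in this mail. The function and variable names are made up for illustration; only the throughput figures come from the runs above, and a single run per kernel says nothing about run-to-run variance.

```python
# Throughput in 1000s of bytes per second, copied from the two
# "openssl speed -evp des-ede3-cbc -elapsed" summary lines in this mail.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
OLD_KERNEL = [586.20, 2357.87, 8246.75, 25685.88, 79916.91]  # 6.99.24
NEW_KERNEL = [564.75, 2189.04, 7663.60, 24059.08, 78188.69]  # 6.99.27


def percent_drop(old, new):
    """Return how much lower `new` is than `old`, in percent of `old`."""
    return (old - new) / old * 100.0


def drops_by_block_size():
    """Map each block size to the relative throughput loss in percent."""
    return {
        size: percent_drop(old, new)
        for size, old, new in zip(BLOCK_SIZES, OLD_KERNEL, NEW_KERNEL)
    }


if __name__ == "__main__":
    for size, drop in drops_by_block_size().items():
        print(f"{size:5d} bytes: {drop:5.2f}% lower on 6.99.27")
```

Repeating the benchmark several times on each kernel and feeding the averages into the same calculation would give a more reliable picture than one run each.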