On Sun, Apr 20, 2014 at 03:18:03AM -0400, Thor Lancelot Simon wrote:
> I have done some benchmarks of various cprng_fast implementations:
>
> arc4-mtx      The libkern implementation from
>               netbsd-current, which uses a spin mutex to
>               serialize access to a single, shared arc4
>               state.
>
> arc4-nomtx    Mutex calls #ifdeffed out.  What was in
>               NetBSD prior to 2012.  This implementation
>               is not correct.
>
> arc4-percpu   New implementation of cprng_fast using percpu
>               state and arc4 as the core stream cipher.
>               Uses the arc4 implementation from
>               sys/crypto/arc4, slightly modified to give an
>               entry point that skips the xor.
>
> hc128-percpu  Same new implementation but with hc128 as the
>               core stream cipher.  Differs from what I
>               posted earlier in that all use of inline
>               functions in the public API has been removed.
>
> hc128-inline  Percpu implementation I posted earlier with all
>               noted bugs fixed; uses inlines in header file
>               which expose some algorithm guts to speed up
>               cprng_fast32().
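The arc4-percpu entry mentions an arc4 "entry point that skips the xor". A minimal userspace sketch of that idea, assuming textbook RC4 rather than the actual sys/crypto/arc4 code: the generator hands out raw keystream bytes instead of XORing them into a caller's buffer, and in a percpu design each CPU would own one such state instead of contending for a single shared state behind a spin mutex. The names arc4_state, arc4_init, arc4_keystream and arc4_fast32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-CPU generator state.  In a percpu design each CPU
 * would own one of these; here it is just an ordinary struct.
 */
struct arc4_state {
	uint8_t S[256];
	uint8_t i, j;
};

/* Textbook RC4 key schedule (KSA). */
void
arc4_init(struct arc4_state *st, const uint8_t *key, size_t keylen)
{
	unsigned int n;
	uint8_t j = 0, t;

	assert(keylen > 0);	/* key[n % keylen] below needs keylen != 0 */
	for (n = 0; n < 256; n++)
		st->S[n] = (uint8_t)n;
	for (n = 0; n < 256; n++) {
		j = (uint8_t)(j + st->S[n] + key[n % keylen]);
		t = st->S[n];
		st->S[n] = st->S[j];
		st->S[j] = t;
	}
	st->i = st->j = 0;
}

/*
 * Write len raw keystream bytes to out.  Unlike an encrypt/decrypt
 * routine, nothing is XORed with caller data: for a PRNG the
 * keystream itself is the output.
 */
void
arc4_keystream(struct arc4_state *st, uint8_t *out, size_t len)
{
	uint8_t i = st->i, j = st->j, t;
	size_t n;

	for (n = 0; n < len; n++) {
		i = (uint8_t)(i + 1);
		j = (uint8_t)(j + st->S[i]);
		t = st->S[i];
		st->S[i] = st->S[j];
		st->S[j] = t;
		out[n] = st->S[(uint8_t)(st->S[i] + st->S[j])];
	}
	st->i = i;
	st->j = j;
}

/* Hypothetical cprng_fast32()-style helper: 4 keystream bytes, little-endian. */
uint32_t
arc4_fast32(struct arc4_state *st)
{
	uint8_t b[4];

	arc4_keystream(st, b, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	    ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

With the well-known RC4 test key "Key", arc4_keystream() produces EB 9F 77 81 B7 34 CA 72 A7 19 ..., so a fresh state's first arc4_fast32() call returns 0x81779FEB under the byte order chosen here.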
Three more:

chacha8       Percpu with Dennis' implementation of ChaCha,
              8 rounds.

chacha12      As chacha8, but 12 rounds.

chacha20      As chacha8, but 20 rounds.

RESULTS

kernel        cpb (32 bit)  4GB (1 way)  16GB (4 ways)  Scaling Factor
------        ------------  -----------  -------------  --------------
arc4-mtx                35        42.58         398.83           0.106
arc4-nomtx              24        42.12        2338.92           0.018
arc4-percpu             27        33.63          41.59           0.808
hc128-percpu            21        23.75          34.90           0.680
hc128-inline            19        22.66          31.75           0.713
chacha8                 22        20.51          30.45           0.662
chacha12                24        24.87          34.32           0.724
chacha20                28        30.45          39.28           0.775

I believe ChaCha8 is suitable for our purpose.  We were previously
considering ciphers with at most 128-bit security, and even 6-round
ChaCha has 139-bit strength against the best currently known attack.
At present there is no attack better than brute force on ChaCha8, and
the best known attack on ChaCha7 has complexity 2^248.

ChaCha8 also appears to be somewhat faster than the old arc4
implementation (22 cpb, versus 35 for arc4-mtx and 24 for arc4-nomtx).

I propose to fold the relevant parts of Dennis' "ccrnd" into
subr_cprng.c, configured for 8 rounds, and call it a day.

Thor
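The chacha8, chacha12 and chacha20 rows differ only in the round count passed to the ChaCha core. A minimal sketch of that core with the round count as a parameter, written from the public ChaCha description using the RFC 7539 state layout (32-bit block counter, 96-bit nonce); it is not Dennis' "ccrnd" code, whose layout and interface may differ, and the names chacha_qr, chacha_setup and chacha_block are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t
rotl32(uint32_t v, int n)
{
	return (v << n) | (v >> (32 - n));
}

static uint32_t
load32_le(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* ChaCha quarter round on state words x[a], x[b], x[c], x[d]. */
void
chacha_qr(uint32_t x[16], int a, int b, int c, int d)
{
	x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32(x[d], 16);
	x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32(x[b], 12);
	x[a] += x[b]; x[d] ^= x[a]; x[d] = rotl32(x[d], 8);
	x[c] += x[d]; x[b] ^= x[c]; x[b] = rotl32(x[b], 7);
}

/* Assumed RFC 7539 layout: 4 constants, 8 key words, counter, 3 nonce words. */
void
chacha_setup(uint32_t st[16], const uint8_t key[32], uint32_t counter,
    const uint8_t nonce[12])
{
	int i;

	st[0] = 0x61707865; st[1] = 0x3320646e;	/* "expand 32-byte k" */
	st[2] = 0x79622d32; st[3] = 0x6b206574;
	for (i = 0; i < 8; i++)
		st[4 + i] = load32_le(key + 4 * i);
	st[12] = counter;
	for (i = 0; i < 3; i++)
		st[13 + i] = load32_le(nonce + 4 * i);
}

/*
 * One 64-byte block as sixteen 32-bit words.  Each loop iteration is a
 * "double round" (column round + diagonal round), so rounds must be
 * even: ChaCha8 runs 4 double rounds, ChaCha20 runs 10.
 */
void
chacha_block(uint32_t out[16], const uint32_t in[16], int rounds)
{
	uint32_t x[16];
	int i;

	assert(rounds > 0 && rounds % 2 == 0);
	for (i = 0; i < 16; i++)
		x[i] = in[i];
	for (i = 0; i < rounds; i += 2) {
		/* column round */
		chacha_qr(x, 0, 4, 8, 12);
		chacha_qr(x, 1, 5, 9, 13);
		chacha_qr(x, 2, 6, 10, 14);
		chacha_qr(x, 3, 7, 11, 15);
		/* diagonal round */
		chacha_qr(x, 0, 5, 10, 15);
		chacha_qr(x, 1, 6, 11, 12);
		chacha_qr(x, 2, 7, 8, 13);
		chacha_qr(x, 3, 4, 9, 14);
	}
	/* feed-forward: add the input state back in */
	for (i = 0; i < 16; i++)
		out[i] = x[i] + in[i];
}
```

Calling chacha_block(out, st, 8) and then incrementing st[12] would yield successive 64-byte blocks of ChaCha8 keystream. With rounds = 20 and the RFC 7539 section 2.3.2 inputs (key bytes 00..1f, counter 1, nonce 00 00 00 09 00 00 00 4a 00 00 00 00), the first output word is 0xe4e7f110.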