Date: Sun, 20 Dec 2015 17:44:35 -0500
From: Thor Lancelot Simon <t...@panix.com>
> On Sun, Dec 20, 2015 at 10:34:29PM +0000, Taylor R Campbell wrote:
> > Date: Sat, 19 Dec 2015 19:37:22 -0500
> > From: Thor Lancelot Simon <t...@panix.com>
> >
> > I was playing with code for a RDRAND/RDSEED entropy source and it
> > just felt like -- much like opencrypto is poorly suited for crypto
> > via unprivileged CPU instructions -- our rndsource interface is
> > a little too heavy for CPU RNGs implemented as instructions.
> >
> > Why is it a little too heavy?
>
> There's a huge amount of locking and recordkeeping, when the entire
> point of an RNG implemented as a CPU instruction is that it's so quick
> and cheap you can use it without worrying about the cost.

What I meant is: why does our rndsource *API* necessitate a new API
rather than an improved implementation?

I have been planning to remove the locking and most of the bookkeeping
in the implementation. It is already the caller's responsibility to
ensure exclusive access to each rndsource, so the locking arises only
because we use one global sample queue. I think that rnd_add_data
should:

1. xor some data into a CPU-local buffer;
2. increment a count;
3. maybe stir a CPU-local pool, and maybe cv_broadcast waiters, or
   schedule a softint to do these;

and do nothing else: no inter-CPU communication, no memory allocation,
no list editing or traversal, &c.

I have a draft here, using a 1600-bit pool per CPU stirred with the
Keccak permutation, plus one global pool which merges all the CPUs'
pools to generate output for rnd_extract_data:

https://www.netbsd.org/~riastradh/tmp/20150809/

(summary: https://www.netbsd.org/~riastradh/tmp/20151002/entropy.pdf)

The rndsource API is unchanged, so that it is easy to drop in the
implementation and for compatibility with existing hardware drivers.

> Even without the latter, surely there is value to the former [adding
> more mechanism for CPU RNG instructions]. I believe it would serve for
> a number of other CPUs; for example I believe the RNG on Octeon and
> its successors is instruction based, and this way we could have a
> _single_ rndsource driver for all such CPUs rather than many drivers.

This is an API concern.

It sounds like the operative difference of the cpu_rng API from the
rndsource API is that the cpu_rng API is optimized for callback-only
entropy sources which never sleep for I/O or require any inter-CPU
communication. E.g., it sounds like bcmrng(4) would satisfy this
contract too.

It's not clear to me that we should assume there's only a single one
of these, or that the operator will never want to disable it (e.g., to
defend against http://blog.cr.yp.to/20140205-entropy.html).

Sources that never require any inter-CPU communication suggest that
perhaps they should be per-CPU, and have per-CPU bookkeeping just like
any other rndsource. But I can see that the rndsource API makes it
difficult to do per-CPU rndsources with callbacks: rnd_getmore always
walks the global list of rndsources, and perhaps there could be more
than one rnd_getmore in flight at any given time. So maybe we could
use a clearer mechanism for per-CPU rndsources.

> I was hoping someone -- anyone -- would actually look at the code
> rather than commenting without looking. It might be best of all if it
> were you. Please?

Apologies for my previous message's terseness. I'm backed up on a lot
of mail. I did skim the code before replying. I was just hoping for
more of a side-by-side comparison of what the cpu_rng API accomplished
and why versus what would have been necessary in the rndsource API.
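The per-CPU rnd_add_data steps proposed in this thread, together with a merge of all CPUs' pools into one global pool for extraction, can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the code in the linked draft: `NCPU`, `RATE`, `struct cpu_pool`, `pool_add`, and `pool_extract` are hypothetical names, the CPU index is passed in explicitly rather than taken from curcpu(), the 136-byte block size is an arbitrary choice, and there is no concurrency control at all. The permutation is the standard Keccak-f[1600], written in the compact style of public-domain SHA-3 code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCPU   4	/* hypothetical: number of CPUs */
#define NLANES 25	/* 25 x 64-bit lanes = 1600-bit state */
#define RATE   136	/* hypothetical: bytes absorbed/squeezed per block */
#define ROTL64(x, n) (((x) << (n)) | ((x) >> (64 - (n))))

/* Keccak-f[1600], in the compact style of public-domain SHA-3 code. */
static void keccakf(uint64_t st[NLANES])
{
	static const uint64_t rc[24] = {
		0x0000000000000001, 0x0000000000008082, 0x800000000000808a,
		0x8000000080008000, 0x000000000000808b, 0x0000000080000001,
		0x8000000080008081, 0x8000000000008009, 0x000000000000008a,
		0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
		0x000000008000808b, 0x800000000000008b, 0x8000000000008089,
		0x8000000000008003, 0x8000000000008002, 0x8000000000000080,
		0x000000000000800a, 0x800000008000000a, 0x8000000080008081,
		0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
	};
	static const int rotc[24] = { 1, 3, 6, 10, 15, 21, 28, 36, 45, 55,
	    2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44 };
	static const int piln[24] = { 10, 7, 11, 17, 18, 3, 5, 16, 8, 21,
	    24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1 };
	uint64_t t, bc[5];
	int i, j, r;

	for (r = 0; r < 24; r++) {
		for (i = 0; i < 5; i++)		/* theta */
			bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
		for (i = 0; i < 5; i++) {
			t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
			for (j = 0; j < 25; j += 5)
				st[j + i] ^= t;
		}
		t = st[1];			/* rho and pi */
		for (i = 0; i < 24; i++) {
			j = piln[i];
			bc[0] = st[j];
			st[j] = ROTL64(t, rotc[i]);
			t = bc[0];
		}
		for (j = 0; j < 25; j += 5) {	/* chi */
			for (i = 0; i < 5; i++)
				bc[i] = st[j + i];
			for (i = 0; i < 5; i++)
				st[j + i] ^= ~bc[(i + 1) % 5] & bc[(i + 2) % 5];
		}
		st[0] ^= rc[r];			/* iota */
	}
}

struct cpu_pool {
	uint64_t lanes[NLANES];	/* this CPU's 1600-bit pool */
	size_t pos;		/* next byte of the rate to xor into */
	uint64_t nbytes;	/* count of bytes added */
};
static struct cpu_pool pools[NCPU];	/* one per CPU, never shared */
static uint64_t global[NLANES];		/* merges all CPUs' pools */

/* Hypothetical stand-in for the proposed rnd_add_data steps, touching
 * only pools[ci]: no locks, no allocation, no inter-CPU communication. */
void pool_add(unsigned ci, const void *buf, size_t len)
{
	const uint8_t *b = buf;
	struct cpu_pool *p;
	size_t i;

	assert(ci < NCPU);
	p = &pools[ci];
	for (i = 0; i < len; i++) {	/* 1. xor into the buffer */
		p->lanes[p->pos / 8] ^= (uint64_t)b[i] << (8 * (p->pos % 8));
		if (++p->pos == RATE) {	/* 3. stir when the rate is full */
			keccakf(p->lanes);
			p->pos = 0;
		}
	}
	p->nbytes += len;		/* 2. increment a count */
}

/* Squeeze one block from each CPU's pool into the global pool, then
 * output up to RATE bytes. Ignores the cross-CPU locking a kernel needs. */
void pool_extract(uint8_t *out, size_t len)
{
	size_t c, i, k;

	assert(len <= RATE);
	for (c = 0; c < NCPU; c++) {
		keccakf(pools[c].lanes);	/* stir in pending input */
		for (k = 0; k < RATE / 8; k++)
			global[k] ^= pools[c].lanes[k];
		keccakf(global);
	}
	for (i = 0; i < len; i++)
		out[i] = (uint8_t)(global[i / 8] >> (8 * (i % 8)));
	keccakf(global);	/* permute so a later call never repeats these bytes */
}
```

The point of this shape is that `pool_add` reads and writes only one CPU's state, so in a kernel it could avoid locks as long as the caller cannot migrate to another CPU in the middle; all cross-CPU work is pushed into `pool_extract`. Because Keccak-f[1600] is an invertible permutation, the final permutation in `pool_extract` does not by itself give backtracking resistance; a real design would also have to erase part of the state after output. Waking waiters or scheduling a softint, which step 3 mentions, is omitted here.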