Re: Lightweight support for instruction RNGs

Taylor R Campbell Sun, 20 Dec 2015 16:07:47 -0800

   Date: Sun, 20 Dec 2015 17:44:35 -0500
   From: Thor Lancelot Simon <t...@panix.com>

   On Sun, Dec 20, 2015 at 10:34:29PM +0000, Taylor R Campbell wrote:
   >    Date: Sat, 19 Dec 2015 19:37:22 -0500
   >    From: Thor Lancelot Simon <t...@panix.com>
   > 
   >    I was playing with code for a RDRAND/RDSEED entropy source and it
   >    just felt like -- much like opencrypto is poorly suited for crypto
   >    via unprivileged CPU instructions -- our rndsource interface is
   >    a little too heavy for CPU RNGs implemented as instructions.
   > 
   > Why is it a little too heavy?

   There's a huge amount of locking and recordkeeping, when the entire
   point of an RNG implemented as a CPU instruction is that it's so quick
   and cheap you can use it without worrying about the cost.

What I meant is: why does our rndsource *API* necessitate a new API
rather than an improved implementation?

I have been planning to remove the locking and most of the bookkeeping
in the implementation.  It is already the caller's responsibility to
ensure exclusive access to each rndsource, so the locking arises only
because we use one global sample queue.  I think that rnd_add_data
should:

1. xor some data into a CPU-local buffer;
2. increment a count;
3. maybe stir a CPU-local pool, and maybe cv_broadcast waiters, or
schedule a softint to do these;

and do nothing else: no inter-CPU communication, no memory allocation,
no list editing or traversal, &c.
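
A minimal userland sketch of steps 1-3, not the draft linked below: the
names cpu_pool, cpu_pool_add, and STIR_THRESHOLD are hypothetical, the
stir itself is reduced to setting a flag, and in the kernel the caller
would have to guarantee it stays on one CPU (e.g., by running with
preemption disabled), which is assumed here rather than shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BYTES	200	/* 1600 bits, matching a Keccak-sized state */
#define STIR_THRESHOLD	64	/* hypothetical: samples between stirs */

/* Hypothetical per-CPU state; one instance per CPU, never shared. */
struct cpu_pool {
	uint8_t		buf[POOL_BYTES];	/* xor accumulation buffer */
	size_t		pos;			/* next byte to xor into */
	unsigned	nsamples;		/* samples since last stir */
	bool		stir_pending;		/* stand-in for a softint */
};

/*
 * Xor len bytes of data into the CPU-local buffer, wrapping around,
 * count the sample, and note when a stir is due.  No locks, no
 * allocation, no list traversal.
 */
static void
cpu_pool_add(struct cpu_pool *p, const void *data, size_t len)
{
	const uint8_t *src = data;
	size_t i;

	for (i = 0; i < len; i++) {
		p->buf[p->pos] ^= src[i];
		p->pos = (p->pos + 1) % POOL_BYTES;
	}
	p->nsamples++;
	if (p->nsamples >= STIR_THRESHOLD)
		p->stir_pending = true;	/* real code: schedule a softint */
}
```

Whatever consumes stir_pending would apply the permutation, reset
nsamples, and wake any waiters; none of that needs to happen in the
caller's context.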

I have a draft here, using a 1600-bit pool per CPU stirred with the
Keccak permutation, plus one global pool which merges all the CPUs'
pools to generate output for rnd_extract_data:

https://www.netbsd.org/~riastradh/tmp/20150809/
(summary: https://www.netbsd.org/~riastradh/tmp/20151002/entropy.pdf)

The rndsource API is unchanged, so the implementation is easy to drop
in and remains compatible with existing hardware drivers.

   Even without the latter, surely there is value to the former
   [adding more mechanism for CPU RNG instructions].  I believe it
   would serve for a number of other CPUs; for example I believe the
   RNG on Octeon and its successors is instruction based, and this way
   we could have a _single_ rndsource driver for all such CPUs rather
   than many drivers.

This is an API concern.  It sounds like the operative difference of
the cpu_rng API from the rndsource API is that the cpu_rng API is
optimized for callback-only entropy sources which never sleep for I/O
or require any inter-CPU communication.  E.g., it sounds like
bcmrng(4) would satisfy this contract too.

It's not clear to me that we should assume there's only a single one
of these, or that the operator will never want to disable it (e.g., to
defend against http://blog.cr.yp.to/20140205-entropy.html).
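
To make the two concerns concrete, here is a rough sketch of a table of
callback-only sources, each of which can be switched off individually.
This is not the proposed cpu_rng API: the struct, the field names, and
cpu_rng_gather are all hypothetical, and the two generate callbacks are
fixed-pattern fakes standing in for real instructions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for a source that never sleeps. */
struct cpu_rng_source {
	const char	*name;
	bool		enabled;	/* operator may clear this */
	size_t		(*generate)(void *, size_t);	/* bytes written */
};

/* Fake callbacks with fixed patterns; NOT random, illustration only. */
static size_t
fake_a(void *buf, size_t len)
{
	memset(buf, 0x0f, len);
	return len;
}

static size_t
fake_b(void *buf, size_t len)
{
	memset(buf, 0xf0, len);
	return len;
}

/*
 * Xor up to 64 bytes from every enabled source into buf.  Returns the
 * number of sources consulted, so the caller can tell "all disabled"
 * apart from success.
 */
static size_t
cpu_rng_gather(const struct cpu_rng_source *srcs, size_t nsrcs,
    uint8_t *buf, size_t len)
{
	uint8_t tmp[64];
	size_t i, j, n, used = 0;

	if (len > sizeof(tmp))
		len = sizeof(tmp);
	for (i = 0; i < nsrcs; i++) {
		if (!srcs[i].enabled || srcs[i].generate == NULL)
			continue;
		n = srcs[i].generate(tmp, len);
		if (n > len)	/* distrust the callback's return value */
			n = len;
		for (j = 0; j < n; j++)
			buf[j] ^= tmp[j];
		used++;
	}
	return used;
}
```

With both fake sources enabled, a zeroed buffer comes back as 0xff
bytes; with the second one disabled it comes back as 0x0f bytes.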

Sources that never require any inter-CPU communication suggest that
perhaps they should be per-CPU, and have per-CPU bookkeeping just like
any other rndsource.  But I can see that the rndsource API makes it
difficult to do per-CPU rndsources with callbacks: rnd_getmore always
walks the global list of rndsources, and perhaps there could be more
than one rnd_getmore in flight at any given time.  So maybe we could
use a clearer mechanism for per-CPU rndsources.

   I was hoping someone -- anyone -- would actually look at the code
   rather than commenting without looking.  It might be best of all
   if it were you.  Please?

Apologies for my previous message's terseness.  I'm backed up on a lot
of mail.  I did skim the code before replying.  I was just hoping for
more of a side-by-side comparison of what the cpu_rng API accomplished
and why versus what would have been necessary in the rndsource API.
