Re: [swift-evolution] [Proposal] Random Unification

Alejandro Alonso via swift-evolution Tue, 03 Oct 2017 21:32:55 -0700

I really like the schedule here. After reading for a while, I do agree with 
Brent that stdlib should be very primitive in the functionality that it 
provides. I also agree that the most important part right now is designing the 
internal crypto that the numeric types use to return their respective random 
numbers. On the discussion of how we should handle not having enough entropy 
with the device random, from a user's perspective it makes sense that calling 
.random should just give me a random number, but from a developer's perspective 
I see Optional being the best choice here. While I think blocking could, in 
most cases, provide the user an easier API, we have to do this right and be 
safe here by providing a value that indicates that there is room for error. As 
for the generator abstraction, I believe there should be a bare-bones protocol 
that sets a layout for new generators, and we should focus on its requirements.


Whether or not RandomAccessCollection and MutableCollection should get .random 
and .shuffle/.shuffled in this first proposal is completely up in the air for 
me. It makes sense, to me, to include .random in this proposal and open 
another one for .shuffle/.shuffled, but I can see arguments that we should 
create something separate for these two, or include all of it in this proposal.

- Alejandro

On Sep 27, 2017, 7:29 PM -0500, Xiaodi Wu <xiaodi...@gmail.com>, wrote:

On Wed, Sep 27, 2017 at 00:18 Félix Cloutier <felixclout...@icloud.com> wrote:
On Sep 26, 2017, at 16:14, Xiaodi Wu <xiaodi...@gmail.com> wrote:

On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier <felixclout...@icloud.com> 
wrote:

It's possible to use a CSPRNG-grade algorithm and seed it once to get a 
reproducible sequence, but when you use it as a CSPRNG, you typically feed 
entropy back into it at nondeterministic points to ensure that even if you 
started with a bad seed, you'll eventually get to an alright state. Unless you 
keep track of when entropy was mixed in and what the values were, you'll never 
get a reproducible CSPRNG.

We would give developers a false sense of security if we provided them with 
CSPRNG-grade algorithms that we called CSPRNGs and that they could seed 
themselves. Just because it says "crypto-secure" in the name doesn't mean that 
it'll be crypto-secure if it's seeded with time(). Therefore, "reproducible" vs 
"non-reproducible" looks like a good distinction to me.

I disagree here, in two respects:

First, whether or not a particular PRNG is cryptographically secure is an 
intrinsic property of the algorithm; whether it's "reproducible" or not is 
determined by the published API. In other words, the distinction between CSPRNG 
vs. non-CSPRNG is important to document because it's semantics that cannot be 
deduced by the user otherwise, and it is an important one for writing secure 
code because it tells you whether an attacker can predict future outputs based 
only on observing past outputs. "Reproducible" in the sense of seedable or not 
is trivially noted by inspection of the published API, and it is rather 
immaterial to writing secure code.

Cryptographically secure is not a property that I'm comfortable applying to an 
algorithm. You cannot say that you've made a cryptographically secure thing 
just because you've used all the right algorithms: you also have to use them 
right, and one of the most critical components of a cryptographically secure 
PRNG is its seed.

A cryptographically secure algorithm isn’t sufficient, but it is necessary. 
That’s why it’s important to mark them as such. If I'm a careful developer, 
then it is absolutely important to me to know that I’m using a PRNG with a 
cryptographically secure algorithm, and that the particular implementation of 
that algorithm is correct and secure.

It is a *feature* of a lot of modern CSPRNGs that you can't seed them:


  *   You cannot seed or add entropy to std::random_device

Although std::random_device may in practice be backed by a software CSPRNG, 
IIUC, the intention is that it can provide access to a hardware 
non-deterministic source when available.


  *   You cannot seed or add entropy to CryptGenRandom
  *   You can only add entropy to /dev/(u)random
  *   You can only add entropy to BSD's arc4random

Ah, I see. I think we mean different things when we say PRNG. A PRNG is an 
entirely deterministic algorithm; the output is non-random and the algorithm 
itself requires no entropy. If a PRNG is seeded with a random sequence of bits, 
its output can "appear" to be random. A CSPRNG is a PRNG that fulfills certain 
criteria such that its output can be appropriate for use in cryptographic 
applications in place of a truly random sequence *if* the input to the CSPRNG 
is itself random.
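
As a small illustration of "entirely deterministic", here is a minimal sketch 
in Swift. It uses SplitMix64, a non-cryptographic algorithm chosen only because 
it fits in a few lines; the `SplitMix64` type and the `draw(seed:count:)` 
helper are hypothetical names and not part of any proposal in this thread.

```swift
// SplitMix64: a small NON-cryptographic PRNG, used here only to show
// determinism. It says nothing about cryptographic security.
struct SplitMix64 {
    private var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    // Advances the state and returns the next 64 pseudorandom bits.
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Draws `count` values from a generator seeded with `seed`.
func draw(seed: UInt64, count: Int) -> [UInt64] {
    var generator = SplitMix64(seed: seed)
    return (0..<count).map { _ in generator.next() }
}

let first = draw(seed: 42, count: 4)
let second = draw(seed: 42, count: 4)
let other = draw(seed: 43, count: 4)

// The same seed reproduces the same sequence; a different seed does not.
print(first == second)
print(first == other)
```

Mixing fresh entropy into `state` at unrecorded points would break exactly 
this reproducibility, which is the reseeding behavior described earlier in the 
thread.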

The examples you give above *incorporate* a CSPRNG, environment entropy, and a 
set of rules about when to mix in additional entropy in order to produce output 
indistinguishable from a random sequence, but they are *not* themselves really 
*pseudorandom* generators because they are not deterministic. Not only do such 
sources of random numbers not require an interface to allow seeding, they do 
not even have to be publicly instantiable: Swift need only expose a single 
thread-safe instance (or an instance per thread) of a single type that provides 
access to CryptGenRandom/urandom/arc4random, since after all the output of 
multiple instances of that type should be statistically indistinguishable from 
the output of only one.

What I was trying to respond to, by contrast, is the design of a hierarchy of 
protocols CSPRNG : PRNG (or, in Alejandro's proposal, UnsafeRandomSource : 
RandomSource) and the appropriate APIs to expose on each. This is entirely 
inapplicable to your examples. It stands to reason that a non-instantiable 
source of random numbers does not require a protocol of its own (a hypothetical 
RNG : CSPRNG), since there is no reason to implement (if done correctly) more 
than a single publicly non-instantiable singleton type that could conform to 
it. For that matter, the concrete type itself probably doesn't need *any* 
public API at all. Instead, extensions to standard library types such as Int 
that implement conformance to the protocol that Alejandro names "Randomizable" 
could call internal APIs to provide all the necessary functionality, and 
third-party types that need to conform to "Randomizable" could then in turn use 
`Int.random()` or `Double.random()` to implement their own conformance. In 
fact, the concrete random number generator type doesn't need to be public at 
all. All public interaction could be through APIs such as `Int.random()`.
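
The arrangement described above could look roughly like the following sketch. 
Everything in it is an assumption made for illustration: `systemRandomBits()` 
is a hypothetical internal entry point (backed here by 
`SystemRandomNumberGenerator` from Swift 4.2 and later, as a stand-in for 
CryptGenRandom/urandom/arc4random), the Optional return type follows the 
error-handling discussion further down the thread, and `Die` is an invented 
third-party type.

```swift
// Hypothetical internal entry point to the platform entropy source.
// Returns nil if the platform cannot supply random bits; this stand-in
// never fails.
func systemRandomBits() -> UInt64? {
    var source = SystemRandomNumberGenerator()
    let bits: UInt64 = source.next()
    return bits
}

// Hypothetical public protocol, named after the one in the proposal.
protocol Randomizable {
    static func random() -> Self?
}

extension Int: Randomizable {
    static func random() -> Int? {
        guard let bits = systemRandomBits() else { return nil }
        // Reinterpret the random bits as a signed value.
        return Int(truncatingIfNeeded: bits)
    }
}

extension Double: Randomizable {
    // Uniform value in 0..<1 built from the top 53 random bits.
    static func random() -> Double? {
        guard let bits = systemRandomBits() else { return nil }
        return Double(bits >> 11) * 0x1.0p-53
    }
}

// A third-party type builds its conformance on `Int.random()` and never
// sees the generator itself.
struct Die: Randomizable {
    let face: Int  // 1...6

    static func random() -> Die? {
        guard let value = Int.random() else { return nil }
        // Modulo keeps the sketch short; it introduces a slight bias.
        return Die(face: Int(value.magnitude % 6) + 1)
    }
}

if let die = Die.random() {
    print("rolled \(die.face)")
}
```

Note that no generator type appears in the public surface of this sketch; 
callers only ever touch `random()`.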


Just because we can expose a seed interface doesn't mean we should, and in this 
case I believe that it would go against the prime objective of providing secure 
random numbers.


If we're talking about a Swift interface to a non-deterministic source of 
random numbers like urandom or arc4random, then, as I write above, not only do 
I agree that it doesn't need to be seedable, it also does not need to be 
instantiable at all, does not need to conform to a protocol that specifically 
requires the semantics of a non-deterministic source, does not need to expose 
any public interface whatsoever, and doesn't itself even need to be public. 
(Does it even need to be a type, as opposed to simply a free function?)

In fact, having reasoned through all of this, we can split the design task into 
two. The most essential part, which definitely should be part of the stdlib, 
would be an internal interface to a cryptographically secure platform-specific 
entropy source, a public protocol named something like Randomizable (to be 
bikeshedded), and the appropriate implementations on Boolean, binary integer, 
and floating point types to conform them to Randomizable so that users can 
write `Bool.random()` or `Int.random()`. The second part, which can be a 
separate proposal or even a standalone core library or third-party library, 
would be the protocols and concrete types that implement pseudorandom number 
generators, allowing for reproducible pseudorandom sequences. In other words, 
instead of PRNGs and CSPRNGs being the primitives on which `Int.random()` is 
implemented, `Int.random()` should be the standard library primitive which 
allows PRNGs and CSPRNGs to be seeded.
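
To make the second half of that split concrete, here is a rough sketch of a 
seedable generator that takes its default seed from the standard library 
primitive. `systemRandomSeed()` is a hypothetical stand-in for an 
Optional-returning `UInt64.random()`, `SeededGenerator` is an invented name, 
and the SplitMix64 step is a non-cryptographic placeholder where a real library 
might use an algorithm such as ChaCha20.

```swift
// Hypothetical stand-in for the standard library primitive. Returns nil
// if no secure random bits are available; this stand-in never fails.
func systemRandomSeed() -> UInt64? {
    var source = SystemRandomNumberGenerator()
    let bits: UInt64 = source.next()
    return bits
}

struct SeededGenerator {
    private var state: UInt64

    // Explicit seed: the caller gets a reproducible sequence.
    init(seed: UInt64) {
        state = seed
    }

    // Default: seed from the system primitive, failing if it is unavailable.
    init?() {
        guard let seed = systemRandomSeed() else { return nil }
        self.init(seed: seed)
    }

    // SplitMix64 step (NOT cryptographically secure; placeholder only).
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Reproducible use: two generators with the same seed agree.
var replayA = SeededGenerator(seed: 2017)
var replayB = SeededGenerator(seed: 2017)
print(replayA.next() == replayB.next())

// Default use: the seed comes from the system primitive.
if var fresh = SeededGenerator() {
    print(fresh.next())
} else {
    print("no entropy available to seed the generator")
}
```
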

If your attacker can observe your seeding once, chances are that they can 
observe your reseeding too; then, they can use their own implementation of the 
PRNG (whether CSPRNG or non-CSPRNG) and reproduce your pseudorandom sequence 
whether or not Swift exposes any particular API.

On Linux, the random devices are initially seeded with machine-specific but 
rather invariant data that makes /dev/urandom spit out predictable numbers. It 
is considered "seeded" after a root process writes POOL_SIZE bytes to it. On 
most implementations, this initial seed is stored on disk: when the computer 
shuts down, it reads POOL_SIZE bytes from /dev/urandom and saves them in a 
file, and the contents of that file are loaded back into /dev/urandom when the 
computer starts. A scenario where someone can read that file is certainly not 
less likely than a scenario where /dev/urandom was deleted. That doesn't mean 
that they have kernel code execution or that they can pry into your process, 
but they have a good shot at guessing your seed and subsequent RNG results if 
no stirring happens.

Sorry, I don't understand what you're getting at here. Again, I'm talking about 
deterministic algorithms, not non-deterministic sources of random numbers.

Secondly, I see no reason to justify the notion that, simply because a PRNG is 
cryptographically secure, we ought to hide the seeding initializer (because one 
has to exist internally anyway) from the public. Obviously, one use case for a 
deterministic PRNG is to get reproducible sequences of random-appearing values; 
this can be useful whether the underlying algorithm is cryptographically secure 
or not. There are innumerably many ways to use data generated from a CSPRNG in 
non-cryptographically secure ways and omitting or including a public seeding 
initializer does not change that; in other words, using a deterministic seed 
for a CSPRNG would be a bad idea in certain applications, but it's a deliberate 
act, and someone who would mistakenly do that is clearly incapable of *using* 
the output from the PRNG in a secure way either; put a third way, you would be 
hard pressed to find a situation where it's true that "if only Swift had not 
made the seeding initializer public, this author would have written secure 
code, but instead the only security hole that existed in the code was caused by 
the availability of a public seeding initializer mistakenly used." The point of 
having both explicitly instantiable PRNGs and a layer of simpler APIs like 
"Int.random()" is so that the less experienced user can get the "right thing" 
by default, and the experienced user can customize the behavior; any user that 
instantiates his or her own ChaCha20Random instance is already calling for the 
power user interface; it is reasonable to expose the underlying primitive 
operations (such as seeding) so long as there are legitimate uses for them.

Nothing prevents us from using the same algorithm for a CSPRNG that is safely 
pre-seeded and a PRNG that people seed themselves, mind you. However, 
especially when it comes to security, there is a strong responsibility to drive 
developers into a pit of success: the most obvious thing to do has to be the 
right one, and suggesting to cryptographically-unaware developers that they 
have everything they need to manage their own seed is not a step in that 
direction.

I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly calling it 
cryptographically-secure, because it is not unless you know what to do with it. 
It is emphatically not far-fetched to imagine a developer who thinks that they 
can outdo the standard library by using their own ChaCha20Random instance after 
it's been seeded with time() if we let them know that it's "cryptographically 
secure". If you're a power user and you don't like the default, known-good 
CSPRNG, then you're hopefully good enough to know that ChaCha20 is considered a 
cryptographically-secure algorithm without helpful labels from the language, and 
you know how to operate it.

I'm fully aware of the myths surrounding /dev/urandom and /dev/random. 
/dev/urandom might never run out, but it is also possible for it not to be 
initialized at all, as in the case of some VM setups. In some older versions of 
iOS, /dev/[u]random is reportedly sandboxed out. On systems where it is 
available, it can also be deleted, since it is a file. The point is, all of 
these scenarios cause an error during seeding of a CSPRNG. The question is, how 
to proceed when entropy cannot be accessed. We must do something, because in 
that case we cannot return a cryptographically secure answer. Rare 
trapping on invocation of Int.random() or permanently waiting for a 
never-to-be-initialized /dev/urandom would be terrible to debug, but returning 
an optional or throwing all the time would be verbose. How to design this API?

If the only concern is that the system might not be initialized enough, I'd say 
that whatever returns an instance of a global, framework-seeded CSPRNG should 
return an Optional, and the random methods that use the global CSPRNG can trap 
and scream that the system is not initialized enough. If this is a likely error 
for you, you can check if the CSPRNG exists or not before jumping.
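
One way to read this suggestion, as a sketch only: the accessor for the global 
generator returns an Optional, and the convenience methods trap when it is 
missing. `Random`, `SharedGenerator`, and `entropyIsAvailable()` are 
hypothetical names, the availability probe is hard-coded to succeed, and 
`SystemRandomNumberGenerator` (Swift 4.2 and later) stands in for the 
framework-seeded CSPRNG.

```swift
// Hypothetical probe for "is the system random source initialized?".
// Hard-coded for the sketch; a real version would ask the OS.
func entropyIsAvailable() -> Bool {
    return true
}

// Hypothetical handle to the global, framework-seeded generator. The
// initializer is fileprivate, so user code cannot create or seed one.
struct SharedGenerator {
    fileprivate init() {}

    func next() -> UInt64 {
        // Stand-in for the framework-seeded CSPRNG.
        var source = SystemRandomNumberGenerator()
        let bits: UInt64 = source.next()
        return bits
    }
}

enum Random {
    // nil when the system is not initialized enough to seed the generator.
    static var shared: SharedGenerator? {
        return entropyIsAvailable() ? SharedGenerator() : nil
    }
}

extension Int {
    // Convenience method: traps loudly instead of returning an Optional.
    static func random() -> Int {
        guard let generator = Random.shared else {
            fatalError("system random source is not initialized")
        }
        return Int(truncatingIfNeeded: generator.next())
    }
}

// Callers for whom failure is likely can check before jumping.
if Random.shared != nil {
    print(Int.random())
} else {
    print("no secure random source; taking another path")
}
```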

Also note that there is only one system for which Swift is officially 
distributed (Ubuntu 14.04) on which the only way to get entropy from the OS is 
to open a random device and read from it.

Again, I'm not only talking about urandom. As far as I'm aware, every API to 
retrieve cryptographically secure sequences of random bits on every platform 
for which Swift is distributed can potentially return an error instead of 
random bits. The question is, what design for our API is the most sensible way 
to deal with this contingency? On rethinking, I do believe that consistently 
returning an Optional is the best way to go about it, allowing the user to 
either (a) supply a deterministic fallback; (b) raise an error of their own 
choosing; or (c) trap--all with a minimum of fuss. This seems very Swifty to me.
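
The three options can be sketched as follows. `secureRandomInt()` is a 
hypothetical stand-in for an Optional-returning `Int.random()` and 
`TokenError` is an invented error type; the stand-in is backed by 
`SystemRandomNumberGenerator` (Swift 4.2 and later) and never actually 
returns nil.

```swift
// Hypothetical Optional-returning primitive. A real implementation would
// return nil when the OS call for random bits reports an error.
func secureRandomInt() -> Int? {
    var source = SystemRandomNumberGenerator()
    let bits: UInt64 = source.next()
    return Int(truncatingIfNeeded: bits)
}

// Error type chosen by the caller, not by the library.
enum TokenError: Error {
    case entropyUnavailable
}

// (a) Supply a deterministic fallback.
let withFallback: Int = secureRandomInt() ?? 0

// (b) Raise an error of the caller's own choosing.
func makeToken() throws -> Int {
    guard let value = secureRandomInt() else {
        throw TokenError.entropyUnavailable
    }
    return value
}

// (c) Trap if no random bits are available.
let orTrap: Int = secureRandomInt()!

do {
    let token = try makeToken()
    print(withFallback, token, orTrap)
} catch {
    print("could not create token: \(error)")
}
```

Each option costs the caller one operator or one `guard`, which is the 
"minimum of fuss" argument in code form.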


* What should the default CSPRNG be? There are good arguments for using a 
cryptographically secure device random. (In my proposed implementation, for 
device random, I use Security.framework on Apple platforms (because 
/dev/urandom is not guaranteed to be available due to the sandbox, IIUC). On 
Linux platforms, I would prefer to use getrandom() and avoid using file system 
APIs, but getrandom() is new and unsupported on some versions of Ubuntu that 
Swift supports. This is an issue in and of itself.) Now, a number of these 
facilities strictly limit or do not guarantee availability of more than a small 
number of random bytes at a time; they are recommended for seeding other PRNGs 
but *not* as a routine source of random numbers. Therefore, although device 
random should be available to users, it probably shouldn’t be the default for 
the Swift standard library as it could have negative consequences for the 
system as a whole. There follows the significant task of implementing a CSPRNG 
correctly and securely for the default PRNG.

Theo gave a talk a few years ago <https://www.youtube.com/watch?v=aWmLWx8ut20> 
on randomness and how these problems are approached in LibreSSL.

Certainly, we can learn a lot from those like Theo who've dealt with the issue. 
I'm not in a position to watch the talk at the moment; can you summarize what 
the tl;dr version of it is?

I saw it three years ago, so I don't remember all the details. The gist is that:


  *   OpenBSD's random is available from extremely early in the boot process 
      with reasonable entropy
  *   LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG (which 
      doesn't actually use ARC4)
  *   That implementation of arc4random is good because it is fool-proof and it 
      has basically no failure mode
  *   Stirring is good, having multiple components take random numbers from the 
      same source probably makes results harder to guess too
  *   Getrandom/getentropy is in all ways better than reading from random 
      devices

Vigorously agree on all points. Thanks for the summary.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
