Re: [swift-evolution] [Proposal] Random Unification

Jonathan Hull via swift-evolution Wed, 04 Oct 2017 03:04:47 -0700

What about the opposite then, where we internally switch to a slightly less 
random source, but keep the extra function so conscientious programmers can 
check/handle cases where there isn’t enough entropy specially if needed?
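
A rough sketch of what I mean, under stated assumptions: `FallbackingRandom`, `hasEnoughEntropy`, and the `secureSourceAvailable` flag are hypothetical names invented for illustration, the availability check is simulated with a stored flag rather than a real platform query, and `SystemRandomNumberGenerator` (a later standard-library type) merely stands in for whatever secure source is used internally.

```swift
// Hypothetical sketch: none of these names come from the proposal.
struct FallbackingRandom {
    /// Simulates whether the secure system source is usable right now.
    var secureSourceAvailable: Bool
    /// State of the weaker fallback generator (NOT cryptographically secure).
    private var fallbackState: UInt64

    init(secureSourceAvailable: Bool, fallbackSeed: UInt64) {
        self.secureSourceAvailable = secureSourceAvailable
        self.fallbackState = fallbackSeed
    }

    /// The extra check that conscientious callers can consult first.
    var hasEnoughEntropy: Bool {
        return secureSourceAvailable
    }

    /// Never fails: uses the secure source if possible, else the fallback.
    mutating func next() -> UInt64 {
        if secureSourceAvailable {
            // Stand-in for the platform's secure source.
            var system = SystemRandomNumberGenerator()
            return system.next()
        }
        // Linear congruential step (Knuth's MMIX constants), wrapping on overflow.
        fallbackState = fallbackState &* 6364136223846793005 &+ 1442695040888963407
        return fallbackState
    }
}

var source = FallbackingRandom(secureSourceAvailable: false, fallbackSeed: 1)
if !source.hasEnoughEntropy {
    print("warning: falling back to a non-secure generator")
}
print(source.next())
```

Callers who never look at `hasEnoughEntropy` still get a number every time; callers who care can check it and refuse to proceed.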


Thanks,
Jon


> On Oct 4, 2017, at 2:55 AM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
> 
> Seems like the API would be actively hiding the possibility of failure so that 
> you’d have to be in the know to prevent it. Those who don’t know about it 
> would be hunting down a ghost as they’re trying to debug, especially if their 
> program crashes rarely, stochastically, and non-reproducibly because a third 
> party library calls random() in code they can’t see. I think this makes 
> trapping the least acceptable of all options.
> On Wed, Oct 4, 2017 at 04:49 Jonathan Hull <jh...@gbis.com 
> <mailto:jh...@gbis.com>> wrote:
> @Xiaodi:  What do you think of the possibility of trapping in cases of low 
> entropy, and adding an additional global function that checks for entropy so 
> that conscientious programmers can avoid the trap and provide an alternative 
> (or error message)?
> 
> Thanks,
> Jon
> 
> 
>> On Oct 4, 2017, at 2:41 AM, Xiaodi Wu <xiaodi...@gmail.com 
>> <mailto:xiaodi...@gmail.com>> wrote:
>> 
>> 
>> On Wed, Oct 4, 2017 at 02:39 Félix Cloutier <felixclout...@icloud.com 
>> <mailto:felixclout...@icloud.com>> wrote:
>> I'm really not enthusiastic about `random() -> Self?` or `random() throws -> 
>> Self` when the only possible error is that some global object hasn't been 
>> initialized.
>> 
>> The idea of having `random` straight on integers and floats and collections 
>> was to provide a simple interface, but using a global CSPRNG for those 
>> operations comes at a significant usability cost. I think that something has 
>> to go:
>> 
>> * Drop the random methods on FixedWidthInteger, FloatingPoint
>> * ...or drop the CSPRNG as a default
>> * Drop the optional/throws, and trap on error
>> 
>> I know I wouldn't use the `Int.random()` method if I had to unwrap every 
>> single result, when getting one non-nil result guarantees that the program 
>> won't see any other nil result again until it restarts.
>> 
>> From the perspective of an app that can be suspended and resumed at any 
>> time, “until it restarts” could be as soon as the next invocation of 
>> `Int.random()`, could it not?
>> 
>> 
>> Félix
>> 
>>> Le 3 oct. 2017 à 23:44, Jonathan Hull <jh...@gbis.com 
>>> <mailto:jh...@gbis.com>> a écrit :
>>> 
>>> I like the idea of splitting it into 2 separate “Random” proposals.
>>> 
>>> The first would have Xiaodi’s built-in CSPRNG which only has the interface:
>>> 
>>> On FixedWidthInteger:
>>>     static func random() throws -> Self
>>>     static func random(in range: ClosedRange<Self>) throws -> Self
>>> 
>>> On Double:
>>>     static func random() throws -> Double
>>>     static func random(in range: ClosedRange<Double>) throws -> Double
>>> 
>>> (Everything else we want, like shuffled(), could be built in later 
>>> proposals by calling those functions)
>>> 
>>> The other option would be to remove the ‘throws’ from the above functions 
>>> (perhaps fatalError-ing), and provide an additional function which can be 
>>> used to check that there is enough entropy (so as to avoid the crash or 
>>> fall back to a worse source when the CSPRNG is unavailable).
>>> 
>>> 
>>> 
>>> Then a second proposal would bring in the concept of RandomSources 
>>> (whatever we call them), which can return however many random bytes you ask 
>>> for… and a protocol for types which know how to initialize themselves from 
>>> those bytes.  That might be spelled like 'static func random(using: 
>>> RandomSource)->Self'.  As a convenience, the source would also be able to 
>>> create FixedWidthIntegers and Doubles (both with and without a range), and 
>>> would also have the coinFlip() and oneIn(UInt)->Bool functions. Most types 
>>> should be able to build themselves off of that.  There would be a default 
>>> source which is built from the first protocol.
>>> 
>>> I also really think we should have a concept of Repeatably-Random as a 
>>> subprotocol for the second proposal.  I see far too many shipping apps 
>>> which have bugs due to using arc4Random when they really needed a 
>>> repeatable source (e.g. patterns and lines jump around when you resize 
>>> things). If it was an easy option, people would use it when appropriate. 
>>> This would just mean a sub-protocol which has an initializer which takes a 
>>> seed, and the ability to save/restore state (similar to CGContexts).
>>> 
>>> The second proposal would also include things like shuffled() and 
>>> shuffled(using:).
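
A minimal sketch of how the second proposal's pieces could fit together, under stated assumptions: `SketchRandomSource`, `RepeatableRandomSource`, and `sketchShuffled(using:)` are hypothetical names, and SplitMix64 is swapped in only as a small, well-known deterministic generator (it is not cryptographically secure and is not named anywhere in this thread).

```swift
// Hypothetical sketch; the protocol and method names are illustrative only.

/// A source that hands out random bits on request.
protocol SketchRandomSource {
    mutating func nextUInt64() -> UInt64
}

extension SketchRandomSource {
    /// Fair coin, taken from the low bit of the next value.
    mutating func coinFlip() -> Bool {
        return nextUInt64() & 1 == 1
    }

    /// True roughly once in `n` calls (ignores the small modulo bias).
    mutating func oneIn(_ n: UInt) -> Bool {
        precondition(n > 0, "n must be positive")
        return nextUInt64() % UInt64(n) == 0
    }
}

/// A source that can be seeded and whose state can be saved and restored.
protocol RepeatableRandomSource: SketchRandomSource {
    associatedtype State
    init(seed: UInt64)
    var state: State { get set }
}

/// SplitMix64: deterministic and fast, NOT cryptographically secure.
struct SplitMix64: RepeatableRandomSource {
    var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    mutating func nextUInt64() -> UInt64 {
        state = state &+ 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

extension Array {
    /// Fisher-Yates shuffle driven by the given source.
    func sketchShuffled<S: SketchRandomSource>(using source: inout S) -> [Element] {
        var result = self
        var i = result.count - 1
        while i > 0 {
            // Pick j in 0...i, then swap positions i and j.
            let j = Int(source.nextUInt64() % UInt64(i + 1))
            result.swapAt(i, j)
            i -= 1
        }
        return result
    }
}

// Same seed, same layout: what a resizable pattern view would want.
var first = SplitMix64(seed: 2017)
var second = SplitMix64(seed: 2017)
let a = [1, 2, 3, 4, 5].sketchShuffled(using: &first)
let b = [1, 2, 3, 4, 5].sketchShuffled(using: &second)
print(a == b)
```

Saving `state` before drawing and assigning it back afterwards replays the same values, which is the save/restore behavior described above.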
>>> 
>>> Thanks,
>>> Jon
>>> 
>>> 
>>> 
>>>> On Oct 3, 2017, at 9:31 PM, Alejandro Alonso <aalonso...@outlook.com 
>>>> <mailto:aalonso...@outlook.com>> wrote:
>>>> 
>>>> I really like the schedule here. After reading for a while, I do agree 
>>>> with Brent that stdlib should be very primitive in the functionality that 
>>>> it provides. I also agree that the most important part right now is 
>>>> designing the internal crypto that the numeric types use to return their 
>>>> respective random numbers. On the discussion of how we should handle not 
>>>> having enough entropy with the device random: from a user's perspective it 
>>>> makes sense that calling .random should just give me a random number, but 
>>>> from a developer's perspective I see Optional being the best choice here. 
>>>> While I think blocking could, in most cases, provide the user an easier 
>>>> API, we have to do this right and be safe here by providing a value that 
>>>> indicates that there is room for error. As for the generator abstraction, 
>>>> I believe there should be a bare-bones protocol that sets a layout for new 
>>>> generators, and we should be focusing on its requirements. 
>>>> 
>>>> Whether or not RandomAccessCollection and MutableCollection should get 
>>>> .random and .shuffle/.shuffled in this first proposal is completely up in 
>>>> the air for me. It makes sense, to me, to include .random in this 
>>>> proposal and open another one for .shuffle/.shuffled, but I can see 
>>>> arguments that say we should create something separate for these two, or 
>>>> include all of it in this proposal.
>>>> 
>>>> - Alejandro
>>>> 
>>>> On Sep 27, 2017, 7:29 PM -0500, Xiaodi Wu <xiaodi...@gmail.com 
>>>> <mailto:xiaodi...@gmail.com>>, wrote:
>>>>> 
>>>>> On Wed, Sep 27, 2017 at 00:18 Félix Cloutier <felixclout...@icloud.com 
>>>>> <mailto:felixclout...@icloud.com>> wrote:
>>>>>> Le 26 sept. 2017 à 16:14, Xiaodi Wu <xiaodi...@gmail.com 
>>>>>> <mailto:xiaodi...@gmail.com>> a écrit :
>>>>>> 
>>>>> 
>>>>>> On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier 
>>>>>> <felixclout...@icloud.com <mailto:felixclout...@icloud.com>> wrote:
>>>>>> 
>>>>>> It's possible to use a CSPRNG-grade algorithm and seed it once to get a 
>>>>>> reproducible sequence, but when you use it as a CSPRNG, you typically 
>>>>>> feed entropy back into it at nondeterministic points to ensure that even 
>>>>>> if you started with a bad seed, you'll eventually get to an alright 
>>>>>> state. Unless you keep track of when entropy was mixed in and what the 
>>>>>> values were, you'll never get a reproducible CSPRNG.
>>>>>> 
>>>>>> We would give developers a false sense of security if we provided them 
>>>>>> with CSPRNG-grade algorithms that we called CSPRNGs and that they could 
>>>>>> seed themselves. Just because it says "crypto-secure" in the name 
>>>>>> doesn't mean that it'll be crypto-secure if it's seeded with time(). 
>>>>>> Therefore, "reproducible" vs "non-reproducible" looks like a good 
>>>>>> distinction to me.
>>>>>> 
>>>>>> I disagree here, in two respects:
>>>>>> 
>>>>>> First, whether or not a particular PRNG is cryptographically secure is 
>>>>>> an intrinsic property of the algorithm; whether it's "reproducible" or 
>>>>>> not is determined by the published API. In other words, the distinction 
>>>>>> between CSPRNG vs. non-CSPRNG is important to document because it's 
>>>>>> semantics that cannot be deduced by the user otherwise, and it is an 
>>>>>> important one for writing secure code because it tells you whether an 
>>>>>> attacker can predict future outputs based only on observing past 
>>>>>> outputs. "Reproducible" in the sense of seedable or not is trivially 
>>>>>> noted by inspection of the published API, and it is rather immaterial to 
>>>>>> writing secure code.
>>>>> 
>>>>> 
>>>>> Cryptographically secure is not a property that I'm comfortable applying 
>>>>> to an algorithm. You cannot say that you've made a cryptographically 
>>>>> secure thing just because you've used all the right algorithms: you also 
>>>>> have to use them right, and one of the most critical components of a 
>>>>> cryptographically secure PRNG is its seed.
>>>>> 
>>>>> A cryptographically secure algorithm isn’t sufficient, but it is 
>>>>> necessary. That’s why it’s important to mark them as such. If I'm a 
>>>>> careful developer, then it is absolutely important to me to know that I’m 
>>>>> using a PRNG with a cryptographically secure algorithm, and that the 
>>>>> particular implementation of that algorithm is correct and secure.
>>>>> 
>>>>> It is a *feature* of a lot of modern CSPRNGs that you can't seed them:
>>>>> 
>>>>> You cannot seed or add entropy to std::random_device
>>>>> 
>>>>> Although std::random_device may in practice be backed by a software 
>>>>> CSPRNG, IIUC, the intention is that it can provide access to a hardware 
>>>>> non-deterministic source when available.
>>>>> 
>>>>> You cannot seed or add entropy to CryptGenRandom
>>>>> You can only add entropy to /dev/(u)random
>>>>> You can only add entropy to BSD's arc4random
>>>>> 
>>>>> Ah, I see. I think we mean different things when we say PRNG. A PRNG is 
>>>>> an entirely deterministic algorithm; the output is non-random and the 
>>>>> algorithm itself requires no entropy. If a PRNG is seeded with a random 
>>>>> sequence of bits, its output can "appear" to be random. A CSPRNG is a 
>>>>> PRNG that fulfills certain criteria such that its output can be 
>>>>> appropriate for use in cryptographic applications in place of a truly 
>>>>> random sequence *if* the input to the CSPRNG is itself random.
>>>>> 
>>>>> The examples you give above *incorporate* a CSPRNG, environment entropy, 
>>>>> and a set of rules about when to mix in additional entropy in order to 
>>>>> produce output indistinguishable from a random sequence, but they are 
>>>>> *not* themselves really *pseudorandom* generators because they are not 
>>>>> deterministic. Not only do such sources of random numbers not require an 
>>>>> interface to allow seeding, they do not even have to be publicly 
>>>>> instantiable: Swift need only expose a single thread-safe instance (or an 
>>>>> instance per thread) of a single type that provides access to 
>>>>> CryptGenRandom/urandom/arc4random, since after all the output of multiple 
>>>>> instances of that type should be statistically indistinguishable from the 
>>>>> output of only one.
>>>>> 
>>>>> What I was trying to respond to, by contrast, is the design of a 
>>>>> hierarchy of protocols CSPRNG : PRNG (or, in Alejandro's proposal, 
>>>>> UnsafeRandomSource : RandomSource) and the appropriate APIs to expose on 
>>>>> each. This is entirely inapplicable to your examples. It stands to reason 
>>>>> that a non-instantiable source of random numbers does not require a 
>>>>> protocol of its own (a hypothetical RNG : CSPRNG), since there is no 
>>>>> reason to implement (if done correctly) more than a single publicly 
>>>>> non-instantiable singleton type that could conform to it. For that 
>>>>> matter, the concrete type itself probably doesn't need *any* public API 
>>>>> at all. Instead, extensions to standard library types such as Int that 
>>>>> implement conformance to the protocol that Alejandro names "Randomizable" 
>>>>> could call internal APIs to provide all the necessary functionality, and 
>>>>> third-party types that need to conform to "Randomizable" could then in 
>>>>> turn use `Int.random()` or `Double.random()` to implement their own 
>>>>> conformance. In fact, the concrete random number generator type doesn't 
>>>>> need to be public at all. All public interaction could be through APIs 
>>>>> such as `Int.random()`.
>>>>> 
>>>>> 
>>>>> Just because we can expose a seed interface doesn't mean we should, and 
>>>>> in this case I believe that it would go against the prime objective of 
>>>>> providing secure random numbers.
>>>>> 
>>>>> 
>>>>> If we're talking about a Swift interface to a non-deterministic source of 
>>>>> random numbers like urandom or arc4random, then, as I write above, not 
>>>>> only do I agree that it doesn't need to be seedable, it also does not 
>>>>> need to be instantiable at all, does not need to conform to a protocol 
>>>>> that specifically requires the semantics of a non-deterministic source, 
>>>>> does not need to expose any public interface whatsoever, and doesn't 
>>>>> itself even need to be public. (Does it even need to be a type, as 
>>>>> opposed to simply a free function?)
>>>>> 
>>>>> In fact, having reasoned through all of this, we can split the design 
>>>>> task into two. The most essential part, which definitely should be part 
>>>>> of the stdlib, would be an internal interface to a cryptographically 
>>>>> secure platform-specific entropy source, a public protocol named 
>>>>> something like Randomizable (to be bikeshedded), and the appropriate 
>>>>> implementations on Boolean, binary integer, and floating point types to 
>>>>> conform them to Randomizable so that users can write `Bool.random()` or 
>>>>> `Int.random()`. The second part, which can be a separate proposal or even 
>>>>> a standalone core library or third-party library, would be the protocols 
>>>>> and concrete types that implement pseudorandom number generators, 
>>>>> allowing for reproducible pseudorandom sequences. In other words, instead 
>>>>> of PRNGs and CSPRNGs being the primitives on which `Int.random()` is 
>>>>> implemented, `Int.random()` should be the standard library primitive 
>>>>> which allows PRNGs and CSPRNGs to be seeded.
>>>>>> If your attacker can observe your seeding once, chances are that they 
>>>>>> can observe your reseeding too; then, they can use their own 
>>>>>> implementation of the PRNG (whether CSPRNG or non-CSPRNG) and reproduce 
>>>>>> your pseudorandom sequence whether or not Swift exposes any particular 
>>>>>> API.
>>>>> 
>>>>> On Linux, the random devices are initially seeded with machine-specific 
>>>>> but rather invariant data that makes /dev/urandom spit out predictable 
>>>>> numbers. It is considered "seeded" after a root process writes POOL_SIZE 
>>>>> bytes to it. On most implementations, this initial seed is stored on 
>>>>> disk: when the computer shuts down, it reads POOL_SIZE bytes from 
>>>>> /dev/urandom and saves them in a file, and the contents of that file are 
>>>>> loaded back into /dev/urandom when the computer starts. A scenario where 
>>>>> someone can read that file is certainly not less likely than a scenario 
>>>>> where /dev/urandom was deleted. That doesn't mean that they have kernel 
>>>>> code execution or that they can pry into your process, but they have a 
>>>>> good shot at guessing your seed and subsequent RNG results if no stirring 
>>>>> happens.
>>>>> 
>>>>> Sorry, I don't understand what you're getting at here. Again, I'm talking 
>>>>> about deterministic algorithms, not non-deterministic sources of random 
>>>>> numbers.
>>>>> 
>>>>>> Secondly, I see no reason to justify the notion that, simply because a 
>>>>>> PRNG is cryptographically secure, we ought to hide the seeding 
>>>>>> initializer (because one has to exist internally anyway) from the 
>>>>>> public. Obviously, one use case for a deterministic PRNG is to get 
>>>>>> reproducible sequences of random-appearing values; this can be useful 
>>>>>> whether the underlying algorithm is cryptographically secure or not. 
>>>>>> There are innumerably many ways to use data generated from a CSPRNG in 
>>>>>> non-cryptographically secure ways and omitting or including a public 
>>>>>> seeding initializer does not change that; in other words, using a 
>>>>>> deterministic seed for a CSPRNG would be a bad idea in certain 
>>>>>> applications, but it's a deliberate act, and someone who would 
>>>>>> mistakenly do that is clearly incapable of *using* the output from the 
>>>>>> PRNG in a secure way either; put a third way, you would be hard pressed 
>>>>>> to find a situation where it's true that "if only Swift had not made the 
>>>>>> seeding initializer public, this author would have written secure code, 
>>>>>> but instead the only security hole that existed in the code was caused 
>>>>>> by the availability of a public seeding initializer mistakenly used." 
>>>>>> The point of having both explicitly instantiable PRNGs and a layer of 
>>>>>> simpler APIs like "Int.random()" is so that the less experienced user 
>>>>>> can get the "right thing" by default, and the experienced user can 
>>>>>> customize the behavior; any user that instantiates his or her own 
>>>>>> ChaCha20Random instance is already calling for the power user interface; 
>>>>>> it is reasonable to expose the underlying primitive operations (such as 
>>>>>> seeding) so long as there are legitimate uses for it.
>>>>> 
>>>>> Nothing prevents us from using the same algorithm for a CSPRNG that is 
>>>>> safely pre-seeded and a PRNG that people seed themselves, mind you. 
>>>>> However, especially when it comes to security, there is a strong 
>>>>> responsibility to drive developers into a pit of success: the most 
>>>>> obvious thing to do has to be the right one, and suggesting to 
>>>>> cryptographically-unaware developers that they have everything they need 
>>>>> to manage their own seed is not a step in that direction.
>>>>> 
>>>>> I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly 
>>>>> calling it cryptographically-secure, because it is not unless you know 
>>>>> what to do with it. It is emphatically not far-fetched to imagine a 
>>>>> developer who thinks that they can outdo the standard library by using 
>>>>> their own ChaCha20Random instance after it's been seeded with time() if 
>>>>> we let them know that it's "cryptographically secure". If you're a power 
>>>>> user and you don't like the default, known-good CSPRNG, then you're 
>>>>> hopefully good enough to know that ChaCha20 is considered a 
>>>>> cryptographically-secure algorithm without help labels from the language, 
>>>>> and you know how to operate it.
>>>>> 
>>>>>> I'm fully aware of the myths surrounding /dev/urandom and /dev/random. 
>>>>>> /dev/urandom might never run out, but it is also possible for it not to 
>>>>>> be initialized at all, as in the case of some VM setups. In some older 
>>>>>> versions of iOS, /dev/[u]random is reportedly sandboxed out. On systems 
>>>>>> where it is available, it can also be deleted, since it is a file. The 
>>>>>> point is, all of these scenarios cause an error during seeding of a 
>>>>>> CSPRNG. The question is, how to proceed in the face of inability to 
>>>>>> access entropy. We must do something, because in that case we cannot 
>>>>>> return a cryptographically secure answer. Rare trapping on invocation of 
>>>>>> Int.random() or permanently waiting for a never-to-be-initialized 
>>>>>> /dev/urandom would be terrible to debug, but returning an optional or 
>>>>>> throwing all the time would be verbose. How to design this API?
>>>>> 
>>>>> If the only concern is that the system might not be initialized enough, 
>>>>> I'd say that whatever returns an instance of a global, framework-seeded 
>>>>> CSPRNG should return an Optional, and the random methods that use the 
>>>>> global CSPRNG can trap and scream that the system is not initialized 
>>>>> enough. If this is a likely error for you, you can check if the CSPRNG 
>>>>> exists or not before jumping.
>>>>> 
>>>>> Also note that there is only one system for which Swift is officially 
>>>>> distributed (Ubuntu 14.04) on which the only way to get entropy from the 
>>>>> OS is to open a random device and read from it.
>>>>> 
>>>>> Again, I'm not only talking about urandom. As far as I'm aware, every API 
>>>>> to retrieve cryptographically secure sequences of random bits on every 
>>>>> platform for which Swift is distributed can potentially return an error 
>>>>> instead of random bits. The question is, what design for our API is the 
>>>>> most sensible way to deal with this contingency? On rethinking, I do 
>>>>> believe that consistently returning an Optional is the best way to go 
>>>>> about it, allowing the user to either (a) supply a deterministic 
>>>>> fallback; (b) raise an error of their own choosing; or (c) trap--all with 
>>>>> a minimum of fuss. This seems very Swifty to me.
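
The three caller-side options above can be sketched as follows. This is only an illustration under stated assumptions: `secureRandomUInt64(available:)` and `RandomError` are hypothetical names, the `available` parameter merely simulates the platform call failing, and `SystemRandomNumberGenerator` (a later standard-library type) stands in for the secure source.

```swift
// Hypothetical sketch; `secureRandomUInt64` is illustrative, not a real API.

enum RandomError: Error {
    case entropyUnavailable
}

/// Returns nil when the secure source cannot deliver random bits.
/// `available` simulates that condition for the example.
func secureRandomUInt64(available: Bool) -> UInt64? {
    guard available else { return nil }
    var system = SystemRandomNumberGenerator()
    return system.next()
}

// (a) Supply a deterministic fallback.
func valueOrFallback(available: Bool) -> UInt64 {
    return secureRandomUInt64(available: available) ?? 4
}

// (b) Raise an error of the caller's own choosing.
func valueOrThrow(available: Bool) throws -> UInt64 {
    guard let value = secureRandomUInt64(available: available) else {
        throw RandomError.entropyUnavailable
    }
    return value
}

// (c) Trap: force-unwrapping crashes if no value was produced.
func valueOrTrap(available: Bool) -> UInt64 {
    return secureRandomUInt64(available: available)!
}
```

Each option costs the caller one line at the call site, and the choice between them is visible in the code rather than hidden inside the library.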
>>>>>  
>>>>> 
>>>>>>> * What should the default CSPRNG be? There are good arguments for using 
>>>>>>> a cryptographically secure device random. (In my proposed 
>>>>>>> implementation, for device random, I use Security.framework on Apple 
>>>>>>> platforms (because /dev/urandom is not guaranteed to be available due 
>>>>>>> to the sandbox, IIUC). On Linux platforms, I would prefer to use 
>>>>>>> getrandom() and avoid using file system APIs, but getrandom() is new 
>>>>>>> and unsupported on some versions of Ubuntu that Swift supports. This is 
>>>>>>> an issue in and of itself.) Now, a number of these facilities strictly 
>>>>>>> limit or do not guarantee availability of more than a small number of 
>>>>>>> random bytes at a time; they are recommended for seeding other PRNGs 
>>>>>>> but *not* as a routine source of random numbers. Therefore, although 
>>>>>>> device random should be available to users, it probably shouldn’t be 
>>>>>>> the default for the Swift standard library as it could have negative 
>>>>>>> consequences for the system as a whole. There follows the significant 
>>>>>>> task of implementing a CSPRNG correctly and securely for the default 
>>>>>>> PRNG.
>>>>>> 
>>>>>> Theo gave a talk a few years ago 
>>>>>> <https://www.youtube.com/watch?v=aWmLWx8ut20> on randomness and how 
>>>>>> these problems are approached in LibreSSL.
>>>>>> 
>>>>>> Certainly, we can learn a lot from those like Theo who've dealt with the 
>>>>>> issue. I'm not in a position to watch the talk at the moment; can you 
>>>>>> summarize what the tl;dr version of it is?
>>>>> 
>>>>> I saw it three years ago, so I don't remember all the details. The gist 
>>>>> is that:
>>>>> 
>>>>> * OpenBSD's random is available from extremely early in the boot process 
>>>>>   with reasonable entropy
>>>>> * LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG (which 
>>>>>   doesn't actually use ARC4)
>>>>> * That implementation of arc4random is good because it is fool-proof and 
>>>>>   it has basically no failure mode
>>>>> * Stirring is good, having multiple components take random numbers from 
>>>>>   the same source probably makes results harder to guess too
>>>>> * Getrandom/getentropy is in all ways better than reading from random 
>>>>>   devices
>>>>> 
>>>>> Vigorously agree on all points. Thanks for the summary. 
>>>>> 
>>> 
>> 
>

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
