What about the opposite, then: we internally switch to a slightly less random source, but keep the extra function so that conscientious programmers can check for (and specially handle) cases where there isn’t enough entropy, if needed?
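Roughly, as a sketch only: every name below (`RandomProvider`, `hasSufficientEntropy`, the `secure` closure) is made up for illustration and is not proposed API. The closure stands in for whatever platform source the stdlib would really call, and SplitMix64 is just one example of a "slightly less random" fallback.

```swift
// Hypothetical sketch; none of these names exist in the standard library.

/// A simple non-cryptographic PRNG (SplitMix64), used here as the weaker fallback.
struct SplitMix64 {
    var state: UInt64

    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

struct RandomProvider {
    /// Stand-in for the platform's secure source; returns nil when it cannot deliver.
    var secure: () -> UInt64?
    /// Used only when the secure source fails.
    var fallback: SplitMix64

    /// The "extra function": lets careful callers detect the degraded case.
    /// Note that this sketch probes by drawing (and discarding) one value.
    var hasSufficientEntropy: Bool {
        return secure() != nil
    }

    /// Never throws and never traps: silently degrades to the fallback.
    mutating func next() -> UInt64 {
        if let value = secure() {
            return value
        }
        return fallback.next()
    }
}

// Usage: most callers just call next(); careful callers check first.
var provider = RandomProvider(secure: { nil }, fallback: SplitMix64(state: 42))
if !provider.hasSufficientEntropy {
    print("warning: secure source unavailable, using fallback")
}
print(provider.next())
```

The obvious downside is that code which never checks gets weaker numbers without knowing it, which is the trade-off I'm asking about.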
Thanks,
Jon

> On Oct 4, 2017, at 2:55 AM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>
> Seems like the API would be actively hiding the possibility of failure so that you’d have to be in the know to prevent it. Those who don’t know about it would be hunting down a ghost as they’re trying to debug, especially if their program crashes rarely, stochastically, and non-reproducibly because a third party library calls random() in code they can’t see. I think this makes trapping the least acceptable of all options.
>
> On Wed, Oct 4, 2017 at 04:49 Jonathan Hull <jh...@gbis.com> wrote:
> @Xiaodi: What do you think of the possibility of trapping in cases of low entropy, and adding an additional global function that checks for entropy so that conscientious programmers can avoid the trap and provide an alternative (or error message)?
>
> Thanks,
> Jon
>
>> On Oct 4, 2017, at 2:41 AM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>>
>> On Wed, Oct 4, 2017 at 02:39 Félix Cloutier <felixclout...@icloud.com> wrote:
>> I'm really not enthusiastic about `random() -> Self?` or `random() throws -> Self` when the only possible error is that some global object hasn't been initialized.
>>
>> The idea of having `random` straight on integers and floats and collections was to provide a simple interface, but using a global CSPRNG for those operations comes at a significant usability cost. I think that something has to go:
>>
>> - Drop the random methods on FixedWidthInteger, FloatingPoint
>> - ...or drop the CSPRNG as a default
>> - Drop the optional/throws, and trap on error
>>
>> I know I wouldn't use the `Int.random()` method if I had to unwrap every single result, when getting one non-nil result guarantees that the program won't see any other nil result again until it restarts.
>>
>> From the perspective of an app that can be suspended and resumed at any time, “until it restarts” could be as soon as the next invocation of `Int.random()`, could it not?
>>
>> Félix
>>
>>> On Oct 3, 2017, at 11:44 PM, Jonathan Hull <jh...@gbis.com> wrote:
>>>
>>> I like the idea of splitting it into 2 separate “Random” proposals.
>>>
>>> The first would have Xiaodi’s built-in CSPRNG which only has the interface:
>>>
>>> On FixedWidthInteger:
>>>     static func random() throws -> Self
>>>     static func random(in range: ClosedRange<Self>) throws -> Self
>>>
>>> On Double:
>>>     static func random() throws -> Double
>>>     static func random(in range: ClosedRange<Double>) throws -> Double
>>>
>>> (Everything else we want, like shuffled(), could be built in later proposals by calling those functions)
>>>
>>> The other option would be to remove the ‘throws’ from the above functions (perhaps fatalError-ing), and provide an additional function which can be used to check that there is enough entropy (so as to avoid the crash or fall back to a worse source when the CSPRNG is unavailable).
>>>
>>> Then a second proposal would bring in the concept of RandomSources (whatever we call them), which can return however many random bytes you ask for… and a protocol for types which know how to initialize themselves from those bytes. That might be spelled like 'static func random(using: RandomSource) -> Self'. As a convenience, the source would also be able to create FixedWidthIntegers and Doubles (both with and without a range), and would also have the coinFlip() and oneIn(UInt) -> Bool functions. Most types should be able to build themselves off of that. There would be a default source which is built from the first protocol.
>>>
>>> I also really think we should have a concept of Repeatably-Random as a subprotocol for the second proposal.
>>> I see far too many shipping apps which have bugs due to using arc4Random when they really needed a repeatable source (e.g. patterns and lines jump around when you resize things). If it was an easy option, people would use it when appropriate. This would just mean a sub-protocol which has an initializer which takes a seed, and the ability to save/restore state (similar to CGContexts).
>>>
>>> The second proposal would also include things like shuffled() and shuffled(using:).
>>>
>>> Thanks,
>>> Jon
>>>
>>>> On Oct 3, 2017, at 9:31 PM, Alejandro Alonso <aalonso...@outlook.com> wrote:
>>>>
>>>> I really like the schedule here. After reading for a while, I do agree with Brent that stdlib should be very primitive in the functionality that it provides. I also agree that the most important part right now is designing the internal crypto which the numeric types use to return their respective random numbers. On the discussion of how we should handle not enough entropy with the device random, from a user's perspective it makes sense that calling .random should just give me a random number, but from a developer's perspective I see Optional being the best choice here. While I think blocking could, in most cases, provide the user an easier API, we have to do this right and be safe here by providing a value that indicates that there is room for error here. As for the generator abstraction, I believe there should be a bare basic protocol that sets a layout for new generators and should be focusing on its requirements.
>>>>
>>>> Whether or not RandomAccessCollection and MutableCollection should get .random and .shuffle/.shuffled in this first proposal is completely up in the air for me.
>>>> It makes sense, to me, to include the .random in this proposal and open another one for .shuffle/.shuffled, but I can see arguments that say we should create something separate for these two, or include all of it in this proposal.
>>>>
>>>> - Alejandro
>>>>
>>>> On Sep 27, 2017, 7:29 PM -0500, Xiaodi Wu <xiaodi...@gmail.com>, wrote:
>>>>>
>>>>> On Wed, Sep 27, 2017 at 00:18 Félix Cloutier <felixclout...@icloud.com> wrote:
>>>>>> On Sep 26, 2017, at 4:14 PM, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>>>>>>
>>>>>> On Tue, Sep 26, 2017 at 11:26 AM, Félix Cloutier <felixclout...@icloud.com> wrote:
>>>>>>
>>>>>> It's possible to use a CSPRNG-grade algorithm and seed it once to get a reproducible sequence, but when you use it as a CSPRNG, you typically feed entropy back into it at nondeterministic points to ensure that even if you started with a bad seed, you'll eventually get to an alright state. Unless you keep track of when entropy was mixed in and what the values were, you'll never get a reproducible CSPRNG.
>>>>>>
>>>>>> We would give developers a false sense of security if we provided them with CSPRNG-grade algorithms that we called CSPRNGs and that they could seed themselves. Just because it says "crypto-secure" in the name doesn't mean that it'll be crypto-secure if it's seeded with time(). Therefore, "reproducible" vs "non-reproducible" looks like a good distinction to me.
>>>>>>
>>>>>> I disagree here, in two respects:
>>>>>>
>>>>>> First, whether or not a particular PRNG is cryptographically secure is an intrinsic property of the algorithm; whether it's "reproducible" or not is determined by the published API. In other words, the distinction between CSPRNG vs.
>>>>>> non-CSPRNG is important to document because it's semantics that cannot be deduced by the user otherwise, and it is an important one for writing secure code because it tells you whether an attacker can predict future outputs based only on observing past outputs. "Reproducible" in the sense of seedable or not is trivially noted by inspection of the published API, and it is rather immaterial to writing secure code.
>>>>>
>>>>> Cryptographically secure is not a property that I'm comfortable applying to an algorithm. You cannot say that you've made a cryptographically secure thing just because you've used all the right algorithms: you also have to use them right, and one of the most critical components of a cryptographically secure PRNG is its seed.
>>>>>
>>>>> A cryptographically secure algorithm isn’t sufficient, but it is necessary. That’s why it’s important to mark them as such. If I'm a careful developer, then it is absolutely important to me to know that I’m using a PRNG with a cryptographically secure algorithm, and that the particular implementation of that algorithm is correct and secure.
>>>>>
>>>>> It is a *feature* of a lot of modern CSPRNGs that you can't seed them:
>>>>>
>>>>> - You cannot seed or add entropy to std::random_device
>>>>>
>>>>> Although std::random_device may in practice be backed by a software CSPRNG, IIUC, the intention is that it can provide access to a hardware non-deterministic source when available.
>>>>>
>>>>> - You cannot seed or add entropy to CryptGenRandom
>>>>> - You can only add entropy to /dev/(u)random
>>>>> - You can only add entropy to BSD's arc4random
>>>>>
>>>>> Ah, I see. I think we mean different things when we say PRNG. A PRNG is an entirely deterministic algorithm; the output is non-random and the algorithm itself requires no entropy.
>>>>> If a PRNG is seeded with a random sequence of bits, its output can "appear" to be random. A CSPRNG is a PRNG that fulfills certain criteria such that its output can be appropriate for use in cryptographic applications in place of a truly random sequence *if* the input to the CSPRNG is itself random.
>>>>>
>>>>> The examples you give above *incorporate* a CSPRNG, environment entropy, and a set of rules about when to mix in additional entropy in order to produce output indistinguishable from a random sequence, but they are *not* themselves really *pseudorandom* generators because they are not deterministic. Not only do such sources of random numbers not require an interface to allow seeding, they do not even have to be publicly instantiable: Swift need only expose a single thread-safe instance (or an instance per thread) of a single type that provides access to CryptGenRandom/urandom/arc4random, since after all the output of multiple instances of that type should be statistically indistinguishable from the output of only one.
>>>>>
>>>>> What I was trying to respond to, by contrast, is the design of a hierarchy of protocols CSPRNG : PRNG (or, in Alejandro's proposal, UnsafeRandomSource : RandomSource) and the appropriate APIs to expose on each. This is entirely inapplicable to your examples. It stands to reason that a non-instantiable source of random numbers does not require a protocol of its own (a hypothetical RNG : CSPRNG), since there is no reason to implement (if done correctly) more than a single publicly non-instantiable singleton type that could conform to it. For that matter, the concrete type itself probably doesn't need *any* public API at all.
>>>>> Instead, extensions to standard library types such as Int that implement conformance to the protocol that Alejandro names "Randomizable" could call internal APIs to provide all the necessary functionality, and third-party types that need to conform to "Randomizable" could then in turn use `Int.random()` or `Double.random()` to implement their own conformance. In fact, the concrete random number generator type doesn't need to be public at all. All public interaction could be through APIs such as `Int.random()`.
>>>>>
>>>>> Just because we can expose a seed interface doesn't mean we should, and in this case I believe that it would go against the prime objective of providing secure random numbers.
>>>>>
>>>>> If we're talking about a Swift interface to a non-deterministic source of random numbers like urandom or arc4random, then, as I write above, not only do I agree that it doesn't need to be seedable, it also does not need to be instantiable at all, does not need to conform to a protocol that specifically requires the semantics of a non-deterministic source, does not need to expose any public interface whatsoever, and doesn't itself even need to be public. (Does it even need to be a type, as opposed to simply a free function?)
>>>>>
>>>>> In fact, having reasoned through all of this, we can split the design task into two. The most essential part, which definitely should be part of the stdlib, would be an internal interface to a cryptographically secure platform-specific entropy source, a public protocol named something like Randomizable (to be bikeshedded), and the appropriate implementations on Boolean, binary integer, and floating point types to conform them to Randomizable so that users can write `Bool.random()` or `Int.random()`.
>>>>> The second part, which can be a separate proposal or even a standalone core library or third-party library, would be the protocols and concrete types that implement pseudorandom number generators, allowing for reproducible pseudorandom sequences. In other words, instead of PRNGs and CSPRNGs being the primitives on which `Int.random()` is implemented, `Int.random()` should be the standard library primitive which allows PRNGs and CSPRNGs to be seeded.
>>>>>
>>>>>> If your attacker can observe your seeding once, chances are that they can observe your reseeding too; then, they can use their own implementation of the PRNG (whether CSPRNG or non-CSPRNG) and reproduce your pseudorandom sequence whether or not Swift exposes any particular API.
>>>>>
>>>>> On Linux, the random devices are initially seeded with machine-specific but rather invariant data that makes /dev/urandom spit out predictable numbers. It is considered "seeded" after a root process writes POOL_SIZE bytes to it. On most implementations, this initial seed is stored on disk: when the computer shuts down, it reads POOL_SIZE bytes from /dev/urandom and saves them in a file, and the contents of that file are loaded back into /dev/urandom when the computer starts. A scenario where someone can read that file is certainly not less likely than a scenario where /dev/urandom was deleted. That doesn't mean that they have kernel code execution or that they can pry into your process, but they have a good shot at guessing your seed and subsequent RNG results if no stirring happens.
>>>>>
>>>>> Sorry, I don't understand what you're getting at here. Again, I'm talking about deterministic algorithms, not non-deterministic sources of random numbers.
>>>>>
>>>>>> Secondly, I see no reason to justify the notion that, simply because a PRNG is cryptographically secure, we ought to hide the seeding initializer (because one has to exist internally anyway) from the public. Obviously, one use case for a deterministic PRNG is to get reproducible sequences of random-appearing values; this can be useful whether the underlying algorithm is cryptographically secure or not. There are innumerably many ways to use data generated from a CSPRNG in non-cryptographically secure ways and omitting or including a public seeding initializer does not change that; in other words, using a deterministic seed for a CSPRNG would be a bad idea in certain applications, but it's a deliberate act, and someone who would mistakenly do that is clearly incapable of *using* the output from the PRNG in a secure way either; put a third way, you would be hard pressed to find a situation where it's true that "if only Swift had not made the seeding initializer public, this author would have written secure code, but instead the only security hole that existed in the code was caused by the availability of a public seeding initializer mistakenly used." The point of having both explicitly instantiable PRNGs and a layer of simpler APIs like "Int.random()" is so that the less experienced user can get the "right thing" by default, and the experienced user can customize the behavior; any user that instantiates his or her own ChaCha20Random instance is already calling for the power user interface; it is reasonable to expose the underlying primitive operations (such as seeding) so long as there are legitimate uses for it.
>>>>>
>>>>> Nothing prevents us from using the same algorithm for a CSPRNG that is safely pre-seeded and a PRNG that people seed themselves, mind you.
>>>>> However, especially when it comes to security, there is a strong responsibility to drive developers into a pit of success: the most obvious thing to do has to be the right one, and suggesting to cryptographically-unaware developers that they have everything they need to manage their own seed is not a step in that direction.
>>>>>
>>>>> I'm not opposed to a ChaCha20Random type; I'm opposed to explicitly calling it cryptographically-secure, because it is not unless you know what to do with it. It is emphatically not far-fetched to imagine a developer who thinks that they can outdo the standard library by using their own ChaCha20Random instance after it's been seeded with time() if we let them know that it's "cryptographically secure". If you're a power user and you don't like the default, known-good CSPRNG, then you're hopefully good enough to know that ChaCha20 is considered a cryptographically-secure algorithm without help labels from the language, and you know how to operate it.
>>>>>
>>>>>> I'm fully aware of the myths surrounding /dev/urandom and /dev/random. /dev/urandom might never run out, but it is also possible for it not to be initialized at all, as in the case of some VM setups. In some older versions of iOS, /dev/[u]random is reportedly sandboxed out. On systems where it is available, it can also be deleted, since it is a file. The point is, all of these scenarios cause an error during seeding of a CSPRNG. The question is, how to proceed in the face of inability to access entropy. We must do something, because in that case we cannot return a cryptographically secure answer. Rare trapping on invocation of Int.random() or permanently waiting for a never-to-be-initialized /dev/urandom would be terrible to debug, but returning an optional or throwing all the time would be verbose.
>>>>>> How to design this API?
>>>>>
>>>>> If the only concern is that the system might not be initialized enough, I'd say that whatever returns an instance of a global, framework-seeded CSPRNG should return an Optional, and the random methods that use the global CSPRNG can trap and scream that the system is not initialized enough. If this is a likely error for you, you can check if the CSPRNG exists or not before jumping.
>>>>>
>>>>> Also note that there is only one system for which Swift is officially distributed (Ubuntu 14.04) on which the only way to get entropy from the OS is to open a random device and read from it.
>>>>>
>>>>> Again, I'm not only talking about urandom. As far as I'm aware, every API to retrieve cryptographically secure sequences of random bits on every platform for which Swift is distributed can potentially return an error instead of random bits. The question is, what design for our API is the most sensible way to deal with this contingency? On rethinking, I do believe that consistently returning an Optional is the best way to go about it, allowing the user to either (a) supply a deterministic fallback; (b) raise an error of their own choosing; or (c) trap--all with a minimum of fuss. This seems very Swifty to me.
>>>>>
>>>>>>> * What should the default CSPRNG be? There are good arguments for using a cryptographically secure device random. (In my proposed implementation, for device random, I use Security.framework on Apple platforms (because /dev/urandom is not guaranteed to be available due to the sandbox, IIUC). On Linux platforms, I would prefer to use getrandom() and avoid using file system APIs, but getrandom() is new and unsupported on some versions of Ubuntu that Swift supports. This is an issue in and of itself.)
>>>>>>> Now, a number of these facilities strictly limit or do not guarantee availability of more than a small number of random bytes at a time; they are recommended for seeding other PRNGs but *not* as a routine source of random numbers. Therefore, although device random should be available to users, it probably shouldn’t be the default for the Swift standard library as it could have negative consequences for the system as a whole. There follows the significant task of implementing a CSPRNG correctly and securely for the default PRNG.
>>>>>>
>>>>>> Theo gave a talk a few years ago <https://www.youtube.com/watch?v=aWmLWx8ut20> on randomness and how these problems are approached in LibreSSL.
>>>>>>
>>>>>> Certainly, we can learn a lot from those like Theo who've dealt with the issue. I'm not in a position to watch the talk at the moment; can you summarize what the tl;dr version of it is?
>>>>>
>>>>> I saw it three years ago, so I don't remember all the details. The gist is that:
>>>>>
>>>>> - OpenBSD's random is available from extremely early in the boot process with reasonable entropy
>>>>> - LibreSSL includes OpenBSD's arc4random, and it's a "good" PRNG (which doesn't actually use ARC4)
>>>>> - That implementation of arc4random is good because it is fool-proof and it has basically no failure mode
>>>>> - Stirring is good, having multiple components take random numbers from the same source probably makes results harder to guess too
>>>>> - Getrandom/getentropy is in all ways better than reading from random devices
>>>>>
>>>>> Vigorously agree on all points. Thanks for the summary.