On Fri, Nov 24, 2017 at 10:59 PM, TellowKrinkle <tellowkrin...@gmail.com> wrote:
> So why is it more important for the random method on a collection to have a special method that guarantees a discrete uniform distribution than it is for an Int? If you’re going to split on guaranteed-discrete-uniform vs maybe-discrete-uniform, why not split on discrete-uniform vs not-discrete-uniform (note: I would not want either of these)?
>
> Why not just let everything be maybe-discrete-uniform and then specify:
> - Things involving discrete sets (including collections and ranges of discrete values like ints) return a discrete uniform distribution
> - Things involving continuous ranges (including ranges of floating-point types) return a continuous uniform distribution
> I don’t really see the point in differentiating between a discrete and continuous distribution, since it makes no sense to use a continuous distribution for things that are discrete, and it also makes no sense to use a discrete distribution for things that are continuous.
>

One of the arguments that others have raised against the proposed `Randomizable` protocol and the static method `random` is precisely this: that static `random` guarantees no semantics about the nature of the random value, including the distribution from which it is drawn. I agree with this criticism; so you are correct: I do not want a "maybe-discrete-uniform" method at all, let alone one that shares its name with methods that do guarantee a particular distribution.

> As for optional vs non-optional, I’d say this is similar to conforming to RawRepresentable (where you can implement its `init?(rawValue:)` with an `init(rawValue:)` if your type doesn’t ever fail to initialize) where you’re simply indicating that for whatever reason, your type is less likely to fail than whatever the most likely to fail type is.
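A minimal Swift sketch of the optional/non-optional split under discussion. It assumes the `RandomNumberGenerator` protocol, `SystemRandomNumberGenerator`, and `Int.random(in:using:)` from the Swift 4.2+ standard library (added after this thread). The names `randomValue(in:using:)`, `anyElement(using:)`, and `anyInt(using:)` are hypothetical, chosen only for illustration. Selecting from a set of values that may be empty returns an Optional. Drawing a new value from a type's whole domain cannot fail.

```swift
// Assumes Swift 4.2+ (RandomNumberGenerator, SystemRandomNumberGenerator).
// `randomValue(in:using:)`, `anyElement(using:)` and `anyInt(using:)` are
// hypothetical names used only to illustrate the failable/non-failable split.

/// Selecting from a range that may be empty can fail, so the result is Optional.
/// A half-open `Range` is used because a Swift `ClosedRange` can never be empty.
func randomValue<G: RandomNumberGenerator>(in range: Range<Int>, using generator: inout G) -> Int? {
    guard !range.isEmpty else { return nil }   // nothing to select from
    return Int.random(in: range, using: &generator)
}

extension Collection {
    /// Selecting one already-existing element can fail for the same reason.
    func anyElement<G: RandomNumberGenerator>(using generator: inout G) -> Element? {
        guard !isEmpty else { return nil }
        let offset = Int.random(in: 0..<count, using: &generator)
        return self[index(startIndex, offsetBy: offset)]
    }
}

/// Creating a new value from the whole of `Int`'s domain cannot fail,
/// so the result is non-optional.
func anyInt<G: RandomNumberGenerator>(using generator: inout G) -> Int {
    return Int.random(in: Int.min...Int.max, using: &generator)
}
```

With `var g = SystemRandomNumberGenerator()`, both `randomValue(in: 0..<0, using: &g)` and `[Int]().anyElement(using: &g)` return `nil`, while `anyInt(using: &g)` always returns an `Int`. The return type records which operations can fail, independent of what they are named.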
>
> Personally, I don’t care whether or not `Int.random` stays, but it’s functionally identical to `Int.random(in:)` with a default argument so it doesn’t make much of a difference for this decision since removing it wouldn’t affect the issue you’re having between `Int.random(in:)` and `Collection.random`.
>

There is certainly a difference. `Int.random(in:)` is failable, like `Collection.random` is failable, because it is selecting one of a set of values and that set may be empty. `Int.random` is not failable.

Moreover, as I wrote earlier, I'm concerned about this multiplication of methods that do in fact do the same thing. It is unclear to me in what way `Int.random(in: [1, 2, 3])` differs from `[1, 2, 3].random`. If there is no semantic distinction, there should only be one facility. If there is a semantic distinction, then there should be two facilities with distinct names. In either case, there should not be two facilities with the same name.

> On 2017/11/24 21:39, Xiaodi Wu <xiaodi...@gmail.com> wrote:
>
> On Fri, Nov 24, 2017 at 9:05 PM, TellowKrinkle <tellowkrin...@gmail.com> wrote:
>
>> You say that all the `.random`s have different semantics, but to me (at least), they are all very similar.
>>
>
> Of course they are _similar_: this is precisely why it's so important to be clear about the differences in the naming.
>
>
>> All the methods can be summarized as selecting a single random element from a collection
>> `[0, 2, 3].random` selects a single element from the given collection
>> `Int.random(in: 0...8)` selects a single element from the given range
>> `Int.random` has no range, but selects a single element from the collection of all ints (equivalent to if the above method had a default value for its range)
>> So to me these are all doing the same operation, just with different types of inputs
>>
>
> There are many subtle but important differences.
> For example:
>
> `[1, 2, 3].random` is a sampling operation based on a discrete uniform distribution. All operations that choose an element from a Collection would behave similarly: that is, instance `random` guarantees sampling based on a discrete uniform distribution. It does so happen that `Int.random` gives values in a discrete uniform distribution. However, `Float.random` most certainly does not: it would sample from a _continuous_ uniform distribution. In general, static `random` does not guarantee any particular distribution at all. This is a huge semantic distinction.
>
> Static `random` (e.g., `Int.random`) will always return a value, whereas instance `random` (e.g., `[1, 2, 3].random`) might not. This is because all types that implement static `random` must be instantiable, whereas collections can be empty. One might conclude that it makes sense for static `random` to be of type `T`, whereas instance `random` would be most fittingly of type `T?`. However, because they're both named "random", people have been misled into thinking that they're in fact the same operation and must therefore have the same return type. Alejandro has argued that `[1, 2, 3].random` should be of type `T` *because* it would not be ergonomic for `Int.random` to be of type `T?`. Meanwhile, others have argued that, because `[].random` should be failable, `Int.random` should be as well. This perceived need for the two distinct facilities to return the same type is completely due to them having the same proposed name. However, as described above, one is failable and the other is not *because of their differing semantics*.
>
> Meanwhile, we have had a debate as to whether `random` should be spelled as a property or a function.
> Alejandro has argued that `random` is like `first` or `last` and is a property of a collection, while others have argued that `Int.random()` should be spelled like a function because it instantiates a different value each time. Notionally, of course, instance `random` selects one already-existing element from a collection, whereas static `random` creates a new value that doesn't exist yet and truly could be considered like a factory method. However, because again they've both been proposed to have the name "random", people are using arguments about one type of "random" to decide questions of syntax for the other type of "random".
>
> All of this goes away when we clarify that these two are distinct facilities: they have different semantics. Of course, elsewhere, I've advocated for `Int.random` to be removed altogether due to large potential for incorrect use. If so, then that's one fewer "random" to be confused with one another.
>
>
>> On 2017/11/24 20:07, Alejandro Alonso <aalonso...@outlook.com> wrote:
>>
>> - Alejandro
>>
>> ---------- Forwarded message ----------
>> *From:* Xiaodi Wu <xiaodi...@gmail.com>
>> *Date:* Nov 24, 2017, 3:05 PM -0600
>> *To:* Alejandro Alonso <aalonso...@outlook.com>
>> *Cc:* Brent Royal-Gordon <br...@architechies.com>, Steve Canon via swift-evolution <swift-evolution@swift.org>
>> *Subject:* Re: [swift-evolution] [Proposal] Random Unification
>>
>> On Fri, Nov 24, 2017 at 2:55 PM, Alejandro Alonso <aalonso...@outlook.com> wrote:
>>
>>> Regarding naming too many things “random”, I’ve talked to many developers on my end and they all don’t find it confusing. This proposal is aimed to make it obvious what the operation is doing when regarding random. I still agree that the proposed solution does just that and in practice feels good to write.
>>>
>>
>> I must disagree quite strongly here.
>> The various facilities you name "random" have different semantics, and differences in semantics should be reflected in differences in names. It doesn't matter that some people don't find it confusing; it is objectively the case that you have named multiple distinct facilities with the same name, which leads to confusion. I, for one, get confused, and you can see on this list that people are using arguments about one property named "random" to discuss another property named "random". This is quite an intolerable situation.
>>
>>> I disagree that sample is the correct naming to use here. Getting a sample is a verb in this context which would make it break API guidelines just as well as `pick()`. To sample is to “take a sample or samples of (something) for analysis.” I can agree to use `sampling()` which follows API guidelines. This would result in the following grammar for `["hi", "hello", "hey"].sampling(2)`, “From array, get a sampling of 2”
>>>
>>
>> "Sampling" is fine.
>>
>>
>>> On Nov 23, 2017, 12:54 AM -0600, Xiaodi Wu , wrote:
>>>
>>> On Wed, Nov 22, 2017 at 23:01 Alejandro Alonso <aalonso...@outlook.com> wrote:
>>>
>>>> Like I’ve said, Python has different syntax grammar. We have to read each call site and form a sentence from it. `random.choice([1, 2, 3])` to me this reads, “Get a random choice from array”. This makes sense. Slapping the word choice as an instance property like `[1, 2, 3].choice` reads, “From array, get choice”. What is choice? This doesn’t make sense at all to me. To me, the only good solution is `[1, 2, 3].random` which reads, “From array, get random”. I actually think most users will be able to understand this at first glance rather than choice (or any or some).
>>>>
>>>
>>> Again, my concern here is that you are proposing to name multiple things "random".
>>> If this property should be called "random"--which I'm fine with--then the static method "random(in:)" should be named something else, and the static property "random" should be dropped altogether (as I advocate for reasons we just discussed) or renamed as well. It is simply too confusing that there are so many different "random" methods or properties. Meanwhile, isn't your default RNG also going to be called something like "DefaultRandom"?
>>>
>>>> In regards to the sample() function on collections, I have added this as I do believe this is something users need. The name I gave it was pick() as this reads, “From array, pick 2”.
>>>>
>>>
>>> The name "sample" has been used to good effect in other languages, has a well understood meaning in statistics, and is consistent with Swift language guidelines. The operation here is a sampling, and per Swift guidelines the name must be a noun: therefore, 'sample' is fitting. "Pick" does not intrinsically suggest randomness, whereas sample does, and your proposed reading uses it as a verb, whereas Swift guidelines tell us it must be a noun. I would advocate strongly for using well-established terminology and sticking with "sample."
>>>
>>>
>>>> On Nov 17, 2017, 8:32 PM -0600, Xiaodi Wu via swift-evolution <swift-evolution@swift.org>, wrote:
>>>>
>>>> On Fri, Nov 17, 2017 at 7:11 PM, Brent Royal-Gordon <br...@architechies.com> wrote:
>>>>
>>>>> On Nov 17, 2017, at 3:09 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:
>>>>>
>>>>> But actually, Int.random followed by % is the much bigger issue and a very good cautionary tale for why T.random is not a good idea. Swift should help users do the correct thing, and getting a random value across the full domain and computing an integer modulus is never the correct thing to do because of modulo bias, yet it's a very common error to make.
>>>>> We are much better off eliminating this API and encouraging use of the correct API, thereby reducing the likelihood of users making this category of error.
>>>>>
>>>>>
>>>>> Amen.
>>>>>
>>>>> If (and I agree with this) the range-based notation is less intuitive (0..<10.random is certainly less discoverable than Int.random), then we ought to offer an API in the form of `Int.random(in:)` but not `Int.random`. This does not preclude a `Collection.random` API as Alejandro proposes, of course, and that has independent value as Gwendal says.
>>>>>
>>>>>
>>>>> If we're not happy with the range syntax, maybe we should put `random(in:)`-style methods on the RNG protocol as extension methods instead. Then there's a nice, uniform style:
>>>>>
>>>>>     let diceRoll = rng.random(in: 1...6)
>>>>>     let card = rng.random(in: deck)
>>>>>     let isHeads = rng.random(in: [true, false])
>>>>>     let probability = rng.random(in: 0.0...1.0) // Special FloatingPoint overload
>>>>>
>>>>> The only issue is that this makes the default RNG's name really important. Something like:
>>>>>
>>>>>     DefaultRandom.shared.random(in: 1...6)
>>>>>
>>>>> Will be a bit of a pain for users.
>>>>>
>>>>
>>>> I did in fact implement this style of RNG in NumericAnnex, but I'm not satisfied with the design myself. Not only is it a bit of an ergonomic thorn, there's also another drawback that actually has weighty implications:
>>>>
>>>> Users aren't conditioned to reuse RNG instances. Perhaps, it is because it can "feel" wrong that multiple random instances should come from the *same* RNG. Instead, it "feels" more right to initialize a new RNG for every random number. After all, if one RNG is random, two must be randomer! This error is seen with some frequency in other languages that adopt this design, and they sometimes resort to educating users through documentation that isn't consistently heeded.
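The Swift sketch below shows one concrete way the "new RNG per number" habit goes wrong. It assumes the Swift 4.2+ `RandomNumberGenerator` protocol. With a seedable generator, re-creating it with the same seed for every draw replays the same value, while a single reused instance advances one stream. `SplitMix64` is a small, well-known PRNG picked for brevity and is not cryptographically secure. `rollWithFreshGenerator` and `rolls` are hypothetical helpers.

```swift
// Assumes Swift 4.2+ for the RandomNumberGenerator protocol.
// SplitMix64 is a small, well-known, non-cryptographic PRNG chosen for brevity;
// `rollWithFreshGenerator` and `rolls` are hypothetical helpers.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) { state = seed }

    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

/// Anti-pattern: a brand-new generator for every draw. With a fixed seed,
/// each call replays the first value of the same stream.
func rollWithFreshGenerator(seed: UInt64) -> Int {
    var rng = SplitMix64(seed: seed)
    return Int.random(in: 1...6, using: &rng)
}

/// Preferred: create one generator and reuse it, so every draw advances the stream.
func rolls(count: Int, seed: UInt64) -> [Int] {
    var rng = SplitMix64(seed: seed)
    var results: [Int] = []
    for _ in 0..<count {
        results.append(Int.random(in: 1...6, using: &rng))
    }
    return results
}
```

Repeated calls to `rollWithFreshGenerator(seed: 42)` all return the first element of `rolls(count: 20, seed: 42)`. Seeding each fresh generator from system entropy would hide the repetition. However, as noted above, it would still pay the setup cost on every draw and only ever use the early part of each stream.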
>>>>
>>>> Of course, you and I both know that this is not ideal for performance. Moreover, for a number of PRNG algorithms, the first few hundred or thousand iterations can be more predictable than later iterations. (Some algorithms discard the first n iterations, but whether that's adequate depends on the quality of the seed, IIUC.) Both of these issues don't apply specifically to a default RNG type that cannot be initialized and always uses entropy from the global pool, but that's not enough to vindicate the design, IMO. By emphasizing *which* RNG instance is being used for random number generation, the design encourages non-reuse of non-default RNGs, which is precisely where this common error matters for performance (and maybe security).
>>>>
>>>>> Maybe we call the default RNG instance `random`, and then give the `random(in:)` methods another name, like `choose(in:)`?
>>>>>
>>>>>     let diceRoll = random.choose(in: 1...6)
>>>>>     let card = random.choose(in: deck)
>>>>>     let isHeads = random.choose(in: [true, false])
>>>>>     let probability = random.choose(in: 0.0...1.0)
>>>>>
>>>>>     let diceRoll = rng.choose(in: 1...6)
>>>>>     let card = rng.choose(in: deck)
>>>>>     let isHeads = rng.choose(in: [true, false])
>>>>>     let probability = rng.choose(in: 0.0...1.0)
>>>>>
>>>>> This would allow us to keep the default RNG's type private and expose it only as an existential—which means more code will treat RNGs as black boxes, and people will extend the RNG protocol instead of the default RNG struct—while also putting our default random number generator under the name `random`, which is probably where people will look for such a thing.
>>>>>
>>>>
>>>> I've said this already in my feedback, but it can get lost in the long chain of replies, so I'll repeat myself here because it's relevant to the discussion.
>>>> I think one of the major difficulties of discussing the proposed design is that Alejandro has chosen to use a property called "random" to name multiple distinct functions which have distinct names in other languages. In fact, almost every method or function is being named "random." We are tripping over ourselves and muddling our thinking (or at least, I find myself doing so) because different things have the exact same name, and if I'm having this trouble after deep study of the design, I think it's a good sign that this is going to be greatly confusing to users generally.
>>>>
>>>> First, there's Alejandro's _static random_, which he proposes to return an instance of type T given a type T. In Python, this is named `randint(a, b)` for integers, and `random` (between 0 and 1) or `uniform(a, b)` for floating-point types. The distinct names reflect the fact that `randint` and `uniform` are mathematically quite different (one samples a *discrete* uniform distribution and the other a *continuous* uniform distribution), and I'm not aware of non-numeric types offering a similar API in Python. These distinct names accurately reflect critiques from others on this list that the proposed protocol `Randomizable` lumps together types that don't share any common semantics for their _static random_ method, and that the protocol is of questionable utility because types in general do not share sufficient semantics such that one can do interesting work in generic code with such a protocol.
>>>>
>>>> Then there's Alejandro's _instance random_, which he proposes to return an element of type T given an instance of a collection of type T. In Python, this is named "choice(seq)" (for one element, or else throws an error) and "sample(seq, k)" (for up to k elements).
>>>> As I noted, Alejandro was right to draw an analogy between _instance random_ and other instance properties of a Collection such as `first` and `last`. In fact, the behavior of Python's "choice" (if modified to return an Optional) and "sample", as a pair, would fit in very well next to Swift's existing pairs of `first` and `prefix(k)` and `last` and `suffix(k)`. We could trivially Swiftify the names here; for example:
>>>>
>>>> ```
>>>> [1, 2, 3].first
>>>> [1, 2, 3].any // or `choice`, or `some`, or...
>>>> [1, 2, 3].last
>>>>
>>>> [1, 2, 3].prefix(2)
>>>> [1, 2, 3].sample(2)
>>>> [1, 2, 3].suffix(2)
>>>> ```
>>>>
>>>> I'm going to advocate again for _not_ naming all of these distinct things "random". Even in conducting this discussion, it's so hard to keep track of what particular function a person is giving feedback about.
>>>>
>>>>
>>>> _______________________________________________
>>>> swift-evolution mailing list
>>>> swift-evolution@swift.org
>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
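The `prefix(_:)` / `sample(_:)` / `suffix(_:)` pairing suggested in the quoted message can be sketched in Swift as follows. This sketch assumes the Swift 4.2+ `shuffled(using:)` method. The name `sample(_:using:)` follows the thread's suggestion but is hypothetical. Like `prefix(_:)`, it returns at most `k` elements rather than trapping when `k` exceeds `count`. Python's `random.sample`, by contrast, raises `ValueError` in that case.

```swift
// Assumes Swift 4.2+ for `shuffled(using:)`. `sample(_:using:)` is a hypothetical name.
extension Collection {
    /// Returns up to `k` elements (distinct by position) chosen uniformly at random
    /// without replacement, mirroring how `prefix(_:)` returns at most `k` elements.
    func sample<G: RandomNumberGenerator>(_ k: Int, using generator: inout G) -> [Element] {
        precondition(k >= 0, "Can't sample a negative number of elements")
        // A uniform shuffle followed by taking the first k yields a uniform k-subset.
        return Array(shuffled(using: &generator).prefix(k))
    }
}
```

Shuffling the whole collection keeps the sketch short but costs O(n). A partial Fisher-Yates shuffle that stops after `k` swaps would avoid that for large collections.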