Re: [swift-evolution] [Proposal] Random Unification

Alejandro Alonso via swift-evolution Fri, 24 Nov 2017 12:56:33 -0800

Regarding naming too many things “random”, I’ve talked to many developers on my 
end and none of them find it confusing. This proposal aims to make it 
obvious what each operation does with respect to randomness. I still believe that 
the proposed solution does just that, and in practice it feels good to write.


I disagree that sample is the correct name to use here. “Sample” is a 
verb in this context, which would make it break the API guidelines just as 
`pick()` does. To sample is to “take a sample or samples of (something) for 
analysis.” I can agree to use `sampling()`, which follows the API guidelines. This 
would result in the following reading for `["hi", "hello", "hey"].sampling(2)`: 
“From array, get a sampling of 2”.

- Alejandro


On Nov 23, 2017, 12:54 AM -0600, Xiaodi Wu wrote:
On Wed, Nov 22, 2017 at 23:01 Alejandro Alonso 
<[email protected]> wrote:
Like I’ve said, Python has a different grammar. We have to read each call 
site and form a sentence from it. To me, `random.choice([1, 2, 3])` reads, 
“Get a random choice from array”. This makes sense. Slapping the word choice on as 
an instance property, like `[1, 2, 3].choice`, reads, “From array, get choice”. 
What is choice? This doesn’t make sense at all to me. To me, the only good 
solution is `[1, 2, 3].random` which reads, “From array, get random”. I 
actually think most users will be able to understand this at first glance 
rather than choice (or any or some).

Again, my concern here is that you are proposing to name multiple things 
"random". If this property should be called "random"--which I'm fine with--then 
the static method "random(in:)" should be named something else, and the static 
property "random" should be dropped altogether (as I advocate for reasons we 
just discussed) or renamed as well. It is simply too confusing that there are 
so many different "random" methods or properties. Meanwhile, isn't your default 
RNG also going to be called something like "DefaultRandom"?

With regard to the sample() function on collections, I have added this as I do 
believe this is something users need. The name I gave it was pick(), as this 
reads, “From array, pick 2”.

The name "sample" has been used to good effect in other languages, has a well 
understood meaning in statistics, and is consistent with Swift language 
guidelines. The operation here is a sampling, and per Swift guidelines the name 
must be a noun: therefore, 'sample' is fitting. "Pick" does not intrinsically 
suggest randomness, whereas sample does, and your proposed reading uses it as a 
verb, whereas Swift guidelines tell us it must be a noun. I would advocate 
strongly for using well-established terminology and sticking with "sample."


On Nov 17, 2017, 8:32 PM -0600, Xiaodi Wu via swift-evolution 
<[email protected]>, wrote:
On Fri, Nov 17, 2017 at 7:11 PM, Brent Royal-Gordon 
<[email protected]> wrote:
On Nov 17, 2017, at 3:09 PM, Xiaodi Wu via swift-evolution 
<[email protected]> wrote:

But actually, Int.random followed by % is the much bigger issue and a very good 
cautionary tale for why T.random is not a good idea. Swift should help users do 
the correct thing, and getting a random value across the full domain and 
computing an integer modulus is never the correct thing to do because of modulo 
bias, yet it's a very common error to make. We are much better off eliminating 
this API and encouraging use of the correct API, thereby reducing the 
likelihood of users making this category of error.
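
To make the modulo-bias point concrete, here is a minimal sketch that uses a deliberately small domain. It enumerates every `UInt8` value and counts how often each residue modulo 6 appears; the function name `residueCounts(modulus:)` is hypothetical and is not part of the proposal.

```swift
// Count how often each residue appears when every possible UInt8 value
// (0...255) is reduced modulo `modulus`. A uniform source over the full
// domain would produce residues in exactly these proportions.
func residueCounts(modulus: Int) -> [Int] {
    var counts = [Int](repeating: 0, count: modulus)
    for value in 0...Int(UInt8.max) {
        counts[value % modulus] += 1
    }
    return counts
}

let counts = residueCounts(modulus: 6)
print(counts) // [43, 43, 43, 43, 42, 42]
```

Because 256 is not a multiple of 6, the residues 0 through 3 each occur once more than 4 and 5, so a "die" built from a full-domain value and `%` would slightly favour some faces.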

Amen.

If (and I agree with this) the range-based notation is less intuitive 
(0..<10.random is certainly less discoverable than Int.random), then we ought 
to offer an API in the form of `Int.random(in:)` but not `Int.random`. This 
does not preclude a `Collection.random` API as Alejandro proposes, of course, 
and that has independent value as Gwendal says.

If we're not happy with the range syntax, maybe we should put 
`random(in:)`-style methods on the RNG protocol as extension methods instead. 
Then there's a nice, uniform style:

let diceRoll = rng.random(in: 1...6)
let card = rng.random(in: deck)
let isHeads = rng.random(in: [true, false])
let probability = rng.random(in: 0.0...1.0) // Special FloatingPoint overload

The only issue is that this makes the default RNG's name really important. 
Something like:

DefaultRandom.shared.random(in: 1...6)

Will be a bit of a pain for users.
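
As a rough sketch of what `random(in:)`-style extension methods on an RNG protocol could look like: the protocol name `RNG`, its `next()` requirement, the helper `uniform(below:)`, and the `SplitMix64` stand-in generator are all assumptions made for illustration, not the proposal's actual design. The FloatingPoint overload is omitted.

```swift
// Hypothetical RNG protocol: a source of uniformly distributed 64-bit values.
protocol RNG {
    mutating func next() -> UInt64
}

// SplitMix64, used here only as a small deterministic stand-in generator.
struct SplitMix64: RNG {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

extension RNG {
    // Uniform value in 0..<bound using rejection sampling, which avoids
    // the modulo bias of a plain `next() % bound`.
    mutating func uniform(below bound: UInt64) -> UInt64 {
        precondition(bound > 0, "bound must be positive")
        // `limit` is a multiple of `bound`; values at or above it are rejected.
        let limit = UInt64.max - UInt64.max % bound
        var value = next()
        while value >= limit { value = next() }
        return value % bound
    }

    // Random integer in a closed range. Assumes the range is small enough
    // that `upperBound - lowerBound` does not overflow Int.
    mutating func random(in range: ClosedRange<Int>) -> Int {
        let span = UInt64(range.upperBound - range.lowerBound) + 1
        return range.lowerBound + Int(uniform(below: span))
    }

    // Random element of any collection, or nil if it is empty.
    mutating func random<C: Collection>(in collection: C) -> C.Element? {
        guard !collection.isEmpty else { return nil }
        let offset = Int(uniform(below: UInt64(collection.count)))
        return collection[collection.index(collection.startIndex, offsetBy: offset)]
    }
}

var rng = SplitMix64(seed: 42)
let deck = ["ace", "king", "queen", "jack"]
let diceRoll = rng.random(in: 1...6)
let card = rng.random(in: deck)
let isHeads = rng.random(in: [true, false])
```

In this sketch the collection overload returns an Optional so that an empty collection has a well-defined result; whether the real API should do so is a separate design question.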

I did in fact implement this style of RNG in NumericAnnex, but I'm not 
satisfied with the design myself. Not only is it a bit of an ergonomic thorn, 
there's also another drawback that actually has weighty implications:

Users aren't conditioned to reuse RNG instances. Perhaps it is because it can 
"feel" wrong that multiple random values should come from the *same* RNG. 
Instead, it "feels" more right to initialize a new RNG for every random number. 
After all, if one RNG is random, two must be randomer! This error is seen with 
some frequency in other languages that adopt this design, and they sometimes 
resort to educating users through documentation that isn't consistently heeded.

Of course, you and I both know that this is not ideal for performance. 
Moreover, for a number of PRNG algorithms, the first few hundred or thousand 
iterations can be more predictable than later iterations. (Some algorithms 
discard the first n iterations, but whether that's adequate depends on the 
quality of the seed, IIUC.) Neither of these issues applies to a 
default RNG type that cannot be initialized and always uses entropy from the 
global pool, but that's not enough to vindicate the design, IMO. By emphasizing 
*which* RNG instance is being used for random number generation, the design 
encourages non-reuse of non-default RNGs, which is precisely where this common 
error matters for performance (and maybe security).
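
One way the non-reuse error can bite is sketched below, under a loud assumption: each freshly created generator ends up with the same seed (for example, a seed derived from a coarse clock). The `SplitMix64` type and both function names are hypothetical and exist only for this illustration.

```swift
// Small deterministic generator used only for illustration.
struct SplitMix64 {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Anti-pattern: a new generator for every draw. Each one only ever produces
// its first output, and identical seeds give identical values.
func drawsWithFreshGenerators(seed: UInt64, count: Int) -> [UInt64] {
    var values: [UInt64] = []
    for _ in 0..<count {
        var rng = SplitMix64(seed: seed)
        values.append(rng.next())
    }
    return values
}

// Intended usage: one generator, advanced once per draw.
func drawsWithSharedGenerator(seed: UInt64, count: Int) -> [UInt64] {
    var rng = SplitMix64(seed: seed)
    var values: [UInt64] = []
    for _ in 0..<count {
        values.append(rng.next())
    }
    return values
}

let fresh = drawsWithFreshGenerators(seed: 2017, count: 4)
let shared = drawsWithSharedGenerator(seed: 2017, count: 4)
print(Set(fresh).count)  // 1: every "random" value is the same
print(Set(shared).count) // 4: the shared generator advances between draws
```

The fresh-generator version also pays the initialization cost on every draw, which is the performance concern raised above.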

Maybe we call the default RNG instance `random`, and then give the 
`random(in:)` methods another name, like `choose(in:)`?

let diceRoll = random.choose(in: 1...6)
let card = random.choose(in: deck)
let isHeads = random.choose(in: [true, false])
let probability = random.choose(in: 0.0...1.0)

let diceRoll = rng.choose(in: 1...6)
let card = rng.choose(in: deck)
let isHeads = rng.choose(in: [true, false])
let probability = rng.choose(in: 0.0...1.0)

This would allow us to keep the default RNG's type private and expose it only 
as an existential—which means more code will treat RNGs as black boxes, and 
people will extend the RNG protocol instead of the default RNG struct—while 
also putting our default random number generator under the name `random`, which 
is probably where people will look for such a thing.

I've said this already in my feedback, but it can get lost in the long chain of 
replies, so I'll repeat myself here because it's relevant to the discussion. I 
think one of the major difficulties of discussing the proposed design is that 
Alejandro has chosen to use a property called "random" to name multiple 
distinct functions which have distinct names in other languages. In fact, 
almost every method or function is being named "random." We are tripping over 
ourselves and muddling our thinking (or at least, I find myself doing so) 
because different things have the exact same name, and if I'm having this 
trouble after deep study of the design, I think it's a good sign that this is 
going to be greatly confusing to users generally.

First, there's Alejandro's _static random_, which he proposes to return an 
instance of type T given a type T. In Python, this is named `randint(a, b)` for 
integers, and `random` (between 0 and 1) or `uniform(a, b)` for floating-point 
types. The distinct names reflect the fact that `randint` and `uniform` are 
mathematically quite different (one samples a *discrete* uniform distribution 
and the other a *continuous* uniform distribution), and I'm not aware of 
non-numeric types offering a similar API in Python. These distinct names 
accurately reflect critiques from others on this list that the proposed 
protocol `Randomizable` lumps together types that don't share any common 
semantics for their _static random_ method, and that the protocol is of 
questionable utility because types in general do not share sufficient semantics 
such that one can do interesting work in generic code with such a protocol.
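
The difference between the discrete and continuous cases can be sketched as follows. The function names mirror Python's `randint` and `uniform` only for illustration; they and the `SplitMix64` stand-in generator are assumptions, not proposed Swift API.

```swift
// Small deterministic generator used only for illustration.
struct SplitMix64 {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Discrete uniform: an integer in a...b inclusive, each value equally likely.
// Uses rejection sampling; assumes `b - a` does not overflow Int.
func randint(_ a: Int, _ b: Int, using rng: inout SplitMix64) -> Int {
    precondition(a <= b, "a must not exceed b")
    let span = UInt64(b - a) + 1
    let limit = UInt64.max - UInt64.max % span
    var value = rng.next()
    while value >= limit { value = rng.next() }
    return a + Int(value % span)
}

// Continuous uniform: a Double between a and b. The top 53 bits of the
// generator output are scaled into [0, 1), then stretched to the interval.
func uniform(_ a: Double, _ b: Double, using rng: inout SplitMix64) -> Double {
    let unit = Double(rng.next() >> 11) * 0x1.0p-53
    return a + (b - a) * unit
}

var rng = SplitMix64(seed: 7)
let roll = randint(1, 6, using: &rng)
let probability = uniform(0.0, 1.0, using: &rng)
```

The two functions share a generator but not an algorithm, which is one way to see why a single generic `random` requirement has little common semantics to offer.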

Then there's Alejandro's _instance random_, which he proposes to return an 
element of type T given an instance of a collection of type T. In Python, this 
is named "choice(seq)" (for one element, or else raises an error) and 
"sample(seq, k)" (for k elements, or else raises an error). As I noted, Alejandro was right to 
draw an analogy between _instance random_ and other instance properties of a 
Collection such as `first` and `last`. In fact, the behavior of Python's 
"choice" (if modified to return an Optional) and "sample", as a pair, would fit 
in very well next to Swift's existing pairs of `first` and `prefix(k)` and 
`last` and `suffix(k)`. We could trivially Swiftify the names here; for example:

```
[1, 2, 3].first
[1, 2, 3].any // or `choice`, or `some`, or...
[1, 2, 3].last

[1, 2, 3].prefix(2)
[1, 2, 3].sample(2)
[1, 2, 3].suffix(2)
```
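
A minimal sketch of how such a pair might be implemented on `Collection` follows. The names `choice(using:)` and `sample(_:using:)`, the explicit generator parameter, and the `SplitMix64` stand-in are assumptions for illustration. Returning at most `k` elements, by analogy with `prefix(k)`, is also an assumption rather than settled behaviour.

```swift
// Small deterministic generator used only for illustration.
struct SplitMix64 {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
    // Uniform value in 0..<bound via rejection sampling (no modulo bias).
    mutating func uniform(below bound: UInt64) -> UInt64 {
        precondition(bound > 0, "bound must be positive")
        let limit = UInt64.max - UInt64.max % bound
        var value = next()
        while value >= limit { value = next() }
        return value % bound
    }
}

extension Collection {
    // One random element, or nil for an empty collection (like `first`).
    func choice(using rng: inout SplitMix64) -> Element? {
        guard !isEmpty else { return nil }
        let offset = Int(rng.uniform(below: UInt64(count)))
        return self[index(startIndex, offsetBy: offset)]
    }

    // Up to `k` distinct positions chosen without replacement, using a
    // partial Fisher-Yates shuffle of a copy of the elements.
    func sample(_ k: Int, using rng: inout SplitMix64) -> [Element] {
        precondition(k >= 0, "k must be non-negative")
        var pool = Array(self)
        let taken = Swift.min(k, pool.count)
        for i in 0..<taken {
            let j = i + Int(rng.uniform(below: UInt64(pool.count - i)))
            pool.swapAt(i, j)
        }
        return Array(pool[..<taken])
    }
}

var rng = SplitMix64(seed: 3)
let one = [1, 2, 3].choice(using: &rng)
let two = [1, 2, 3].sample(2, using: &rng)
let all = [1, 2, 3].sample(5, using: &rng)
```

Here `sample(5, using:)` on a three-element array returns all three elements in random order instead of trapping, which is the `prefix`-like choice mentioned above.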

I'm going to advocate again for _not_ naming all of these distinct things 
"random". Even in conducting this discussion, it's so hard to keep track of 
what particular function a person is giving feedback about.


_______________________________________________
swift-evolution mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-evolution


