Re: [swift-evolution] [Draft] Rename Sequence.elementsEqual

Thorsten Seitz via swift-evolution Sun, 15 Oct 2017 12:14:57 -0700

> Am 15.10.2017 um 10:38 schrieb Xiaodi Wu via swift-evolution 
> <swift-evolution@swift.org>:
> 
> On Sun, Oct 15, 2017 at 2:29 AM, Kevin Nattinger <sw...@nattinger.net 
> <mailto:sw...@nattinger.net>> wrote:
> 
>> On Oct 14, 2017, at 7:54 PM, Xiaodi Wu <xiaodi...@gmail.com 
>> <mailto:xiaodi...@gmail.com>> wrote:
>> 
>> On Sat, Oct 14, 2017 at 6:17 PM, Kevin Nattinger <sw...@nattinger.net 
>> <mailto:sw...@nattinger.net>> wrote:
>>>> […]
>>>> * A Swift `Sequence` is, to put it simplistically, a thing that can be 
>>>> iterated over in a `for...in` loop. If it would make you happy, for the 
>>>> rest of the discussion, let's suppose we called the protocol `ForLoopable` 
>>>> instead of `Sequence`.
>>> 
>>> ForLoopable is so ugly. Since we’re just iterating over the elements, how 
>>> about, oh, say, `Iterable`? Hey, that looks familiar.
>>> 
>>> I'm not trying to bikeshed the name of `Sequence`. I'm picking an 
>>> intentionally unwieldy name for the purposes of discussing the semantics of 
>>> this particular protocol. The point is that the underlying issue has 
>>> nothing to do with the name; it can be `Iterable` or it can be `SpongeBob` 
>>> for all I care.
>> 
>> I’m not trying to bikeshed the name either, The underlying issue is that 
>> (what is currently) Sequence actually encompasses two separate 
>> functionalities, and those functionalities need to be separated with their 
>> separate semantic requirements documented. “Sequence: Iterable,” 
>> “OrderedSequence: Sequence,” “SpongeBob: ForLoopable,” the names are 100% 
>> irrelevant at this point; what’s important is that one is not necessarily 
>> ordered and the other guarantees an order.
>>  
>> 
>> What are the "two separate functionalities”?
> 
> Iteration, with convenience methods that don’t imply or rely on an order that 
> may not be there; and convenience methods applicable to sequences that do 
> have an intrinsic order.
> 
> Sets, as a mathematical concept, have no intrinsic order. However, instances 
> of `Set`, which can be iterated over, *do* have at least one order which can 
> be said to be intrinsic in the following sense: as long as iteration is 
> possible, no API design can prevent that order from being observed and 
> associated with the instance. Put another way, if you can use an instance of 
> a type in a for...in loop, intrinsic to that functionality is a publicly 
> visible order.


I disagree. Sets are value types, therefore two instances of `Set` are equal if 
they contain the same elements. An intrinsic order should therefore only depend 
on the elements contained and should be the same for two instances of `Set` 
which are equal.
This is not the case, though, as you can easily check in a playground by 
looking at Set([1,2,3,4,5,6]) and Set([6,5,4,3,2,1]) which represent the same 
value and are equal but do *not* have the same order.


> 
> 
>> All the extension methods on Sequence are ways of spelling things that you 
>> can write in a few lines of code using a `for...in` loop; they're in the 
>> stdlib to allow a more functional style which some prefer. If you accept 
>> that a type should support iteration with a `for...in` loop, then what is 
>> your basis for claiming that these extension methods are "separate 
>> functionalities”?
> 
> Just because you *can* express something in code doesn’t mean you should, or 
> that it’s correct. It is objectively false to say a Set has a first or last 
> object, because the objects therein have no order. You can take a random 
> object from the set and call it “first”, but that doesn’t make that a correct 
> definition of Set.first. A Set has no order, a specific iteration has an 
> “order” only in the sense that all and only the objects in the set have to 
> come out one at a time, but that doesn’t mean the Set itself has an order, 
> specifically a first or last object.
> 
> Since Set conforms to Collection, it is guaranteed that if one element of an 
> instance of Set comes out first one time, it'll come out first every time 
> from that instance. If it helps, think of Swift's Set as modeling 
> (imperfectly, as all models must) both a mathematical set and a multi-pass 
> sequence, just as Swift's Int models both an integer and a sequence of bits.
> 
> You’re a fan of the principal of least surprise. Tell me, which would be less 
> surprising: Set.dropFirst() actually drops a random element, or Set doesn’t 
> have a dropFirst()? And if you think dropFirst() removing an element at 
> random is not surprising, please explain why.
> 
> I think Set.dropFirst removing the first element that I observe on iteration 
> is the least surprising answer, because Swift tells me that the stdlib Set 
> models a set but that it is also a sequence.

The latter is exactly the problem Kevin did point out. A Set is an Iterable (in 
the sense that I can iterate over its elements with the order being a 
meaningless random side effect) but it is *not* a Sequence (in the sense that 
the order conveys any meaning).

-Thorsten


> 
>>>> […]
>>> 
>>>> * If a type `T` conforms to `ForLoopable` and an instance `t` of that type 
>>>> has at least one element, then *something* has to be the first element in 
>>>> a `for element in t { ... }` loop. Put another way, every instance of a 
>>>> type that conforms to `ForLoopable` must have at least one publicly 
>>>> observable order (although, intriguingly, I'm not sure it has to be a 
>>>> repeatable one). It is possible, therefore, to have a semantic answer to 
>>>> the question of which element is `first` or (if finite) `last`; one can 
>>>> also `drop(while:)`, etc., and perform lexicographical comparisons.
>>> 
>>> As a side effect of Swift being a procedural language each iteration 
>>> happens to occur in some order, yes, but that order is meaningless and 
>>> reflects nothing about the Set itself.  In fact, I’d say that `first`, 
>>> `last`, etc. are not even defined on the original Set per se, only on the 
>>> specific order that a particular iteration resulted in. And that order is 
>>> not necessarily predictable, nor necessarily stable, as you yourself said.
>>> 
>>> Consider an Iterable that gives a different order every time it’s iterated. 
>>> Should calling `.first` or `last` give a different object every time?  
>>> That’s absurd.
>>> Should an object lexicographically compare not equal to itself? Even more 
>>> absurd. 
>>> 
>>> What's your basis for saying that such behavior is absurd? It is explicitly 
>>> permitted for instances of types conforming to `SpongeBob` to be 
>>> single-pass and/or infinite. For a single-pass `SpongeBob`, `first` will 
>>> certainly return a different value every time it is invoked.
>> 
>> Is `first` mutating? No. Should it be? No! `first` and `last` are a peek at 
>> the state of the object.
>> 
>> You're right, `first` should not be mutating; that's actually an important 
>> design consideration, as Ole pointed out, and it's not actually available on 
>> `Sequence` for that reason. However, `first { _ in true }` is available and 
>> is potentially mutating, as are all methods on Sequence by design.
>> 
>> Is `elementsEqual` (or *shudder* lexicographicallyEqual) reflexive? IMO it 
>> clearly should be. Especially with the “lexicographically” part—from 
>> everything I can find, a lexicographical ordering is by definition 
>> reflexive. Do you have a citation for the idea that lexicographical equality 
>> can legitimately be non-reflexive?
>> 
>> Clearly (tautologically?), such a function should be reflexive for any 
>> argument ordered with respect to itself. However, if there is no 
>> lexicographical comparison possible, then a thing cannot compare 
>> lexicographically equal to anything, even itself.
> 
> And that’s PRECISELY why lexicographicallyEqual does not make sense to apply 
> to unordered sets. There is no lexicographical comparison possible, so why do 
> you keep insisting they should have a method that falsely claims to 
> lexicographically compare them?
> 
> I agree! It doesn't make sense if no comparison is possible! But Swift tells 
> me that a `Set` is a `Sequence`!
>> 
>>> A random number generator fulfills all the semantic requirements of 
>>> conforming to `SpongeBob`, and in fact I do just that in NumericAnnex 
>>> <https://github.com/xwu/NumericAnnex/blob/master/Sources/PRNG.swift#L53>. 
>>> `first` gives a different value every time, and a randomly generated 
>>> `SpongeBob` would unsurprisingly compare lexicographically not equal to 
>>> itself.
>> 
>> > IMO that’s a bug in the implementation; lexicographical equality is 
>> > reflexive, period.
>> 
>> > Presumably the `elementsEqual` method contains something along these lines 
>> > (take from SequenceAlgorithms.swift.gyb):
>> 
>>     var iter1 = self.makeIterator()
>>     var iter2 = other.makeIterator()
>> 
>> > By creating two iterators, you’re mutating while iterating. Turns out 
>> > there’s even a warning against this in Sequence.swift:
>> 
>> /// Using Multiple Iterators
>> /// ========================
>> ///
>> /// Whenever you use multiple iterators (or `for`-`in` loops) over a single
>> /// sequence, be sure you know that the specific sequence supports repeated
>> /// iteration, either because you know its concrete type or because the
>> /// sequence is also constrained to the `Collection` protocol.
>> ///
>> /// Obtain each separate iterator from separate calls to the sequence's
>> /// `makeIterator()` method rather than by copying. Copying an iterator is
>> /// safe, but advancing one copy of an iterator by calling its `next()` 
>> method
>> /// may invalidate other copies of that iterator. `for`-`in` loops are safe 
>> in
>> /// this regard.
>> 
>> > The default implementation of elementsEqual is therefore unsafe because it 
>> > has the potential for using an invalidated iterator.
>> 
>> You are misunderstanding the warning in the second paragraph here. The 
>> implementation (not a default implementation, unless I'm mistaken, as it 
>> cannot be overridden)
> 
>> makes each iterator using separate calls to `makeIterator()`, just as the 
>> documentation tells you to do. Calling next() on one iterator does not 
>> invalidate the other iterator, because the second is not a copy of the first.
> 
> Indeed, I misread that comment. 
> That said, is there a well-defined behavior when iterating a one-shot 
> sequence with two iterators?
> (It *is* a default implementation, btw)
> 
> Not sure about that; I don't see a protocol requirement, in which case it can 
> only be shadowed in a concrete type but it can't be overridden. How to 
> accommodate single-pass sequences is an interesting question. Off the top of 
> my head, an iterator would have to be or wrap a reference type.
>> 
>> You are, however, right that calling `rng.elementsEqual(rng)` is not 
>> advised. It isn't unsafe; it's just not useful. That said, calling 
>> `array.elementsEqual(array)` is equally safe and equally useless, and the 
>> uselessness of such a reflexive comparison is neither here nor there.
> 
> Funny how you complain about my code being useless and yet you insist below, 
> "If it's not providing you with utility, then don't use it.”  
> Regardless, you’re wrong to dismiss this case. `foo.elementsEqual(foo)` on 
> its own makes little sense, sure, but you could easily find yourself in a 
> method comparing two iterators you obtained from elsewhere, and occasionally 
> they happen to be the identical object. Allowing an iterator to return 
> `false` for .elementsEqual on itself is unexpected and dangerous.
> 
> You will always have to account for this possibility, because Swift's 
> `Equatable` explicitly allows "special values" to be not equal to themselves. 
> This is, at least in part, in order to accommodate the IEEE decree that NaN 
> != NaN:
> 
> ```
> let x = [Double.nan]
> x.elementsEqual(x) // false
> ```
> 
> Changing this behavior is way beyond the scope of this thread (and has been 
> the topic of hours (actually, weeks and weeks) of fun on this list 
> previously).
> 
>>> On the other hand, if I have a collection of objects that I want iterated 
>>> in a particular order, I can use a container that iterates in a specific, 
>>> known, well-defined way, and use that to construct the sequence of objects. 
>>>  That’s clearly an Iterable collection, but the guarantee is stronger than 
>>> that. Since it iterates objects in a specific sequence, the logical way to 
>>> express that would be `Sequence: Iterable`. Again, we’ve seen that before.  
>>> 
>>> Now, since a Sequence is guaranteed to iterate the same every time, 
>>> suddenly our `first`, `last`, `drop*`, etc. methods have a meaning inherent 
>>> to the collection itself, rather than a specific iteration.
>>> 
>>> What you call a "Sequence" here would have to be multi-pass, finite, and 
>>> ordered. 
>> 
>> > Ordered, yes, but it’s only admittedly poor wording that suggests 
>> > multi-pass, and I don’t think anything there suggests finite.
>> 
>> If a Sequence is "guaranteed to iterate the same every time," then surely it 
>> must be multi-pass; what's the alternative?
> 
> Not sure if you just missed the very next sentence or are actively ignoring 
> it just to be argumentative. I already acknowledged that that phrase didn’t 
> convey the meaning I intended, and a Sequence is not and should not be 
> required to be multi-pass.
> 
> I entirely misunderstood your next sentence as asserting that being 
> multi-pass makes the iteration order well-defined.
>>  
>> It would be better to say that the iteration order is well-defined. That 
>> will almost always mean documented, and usually predictable though obviously 
>> e.g. RNGs and iterating in random order will not be predictable by design.
> 
> Wouldn't it then suffice to document, say, that a set's iteration order is 
> the insertion order?
>> 
>>> That's actually more semantically constrained than what Swift calls a 
>>> `Collection` (which requires conforming types to be multi-pass and(?) 
>>> finite). By contrast, Swift's `SpongeBob` protocol explicitly permits 
>>> conforming single-pass, infinite, and/or unordered types. 
>> 
>> I think you’re talking about Sequence here, I’ve lost track of your nonsense 
>> by now. Yes, the current Swift protocol named Sequence allows unordered 
>> types. You seem to keep asserting that but not actually addressing my 
>> argument, which is that allowing Sequences to be unordered with the current 
>> API is undesired and actively harmful, and should therefore be changed.
>> 
>> What is harmful about it?
> 
> Well, for one, the issue you raised this thread about—two sets that are `==` 
> could return either true or false for `elementsEqual`, depending on how they 
> arrived at their current state. That’s not acceptable, and the problem isn’t 
> with the name of the method.
> 
> Apple documentation calls this one of the "order-dependent" methods. It is 
> surely acceptable for a type that conforms to an order-dependent protocol to 
> have methods that are order-dependent; they do, however, have to be clearly 
> order-dependent to avoid confusion on unordered types.
>  
> Then there are all the methods that imply a specific order of iteration. If 
> the “sequence” is unordered, who knows what you’ll get? It is incredibly easy 
> for an engineer to write a method that implicitly relies on a passed sequence 
> being intrinsically ordered and another engineer to pass it an unordered 
> “sequence.”  The first engineer could neglect to document the dependency, or 
> even not notice it; or the second engineer could just fail to read the 
> documentation thoroughly enough.  There is currently no way for the compiler 
> to enforce passing only an object that is (or at least claims to be) 
> intrinsically ordered.
> 
> It is also incredibly easy for such an engineer to use `for...in` instead to 
> accomplish the same task, generic over ordered and unordered sequences 
> whatever you name such distinguished protocols. I think your beef really 
> still boils down to Set being compatible with `for...in` at all, as Jon 
> acknowledges.
>> 
>> 
>>> As long as it is possible to iterate over a `SpongeBob`, it is meaningful 
>>> to ask what element is first obtained upon iteration or to drop the first 
>>> element obtained upon iteration.
>>> And as long as iteration is not required to be repeatable (and it isn't), 
>>> it is perfectly acceptable for these algorithms to return a different 
>>> result every time.
>> 
>> It is “meaningful” in the sense that it can technically be programmed. The 
>> actual results are meaningless beyond returning or dropping a random* 
>> element.
>> 
>> *: Don’t nitpick the word “random”, you know exactly what I mean. It’s just 
>> shorter and no less clear than “technically more-or-less deterministic but 
>> not documented, not generally predictable, and probably but not necessarily 
>> consistent from one call to the next."
>> 
>> 
>> I fail to see the issue here. If it's not providing you with utility, then 
>> don't use it.
> 
> I have no problem with functions I don’t use provided they are well-defined 
> and reasonably accurately named. Functions requiring an order on unordered 
> collections don’t pass that bar. 
> 
> As I said, you're welcome to tackle the protocol hierarchy, but I really 
> doubt it's within the realm of realistic endpoints for Swift 5. I'm just 
> trying to propose a narrowly targeted pragmatic solution to one specific 
> limited harm that might be deliverable by the next point release. As a great 
> man once said, Swift is a pragmatic language.
>> Since Collections do guarantee multi-pass iteration, Brent's example of 
>> `set.dropFirst().reduce(set.first!, ...)` provides just one instance where 
>> an unordered Collection can profitably make use of `first`. Permitting 
>> generic algorithms that can operate on either arrays or sets, for example, 
>> is the desired effect of having such a protocol; a generic algorithm that 
>> takes a Collection can ask for the first element, and in the case of an 
>> unordered Collection, an arbitrary element will do just fine.
> 
> The generic algorithms should be on a protocol that specifies everything they 
> require. If one can work on anything you can iterate over, put it on 
> Iterable. If another requires the objects to be ordered, put it on Sequence. 
> Need to express that an algorithm requires a multi-pass sequence? Make a 
> MultiPassSequence protocol and put the algorithm on an extension containing 
> that requirement. Use protocols to express requirements, as they were 
> designed for. Don’t just tack a bunch of methods onto a protocol that isn’t 
> sufficient to describe their requirements and say, “oh, by the way, only use 
> this method if your implementation meets these conditions…"
> 
> The benefits are likely outweighed by the costs of such an approach taken to 
> completion, because there are many axes to differentiate. The protocol 
> hierarchy for collections is already daunting, leading to monstrosities such 
> as `MutableRangeReplaceableRandomAccessSlice`. It stretches the bounds of 
> sensibility to have a 
> `LazyUnorderedInfiniteMultiPassMutableRangeReplaceableRandomAccessSlice`.
> 
> The Swift stdlib deliberately eschews modeling everything in protocol 
> hierarchies with the highest level of granularity. There's some fudging, 
> deliberately, to find a happy medium between obtuse and approachable, between 
> too many/too specialized and not enough. For example, I pushed for protocols 
> such as `Field` and `Ring` at the top of the numeric hierarchy, which might 
> allow complex number types to slot into the hierarchy more sensibly, for 
> example. But we have a compromise protocol `Numeric` which doesn't quite have 
> the same guarantees but is much more approachable. Notice that we also don't 
> break down numeric protocols into `Addable`, `Subtractable`, etc.; we also 
> have that fudge factor built into `Equatable`, as I mentioned.
>>> 
>>> `first` is the first object in the Sequence. It doesn’t matter how the 
>>> sequence came to be in that order; it doesn’t matter whether or not the 
>>> sequence has already been iterated or how many times. `first` is the first 
>>> object that is, was, and always will be presented by the Sequence’s 
>>> Iterator. (Until the collection is mutated, obviously).
>>> 
>>> To summarize,
>>> A Set has no intrinsic order. You can iterate over it, and a specific 
>>> iteration of a set has an order, but that order is not tied to the Set 
>>> itself beyond including all and only the items therein. Therefore, the Set 
>>> itself has no intrinsic `first`, `last`, lexicographical comparison, etc.; 
>>> only its iterations do, and they are not themselves Sets.
>>> A Sequence does have an intrinsic order. The order of iteration reflects 
>>> the order inherent to the Sequence. Therefore, a Sequence has a `first`, 
>>> `last`, lexicographical comparison, etc.
>>> 
>>> Just in case it’s not obvious, `Set` here is pretty much interchangeable 
>>> with any other unordered iterable.
>>> 
>>> What you want to call a "Sequence" is what Swift calls a `Collection`, with 
>>> additional restrictions. What you want to be called "Iterable" is what 
>>> Swift calls `Sequence` (or now, `SpongeBob`). Clearly, shuffling names will 
>>> not make these protocols support any more functionality, so that can be put 
>>> aside.
>> 
>> No, no, no! What I want to call “Iterable” is specified below. It is about 
>> HALF of what’s currently in Sequence—the half that has to do with iterating, 
>> whence the name.
>> What I want to call Sequence is precisely what Swift now calls Sequence—the 
>> methods that are in Iterable by virtue of adopting Iterable, PLUS some 
>> methods that only make sense on iterable groups of objects where the 
>> iteration order is well-defined.
>> 
>>> 
>>> Now, with that out of the way, why do you think that only `Collection` 
>>> types should have `first` and `last`? These helper properties and methods 
>>> are simply convenient ways to spell things that can be done with 
>>> `for...in`--the whole point of supplying them is to allow people to work 
>>> with these types in a more functional style.
>> 
>> Apparently “collection" was a bad choice of word. What I clearly meant was 
>> not the current Swift Collection protocol, but rather an unordered 
>> assemblage of objects. UnorderedCollection, perhaps, or if that’s still 
>> going to cause issues, try UnorderedAssemblage.  What `first` and `last` 
>> really mean in an UnorderedAssemblage is give me some object from the 
>> assembled objects, I don’t care which one. For which it’s much more clear to 
>> have an `anyObject()` as on NSSet; as another user has pointed out, 
>> `assemblage.makeIterator().next()` works just as well. (I just checked, and 
>> that’s actually how `first` is implemented. But it’s on Collection, which is 
>> guaranteed to be multipass,)
>> 
>> Again, the point of having a protocol-based design is to allow useful 
>> _generic_ algorithms to be written; that `first` and `last` would be 
>> equivalent to an arbitrary element in the case that a collection is 
>> unordered is not at all an argument against these types conforming to 
>> `Collection`; if anything, it's an argument for it.
> 
> If a protocol demands the first object, you should give it the first object. 
> If you don’t have a first object, maybe you shouldn’t conform to the 
> protocol. If the protocol really just needs any old object, call it 
> `anyObject`. 
> 
> Sure, but we *do* have a first element; it just happens to be the first that 
> is obtainable on iteration. That you could make a good case for any other 
> element to be first doesn't mean that this one isn't a perfectly cromulent 
> first.
> 
> 
>> Just as `Sequence.underestimatedCount` is equivalent to `Collection.count` 
>> for types that conform to `Collection`, or the instance 
>> `BinaryInteger.bitWidth` is equivalent to a static `bitWidth` for types that 
>> conform to `FixedWidthInteger`.
> 
> I don’t see how those are relevant, they all mean what they claim to, unlike 
> Set.first/dropFirst/etc.
> 
>>>>> public protocol Iterable {
>>>>>   associatedtype Iterator: IteratorProtocol
>>>>>   func map<T>(...) -> [T] // Iterable where .Iterator.Element == T
>>>>>   func filter(...) -> [Iterator.Element] // Iterable where 
>>>>> .Iterator.Element == Self.Iterator.Element
>>>>>   func forEach(...)
>>>>>   func makeIterator() -> Iterator
>>>>>   var underestimatedCount: Int { get }
>>>>> }
>>>>> 
>>>>> public protocol Sequence: Iterable { // Maybe OrderedSequence just to 
>>>>> make the well-defined-order requirement explicit
>>>>>   associatedtype SubSequence
>>>>>   func dropFirst(...)   -> SubSequence   // Sequence where 
>>>>> .Iterator.Element == Self.Iterator.Element
>>>>>   func dropLast(...)    -> SubSequence   //    " "
>>>>>   func drop(while...)   -> SubSequence   //    " "
>>>>>   func prefix(...)      -> SubSequence   //    " "
>>>>>   func prefix(while...) -> SubSequence   //    " "
>>>>>   func suffix(...)      -> SubSequence   //    " "
>>>>>   func split(...where...)  -> [SubSequence] // Iterable where 
>>>>> .Iterator.Element == (Sequence where .Iterator.Element == 
>>>>> Self.Iterator.Element)
>>>>> }
>>> 
>>> 
>>> And just to be explicit, 
>>> struct Set: Iterable {…}
>>> struct Dictionary: Iterable {…}
>>> struct Array: Sequence {…}
>>> etc.
>>> 
>>> Hopefully at some point:
>>> struct OrderedSet: Sequence {…}
> 
> 
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Re: [swift-evolution] [Draft] Rename Sequence.elementsEqual

Reply via email to