Re: [swift-evolution] Proposal: Python's indexing and slicing

Donnacha Oisín Kidney via swift-evolution Tue, 22 Dec 2015 12:58:46 -0800

I don’t think I am. Maybe I’m confused: the current suggestion is the addition 
of a $ operator (or labelled subscripts, or another operator) to signify 
“offset indexing”, yes? As in:


someCollection[$3] == someCollection[someCollection.startIndex.advancedBy(3)]
someCollection[$3..<$] == 
someCollection[someCollection.startIndex.advancedBy(3)..<someCollection.endIndex]
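
For anyone who wants to experiment without a new operator, the labelled-subscript variant can be sketched as an extension. This is only an illustration, not a proposed API: the labels offset: and offsets: are names made up for this sketch, it is written in current Swift (where advancedBy is spelled index(_:offsetBy:)), and the range form traps on out-of-bounds offsets rather than being forgiving.

```swift
extension Collection {
    // Hypothetical label: the element `n` steps past startIndex.
    // This is O(n) for collections that are not random-access.
    subscript(offset n: Int) -> Element {
        return self[index(startIndex, offsetBy: n)]
    }

    // Hypothetical label: the slice covering the given offsets from startIndex.
    subscript(offsets r: Range<Int>) -> SubSequence {
        let lo = index(startIndex, offsetBy: r.lowerBound)
        let hi = index(lo, offsetBy: r.count)
        return self[lo..<hi]
    }
}

let base = [10, 20, 30, 40, 50]
let slice = base[2..<5]            // keeps the base indices 2, 3, 4
let first = slice[offset: 0]       // same element as slice[slice.startIndex]
let tail = slice[offsets: 1..<3]   // same elements as slice[3..<5]
```

Because the offsets are always measured from startIndex, the same call works on an array, an array slice, or a String.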

I’m not arguing against preserving the indexing of the base array; I understand 
its benefits. I’m arguing that, instead of using an extra indicator (like $) to 
mark offset indexing, with non-offset as the default, why not make offset 
indexing the default and require an extra indication (like the label direct) 
for non-offset indexing? This would keep the benefits of non-offset indexing, 
because you’d still have access to it. 
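
A rough sketch of what offset-by-default could look like, written as a wrapper type rather than as a change to ArraySlice itself. OffsetView and its members are hypothetical names used only for this sketch, and the code uses current Swift spelling (index(_:offsetBy:)); the direct: label is the one suggested above.

```swift
// Hypothetical wrapper: plain subscripts count from the start of the slice,
// while `direct:` uses the indices of the base collection.
struct OffsetView<Base: RandomAccessCollection> {
    let base: Base

    // Default: offset indexing, so 0 is always the first element.
    subscript(offset: Int) -> Base.Element {
        return base[base.index(base.startIndex, offsetBy: offset)]
    }

    // Opt-in: non-offset indexing with the base collection's own index.
    subscript(direct index: Base.Index) -> Base.Element {
        return base[index]
    }
}

let ar = [0, 1, 2, 3, 4, 5]
let view = OffsetView(base: ar[2...])   // elements 2, 3, 4, 5
let a = view[0]            // first element of the slice
let b = view[direct: 2]    // element at index 2 of the original array
let c = view[1]            // second element of the slice
```

Here view[0] and view[direct: 2] name the same element, which is the distinction between the two kinds of indexing.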

I think that’s part of this discussion, right? I could start another thread, 
if not.

Oisín

> On 22 Dec 2015, at 20:06, Kevin Ballard wrote:
> 
> On Mon, Dec 21, 2015, at 08:28 PM, Donnacha Oisín Kidney wrote:
>> Why not make the “forgiving” version the default? I mean, the majority of 
>> python-style composable slicing would be happening on arrays and array 
>> slices, for which there’s no performance overhead, and the forgiving version 
>> would seem to suit the “safe-by-default” philosophy. I’ve seen mistakes like 
>> this:
>>  
>> let ar = [1, 2, 3, 4, 5]
>> let arSlice = ar[2..<5]
>> arSlice[1]
>>  
>> on a few occasions, for instance. I would think something like this:
>>  
>> let ar = [0, 1, 2, 3, 4, 5]
>>  
>> let arSlice = ar[2...] // [2, 3, 4, 5]
>> arSlice[..<3] // [2, 3, 4]
>> arSlice[...3] // [2, 3, 4, 5]
>> arSlice[direct: 2] // 2
>> arSlice[0] // 2
>>  
>> Would be what was expected from most programmers learning Swift, while 
>> leaving the unforgiving option open to those who need it.
>  
> You seem to be arguing against the notion that array slices preserve the 
> indexing of the base array, but that's not what's under discussion here.
>  
> -Kevin Ballard
>  
>>> On 22 Dec 2015, at 03:29, Dave Abrahams via swift-evolution wrote:
>>>  
>>>>  
>>>> On Dec 21, 2015, at 1:51 PM, Kevin Ballard wrote:
>>>>  
>>>> On Mon, Dec 21, 2015, at 11:56 AM, Dave Abrahams wrote:
>>>>>  
>>>>>> On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution wrote:
>>>>>>  
>>>>>> On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution 
>>>>>> wrote:
>>>>>>>  
>>>>>>> Yes, we already have facilities to do most of what Python can do here, 
>>>>>>> but one major problem IMO is that the “language” of slicing is so 
>>>>>>> non-uniform: we have [a..<b], dropFirst, dropLast, prefix, and suffix.  
>>>>>>> Introducing “$” for this purpose could make it all hang together and 
>>>>>>> also eliminate the “why does it have to be so hard to look at the 2nd 
>>>>>>> character of a string?!” problem.  That is, use the identifier “$” 
>>>>>>> (yes, that’s an identifier in Swift) to denote the beginning-or-end of 
>>>>>>> a collection.  Thus,
>>>>>>>  
>>>>>>>   c[c.startIndex.advancedBy(3)] => c[$+3]      // Python: c[3]
>>>>>>>   c[c.endIndex.advancedBy(-3)]  => c[$-3]      // Python: c[-3]
>>>>>>>  
>>>>>>>   c.dropFirst(3) => c[$+3...]    // Python: c[3:]
>>>>>>>   c.dropLast(3)  => c[..<$-3]    // Python: c[:-3]
>>>>>>>   c.prefix(3)    => c[..<$+3]    // Python: c[:3]
>>>>>>>   c.suffix(3)    => c[$-3...]    // Python: c[-3:]
>>>>>>>  
>>>>>>> It even has the nice connotation that, “this might be a little more 
>>>>>>> expen$ive than plain indexing” (which it might, for non-random-access 
>>>>>>> collections).  I think the syntax is still a bit heavy, not least 
>>>>>>> because of “..<” and “...”, but the direction has potential. 
>>>>>>>  
>>>>>>>  I haven’t had the time to really experiment with a design like this; 
>>>>>>> the community might be able to help by prototyping and using some 
>>>>>>> alternatives.  You can do all of this outside the standard library with 
>>>>>>> extensions.
>>>>>>  
>>>>>> Interesting idea.
>>>>>>  
>>>>>> One downside is it masks potentially O(N) operations 
>>>>>> (ForwardIndex.advancedBy()) behind the + operator, which is typically 
>>>>>> assumed to be an O(1) operation.
>>>>>  
>>>>> Yeah, but the “$” is sufficiently unusual that it doesn’t bother me too 
>>>>> much.
>>>>>  
>>>>>> Also, the $+3 syntax suggests that it requires there to be at least 3 
>>>>>> elements in the sequence, but prefix()/suffix()/dropFirst/etc. all take 
>>>>>> maximum counts, so they operate on sequences of fewer elements.
>>>>>  
>>>>> For indexing, $+3 would make that requirement.  For slicing, it wouldn’t. 
>>>>>  I’m not sure why you say something about the syntax suggests exceeding 
>>>>> bounds would be an error.
>>>>  
>>>> Because there's no precedent for + behaving like a saturating addition, 
>>>> not in Swift and not, to my knowledge, anywhere else either. The closest 
>>>> example that comes to mind is floating-point numbers eventually ending up 
>>>> at Infinity, but that's not really saturating addition, that's just a 
>>>> consequence of Infinity + anything == Infinity. Nor do I think we should 
>>>> be establishing precedent of using + for saturating addition, because that 
>>>> would be surprising to people.
>>>  
>>> To call this “saturating addition” is an…interesting…interpretation.  I 
>>> don’t view it that way at all.  The “saturation,” if there is any, happens 
>>> as part of subscripting.  You don’t even know what the “saturation limit” 
>>> is until you couple the range expression with the collection.  
>>>  
>>> In my view, the addition is part of an EDSL that represents a notional 
>>> position offset from the start or end, then the subscript operation 
>>> forgivingly trims these offsets as needed.
>>>  
>>>> Additionally, I don't think adding a $ to an array slice expression should 
>>>> result in a behavioral difference, e.g. array[3..<array.endIndex] and 
>>>> array[$+3..<$] should behave the same
>>>  
>>> I see your point, but don’t (necessarily) agree with you there.  “$” here 
>>> is used as an indicator of several things, including 
>>> not-necessarily-O(1) and forgiving slicing.  We could introduce a label 
>>> just to handle that:
>>>  
>>>  array[forgivingAndNotO1: $+3..<$]  
>>>  
>>> but it doesn’t look like a win to me.
>>>  
>>>>  
>>>>>> There's also some confusion with using $ for both start and end. What if 
>>>>>> I say c[$..<$]? We'd have to infer from position that the first $ is the 
>>>>>> start and the second $ is the end, but then what about c[$+n..<$+m]? We 
>>>>>> can't treat the usage of + as meaning "from start" because the argument 
>>>>>> might be negative. And if we use the overall sign of the 
>>>>>> operation/argument together, then the expression `$+n` could mean from 
>>>>>> start or from end, which comes right back to the problem with Python 
>>>>>> syntax.
>>>>>  
>>>>> There’s a problem with Python syntax?  I’m guessing you mean that c[a:b] 
>>>>> can have very different interpretations depending on whether a and b are 
>>>>> positive or negative?
>>>>  
>>>> Exactly.
>>>>  
>>>>> First of all, I should say: that doesn’t really bother me.  The 99.9% use 
>>>>> case for this operation uses literal constants for the offsets, and I 
>>>>> haven’t heard of it causing confusion for Python programmers.  That said, 
>>>>> if we wanted to address it, we could easily require n and m above to be 
>>>>> literals, rather than Ints (which incidentally guarantees it’s an O(1) 
>>>>> operation).  That has upsides and downsides of course.
>>>>  
>>>> I don't think we should add this feature in any form if it only supports 
>>>> literals.
>>>>  
>>>>>> I think Jacob's idea has some promise though:
>>>>>>  
>>>>>> c[c.startIndex.advancedBy(3)] => c[fromStart: 3]
>>>>>> c[c.endIndex.advancedBy(-3)] => c[fromEnd: 3]
>>>>>  
>>>>>> But naming the slice operations is a little trickier. We could actually 
>>>>>> just go ahead and re-use the existing method names for those:
>>>>>>  
>>>>>> c.dropFirst(3) => c[dropFirst: 3]
>>>>>> c.dropLast(3) => c[dropLast: 3]
>>>>>> c.prefix(3) => c[prefix: 3]
>>>>>> c.suffix(3) => c[suffix: 3]
>>>>>>  
>>>>>> That's not so compelling, since we already have the methods, but I 
>>>>>> suppose it makes sense if you want to try and make all slice-producing 
>>>>>> methods use subscript syntax (which I have mixed feelings about).
>>>>>  
>>>>> Once we get efficient in-place slice mutation (via slice addressors), it 
>>>>> becomes a lot more compelling, IMO.  But I still don’t find the naming 
>>>>> terribly clear, and I don’t love that one needs to combine two subscript 
>>>>> operations in order to drop the first and last element or take just 
>>>>> elements 3..<5.
>>>>  
>>>> You can always add more overloads, such as
>>>>  
>>>> c[dropFirst: 3, dropLast: 5]
>>>>  
>>>> but I admit that there's a bunch of combinations here that would need to 
>>>> be added.
>>>>  
>>>  
>>> My point is that we have an English language soup that doesn’t compose 
>>> naturally.  Slicing in Python is much more elegant and composes well.  If 
>>> we didn’t currently have 6 separate methods (7 including subscript for 
>>> index-based slicing) for handling this, that need to be separately 
>>> documented and understood, I wouldn’t be so eager to replace the words with 
>>> an EDSL, but in this case IMO it is an overall simplification.
>>>  
>>>> My concern over trying to make it easier to take elements 3..<5 is that 
>>>> incrementing indexes is verbose for a reason, and adding a feature that 
>>>> makes it really easy to index into any collection by using integers is a 
>>>> bad idea as it will hide O(N) operations behind code that looks like O(1). 
>>>> And hiding these operations makes it really easy to accidentally turn an 
>>>> O(N) algorithm into an O(N^2) algorithm.
>>>  
>>> As I’ve said, I consider the presence of “$” to be enough of an indicator 
>>> that something co$tly is happening, though I’m open to other ways of 
>>> indicating it.  I’m trying to strike a balance between “rigorous” and “easy 
>>> to use,” here.  Remember that Swift has to work in playgrounds and for 
>>> beginning programmers, too.  I am likewise unsatisfied with the (lack of) 
>>> ease-of-use of String as well (e.g. for lexing and parsing tasks), and have 
>>> made improving it a priority for Swift 3.  I view fixing the slicing 
>>> interface as part of that job.
>>>  
>>>>> Even if we need separate symbols for “start” and “end” (e.g. using “$” 
>>>>> for both might just be too confusing for people in the end, even if it 
>>>>> works otherwise), I still think a generalized form that allows ranges to 
>>>>> be used everywhere for slicing is going to be much easier to understand 
>>>>> than this hodgepodge of words we use today.
>>>>  
>>>> I'm tempted to say that if we do this, we should use two different sigils, 
>>>> and more importantly we should not use + and - but instead use methods on 
>>>> the sigils like advancedBy(), as if the sigils were literally placeholders 
>>>> for the start/end index. That way we won't write code that looks O(1) when 
>>>> it's not. For example:
>>>>  
>>>> col[^.advancedBy(3)..<$]
>>>>  
>>>> Although we'd need to revisit the names a little, because $.advancedBy(-3) 
>>>> is a bit odd when we know that $ can't ever take a positive number for 
>>>> that.
>>>>  
>>>> Or maybe we should just use $ instead as a token that means "the 
>>>> collection being indexed", so you'd actually say something like
>>>>  
>>>> col[$.startIndex.advancedBy(3)..<$.startIndex.advancedBy(5)]
>>>  
>>> I really like that direction, but I don’t think it does enough to solve the 
>>> ease-of-use problem; I still think the result looks and feels horrible 
>>> compared to Python for the constituencies mentioned above.  
>>>  
>>> I briefly implemented this syntax, which was intended to suggest repeated 
>>> incrementation:
>>>  
>>> col.startIndex++3 // col.startIndex.advancedBy(3)
>>>  
>>> I don’t think that is viable, especially now that we’ve dropped “++” and 
>>> “--”. But this syntax 
>>>  
>>> col[$.start⛄️3..<$.start⛄️5]
>>>  
>>> begins to be interesting for some definition of ⛄️.
>>>  
>>>> This solves the problem of subscripting a collection without having to 
>>>> store it in a local variable, without discarding any of the intentional 
>>>> index overhead. Of course, if the goal is to make index operations more 
>>>> concise this doesn't really help much, but my argument here is that it's 
>>>> hard to cut down on the verbosity without hiding O(N) operations.
>>>  
>>> That ship has already sailed somewhat, because e.g. every Collection has to 
>>> have a count property, which can be O(N).  But I still like to uphold it 
>>> where possible.  I just don’t think the combination of “+” and “$” 
>>> necessarily has such a strong O(1) connotation… especially because the 
>>> precedent for seeing those symbols together is regexps.
>>>  
>>>>  
>>>> -Kevin Ballard
>>>>  
>>>>>> But the [fromStart:] and [fromEnd:] subscripts seem useful.
>>>>> Yeah… I really want a unified solution that covers slicing as well as 
>>>>> offset indexing.
>>>>>  
>>>>> -Dave
>>>>>  
>>>>  
>>> 
>>>  
>>> -Dave
>>>  
>>>  
>>>  
>>> _______________________________________________
>>> swift-evolution mailing list
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>
