Re: [swift-evolution] Proposal: Python's indexing and slicing

Donnacha Oisín Kidney via swift-evolution Mon, 21 Dec 2015 20:29:36 -0800

Why not make the “forgiving” version the default? I mean, the majority of 
Python-style composable slicing would be happening on arrays and array slices, 
for which there’s no performance overhead, and the forgiving version would seem 
to suit the “safe-by-default” philosophy. I’ve seen mistakes like this:


let ar = [1, 2, 3, 4, 5]
let arSlice = ar[2..<5]
arSlice[1] // traps: arSlice keeps ar's indices (2..<5), so 1 is out of bounds

on a few occasions, for instance. I would think something like this:

let ar = [0, 1, 2, 3, 4, 5]

let arSlice = ar[2...] // [2, 3, 4, 5]
arSlice[..<3] // [2, 3, 4]
arSlice[...3] // [2, 3, 4, 5]
arSlice[direct: 2] // 2
arSlice[0] // 2

would be what most programmers learning Swift expect, while leaving 
the unforgiving option open to those who need it.
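
A rough sketch of the zero-based, forgiving behaviour, written as an extension rather than as a change to the default subscript. The `offset:` and `forgiving:` labels are names I made up for illustration, and the one-sided range `ar[2...]` assumes a toolchain that supports it (Swift 4 or later):

```swift
// Hypothetical helpers, not part of the standard library.
extension RandomAccessCollection {
    /// Element `offset` positions after the start of this collection,
    /// whatever `startIndex` happens to be. Still traps if out of range.
    subscript(offset offset: Int) -> Element {
        return self[index(startIndex, offsetBy: offset)]
    }

    /// Slice addressed by zero-based offsets, trimmed to the elements
    /// that actually exist instead of trapping.
    subscript(forgiving range: Range<Int>) -> SubSequence {
        let lower = Swift.min(Swift.max(range.lowerBound, 0), count)
        let upper = Swift.min(Swift.max(range.upperBound, lower), count)
        return self[index(startIndex, offsetBy: lower)..<index(startIndex, offsetBy: upper)]
    }
}

let ar = [0, 1, 2, 3, 4, 5]
let arSlice = ar[2...]                            // [2, 3, 4, 5], indices 2...5

let sharedIndex = arSlice[2]                      // 2: the slice keeps the array's indices
let zeroBased = arSlice[offset: 0]                // 2: counted from the slice's own start
let clamped = Array(arSlice[forgiving: 1..<10])   // [3, 4, 5]: upper bound trimmed
```

The labels are only there because an extension cannot sensibly replace the existing `Int` subscript; the suggestion above is that the unlabelled subscript would behave like `offset:` and `forgiving:`, with a labelled form for direct indexing.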

> On 22 Dec 2015, at 03:29, Dave Abrahams via swift-evolution 
> <[email protected]> wrote:
> 
>> 
>> On Dec 21, 2015, at 1:51 PM, Kevin Ballard <[email protected]> wrote:
>> 
>> On Mon, Dec 21, 2015, at 11:56 AM, Dave Abrahams wrote:
>>>  
>>>> On Dec 19, 2015, at 8:52 PM, Kevin Ballard via swift-evolution 
>>>> <[email protected]> wrote:
>>>>  
>>>> On Fri, Dec 18, 2015, at 02:39 PM, Dave Abrahams via swift-evolution wrote:
>>>>>  
>>>>> Yes, we already have facilities to do most of what Python can do here, 
>>>>> but one major problem IMO is that the “language” of slicing is so 
>>>>> non-uniform: we have [a..<b], dropFirst, dropLast, prefix, and suffix.  
>>>>> Introducing “$” for this purpose could make it all hang together and also 
>>>>> eliminate the “why does it have to be so hard to look at the 2nd 
>>>>> character of a string?!” problem.  That is, use the identifier “$” (yes, 
>>>>> that’s an identifier in Swift) to denote the beginning-or-end of a 
>>>>> collection.  Thus,
>>>>>  
>>>>>   c[c.startIndex.advancedBy(3)] => c[$+3]      // Python: c[3]
>>>>>   c[c.endIndex.advancedBy(-3)]  => c[$-3]      // Python: c[-3]
>>>>>  
>>>>>   c.dropFirst(3) => c[$+3...]   // Python: c[3:]
>>>>>   c.dropLast(3)  => c[..<$-3]   // Python: c[:-3]
>>>>>   c.prefix(3)    => c[..<$+3]   // Python: c[:3]
>>>>>   c.suffix(3)    => c[$-3...]   // Python: c[-3:]
>>>>>  
>>>>> It even has the nice connotation that, “this might be a little more 
>>>>> expen$ive than plain indexing” (which it might, for non-random-access 
>>>>> collections).  I think the syntax is still a bit heavy, not least because 
>>>>> of “..<” and “...”, but the direction has potential. 
>>>>>  
>>>>>  I haven’t had the time to really experiment with a design like this; the 
>>>>> community might be able to help by prototyping and using some 
>>>>> alternatives.  You can do all of this outside the standard library with 
>>>>> extensions.
>>>>  
>>>> Interesting idea.
>>>>  
>>>> One downside is it masks potentially O(N) operations 
>>>> (ForwardIndex.advancedBy()) behind the + operator, which is typically 
>>>> assumed to be an O(1) operation.
>>>  
>>> Yeah, but the “$” is sufficiently unusual that it doesn’t bother me too 
>>> much.
>>>  
>>>> Also, the $+3 syntax suggests that it requires there to be at least 3 
>>>> elements in the sequence, but prefix()/suffix()/dropFirst/etc. all take 
>>>> maximum counts, so they operate on sequences of fewer elements.
>>>  
>>> For indexing, $+3 would make that requirement.  For slicing, it wouldn’t.  
>>> I’m not sure why you say something about the syntax suggests exceeding 
>>> bounds would be an error.
>>  
>> Because there's no precedent for + behaving like a saturating addition, not 
>> in Swift and not, to my knowledge, anywhere else either. The closest example 
>> that comes to mind is floating-point numbers eventually ending up at 
>> Infinity, but that's not really saturating addition, that's just a 
>> consequence of Infinity + anything == Infinity. Nor do I think we should be 
>> establishing precedent of using + for saturating addition, because that 
>> would be surprising to people.
> 
> To call this “saturating addition” is an…interesting…interpretation.  I don’t 
> view it that way at all.  The “saturation,” if there is any, happens as part 
> of subscripting.  You don’t even know what the “saturation limit” is until 
> you couple the range expression with the collection.  
> 
> In my view, the addition is part of an EDSL that represents a notional 
> position offset from the start or end, then the subscript operation 
> forgivingly trims these offsets as needed.
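
One way to read the EDSL described above, as a rough sketch only: `+` and `-` merely record an offset from the start or the end, and all trimming happens when the offsets meet a collection in the subscript. Every name here (`Offset`, `start`, `end`, `from:upTo:`) is hypothetical; `start` and `end` stand in for “$”:

```swift
// Hypothetical types for illustration; none of this is standard library API.
struct Offset {
    enum Anchor { case start, end }
    var anchor: Anchor
    var distance: Int
}

let start = Offset(anchor: .start, distance: 0)
let end = Offset(anchor: .end, distance: 0)

// The operators only build a notional position; no bounds are known yet.
func + (lhs: Offset, rhs: Int) -> Offset {
    return Offset(anchor: lhs.anchor, distance: lhs.distance + rhs)
}
func - (lhs: Offset, rhs: Int) -> Offset {
    return Offset(anchor: lhs.anchor, distance: lhs.distance - rhs)
}

extension Collection {
    // Resolves an offset against this collection, trimmed to its bounds.
    // Uses `count`, so it can be O(n) for non-random-access collections.
    private func resolve(_ offset: Offset) -> Index {
        let fromStart = offset.anchor == .start ? offset.distance : count + offset.distance
        let clamped = Swift.min(Swift.max(fromStart, 0), count)
        return index(startIndex, offsetBy: clamped)
    }

    // Forgiving slice: out-of-range offsets are trimmed, and an inverted
    // range yields an empty slice.
    subscript(from lower: Offset, upTo upper: Offset) -> SubSequence {
        let lo = resolve(lower)
        let hi = resolve(upper)
        return self[lo..<Swift.max(lo, hi)]
    }
}

let c = [10, 20, 30, 40, 50]
let dropFirst3 = Array(c[from: start + 3, upTo: end])         // [40, 50]
let dropLast3 = Array(c[from: start, upTo: end - 3])          // [10, 20]
let prefix3 = Array(c[from: start, upTo: start + 3])          // [10, 20, 30]
let suffix3 = Array(c[from: end - 3, upTo: end])              // [30, 40, 50]
let trimmed = Array(c[from: start + 4, upTo: start + 10])     // [50]
```

In this reading `start + 10` is never “saturated” by the addition itself; it only gets trimmed once it is resolved against `c`.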
> 
>> Additionally, I don't think adding a $ to an array slice expression should 
>> result in a behavioral difference, e.g. array[3..<array.endIndex] and 
>> array[$+3..<$] should behave the same
> 
> I see your point, but don’t (necessarily) agree with you there.  “$” here is 
> used as an indicator of several things, including not-necessarily-O(1) and 
> forgiving slicing.  We could introduce a label just to handle that:
> 
>  array[forgivingAndNotO1: $+3..<$]  
> 
> but it doesn’t look like a win to me.
> 
>>  
>>>> There's also some confusion with using $ for both start and end. What if I 
>>>> say c[$..<$]? We'd have to infer from position that the first $ is the 
>>>> start and the second $ is the end, but then what about c[$+n..<$+m]? We 
>>>> can't treat the usage of + as meaning "from start" because the argument 
>>>> might be negative. And if we use the overall sign of the 
>>>> operation/argument together, then the expression `$+n` could mean from 
>>>> start or from end, which comes right back to the problem with Python 
>>>> syntax.
>>>  
>>> There’s a problem with Python syntax?  I’m guessing you mean that c[a:b] 
>>> can have very different interpretations depending on whether a and b are 
>>> positive or negative?
>>  
>> Exactly.
>>  
>>> First of all, I should say: that doesn’t really bother me.  The 99.9% use 
>>> case for this operation uses literal constants for the offsets, and I 
>>> haven’t heard of it causing confusion for Python programmers.  That said, 
>>> if we wanted to address it, we could easily require n and m above to be 
>>> literals, rather than Ints (which incidentally guarantees it’s an O(1) 
>>> operation).  That has upsides and downsides of course.
>>  
>> I don't think we should add this feature in any form if it only supports 
>> literals.
>>   
>>>> I think Jacob's idea has some promise though:
>>>>  
>>>> c[c.startIndex.advancedBy(3)] => c[fromStart: 3]
>>>> c[c.endIndex.advancedBy(-3)] => c[fromEnd: 3]
>>>  
>>>> But naming the slice operations is a little trickier. We could actually 
>>>> just go ahead and re-use the existing method names for those:
>>>>  
>>>> c.dropFirst(3) => c[dropFirst: 3]
>>>> c.dropLast(3) => c[dropLast: 3]
>>>> c.prefix(3) => c[prefix: 3]
>>>> c.suffix(3) => c[suffix: 3]
>>>>  
>>>> That's not so compelling, since we already have the methods, but I suppose 
>>>> it makes sense if you want to try and make all slice-producing methods use 
>>>> subscript syntax (which I have mixed feelings about).
>>>  
>>> Once we get efficient in-place slice mutation (via slice addressors), it 
>>> becomes a lot more compelling, IMO.  But I still don’t find the naming 
>>> terribly clear, and I don’t love that one needs to combine two subscript 
>>> operations in order to drop the first and last element or take just 
>>> elements 3..<5.
>>  
>> You can always add more overloads, such as
>>  
>> c[dropFirst: 3, dropLast: 5]
>>  
>> but I admit that there's a bunch of combinations here that would need to be 
>> added.
>> 
> 
> My point is that we have an English language soup that doesn’t compose 
> naturally.  Slicing in Python is much more elegant and composes well.  If we 
> didn’t currently have 6 separate methods (7 including subscript for 
> index-based slicing) for handling this, that need to be separately documented 
> and understood, I wouldn’t be so eager to replace the words with an EDSL, but 
> in this case IMO it is an overall simplification.
> 
>> My concern over trying to make it easier to take elements 3..<5 is that 
>> incrementing indexes is verbose for a reason, and adding a feature that 
>> makes it really easy to index into any collection by using integers is a bad 
>> idea as it will hide O(N) operations behind code that looks like O(1). And 
>> hiding these operations makes it really easy to accidentally turn an O(N) 
>> algorithm into an O(N^2) algorithm.
> 
> As I’ve said, I consider the presence of “$” to be enough of an indicator 
> that something co$tly is happening, though I’m open to other ways of 
> indicating it.  I’m trying to strike a balance between “rigorous” and “easy 
> to use,” here.  Remember that Swift has to work in playgrounds and for 
> beginning programmers, too.  I am likewise unsatisfied with the (lack of) 
> ease-of-use of String as well (e.g. for lexing and parsing tasks), and have 
> made improving it a priority for Swift 3.  I view fixing the slicing 
> interface as part of that job.
> 
>>> Even if we need separate symbols for “start” and “end” (e.g. using “$” for 
>>> both might just be too confusing for people in the end, even if it works 
>>> otherwise), I still think a generalized form that allows ranges to be used 
>>> everywhere for slicing is going to be much easier to understand than this 
>>> hodgepodge of words we use today.
>>  
>> I'm tempted to say that if we do this, we should use two different sigils, 
>> and more importantly we should not use + and - but instead use methods on 
>> the sigils like advancedBy(), as if the sigils were literally placeholders 
>> for the start/end index. That way we won't write code that looks O(1) when 
>> it's not. For example:
>>  
>> col[^.advancedBy(3)..<$]
>>  
>> Although we'd need to revisit the names a little, because $.advancedBy(-3) 
>> is a bit odd when we know that $ can't ever take a positive number for 
>> that.
>>  
>> Or maybe we should just use $ instead as a token that means "the collection 
>> being indexed", so you'd actually say something like
>>  
>> col[$.startIndex.advancedBy(3)..<$.startIndex.advancedBy(5)]
> 
> I really like that direction, but I don’t think it does enough to solve the 
> ease-of-use problem; I still think the result looks and feels horrible 
> compared to Python for the constituencies mentioned above.  
> 
> I briefly implemented this syntax, which was intended to suggest repeated 
> incrementation:
> 
>       col.startIndex++3 // col.startIndex.advancedBy(3)
> 
> I don’t think that is viable, especially now that we’ve dropped “++” and 
> “--”. But this syntax 
> 
>       col[$.start⛄️3..<$.start⛄️5]
> 
> begins to be interesting for some definition of ⛄️.
> 
>> This solves the problem of subscripting a collection without having to store 
>> it in a local variable, without discarding any of the intentional index 
>> overhead. Of course, if the goal is to make index operations more concise 
>> this doesn't really help much, but my argument here is that it's hard to cut 
>> down on the verbosity without hiding O(N) operations.
> 
> That ship has already sailed somewhat, because e.g. every Collection has to 
> have a count property, which can be O(N).  But I still like to uphold it 
> where possible.  I just don’t think the combination of “+” and “$” 
> necessarily has such a strong O(1) connotation… especially because the 
> precedent for seeing those symbols together is regexps.
> 
>>  
>> -Kevin Ballard
>>  
>>>> But the [fromStart:] and [fromEnd:] subscripts seem useful.
>>> Yeah… I really want a unified solution that covers slicing as well as 
>>> offset indexing.
>>>  
>>> -Dave
>>>  
>>  
> 
> -Dave
> 
> 
> 
>  _______________________________________________
> swift-evolution mailing list
> [email protected]
> https://lists.swift.org/mailman/listinfo/swift-evolution

