Re: [swift-evolution] Pitch: String Index Overhaul

Jordan Rose via swift-evolution Tue, 30 May 2017 20:03:07 -0700

> On May 30, 2017, at 16:13, Dave Abrahams <[email protected]> wrote:
> 
> 
> on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:
> 
>>> On May 30, 2017, at 14:53, Dave Abrahams <[email protected]> wrote:
>>> 
>>> 
>>> on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:
>>> 
>>>> My knee-jerk reaction is to say it's too late in Swift 4 for this kind
>>>> of change, but with that out of the way, I'm most concerned about what
>>>> it means to have, say, a UTF-8 index that's not on a UTF-16 boundary.
>>>> 
>>>> let str = "言"
>>>> let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
>>>> let trailingBytes = str.utf8[oneUnitIn...]
>>> 
>>> This is not new; it exists today.
>> 
>> Yes, I think that’s valuable. What’s different is that it’s not a 
>> String.Index.
>> 
>>> 
>>>> What can I do with 'oneUnitIn'? 
>>> 
>>> All the usual stuff; we're not proposing to change what you can do with
>>> it.
>> 
>> By changing the type, you have increased the scope of where an index
>> can be used. What happens when I use it in one of the other views and
>> it’s not on a boundary?
>> 
>> (I suspect the answer is “it traps” but the proposal should spell that
>> out explicitly.)
> 
> Sorry, I mistakenly limited the “rounding down” behavior to slicing and
> range replacement.  The index would be rounded down to the previous
> boundary, and then used as ever.


Makes sense!

> 
>> 
>>> 
>>>> How do I test to see if it's on a Character boundary or a
>>>> UnicodeScalar boundary?
>>> 
>>> as noted,
>>> 
>>> Replacing the failable APIs listed [above](#motivation) that detect
>>> whether an index represents a valid position in a given view, and
>>> enhancements that explicitly round index positions to nearby boundaries
>>> in a given view, are left to a later proposal.  For now, we do not
>>> propose to remove the existing index conversion APIs.
>>> 
>>> That means you can use oneUnitIn.samePosition(in: str) or
>>> oneUnitIn.samePosition(in: str.unicodeScalars) to find out if it's on a
>>> character or Unicode scalar boundary.
>> 
>> I’m sorry, I completely missed that. This part of the question is withdrawn.
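
For reference, a minimal sketch of that boundary check, reusing the "言" example from earlier in the thread. It assumes the samePosition(in:) conversions keep returning nil for an index that is not on a boundary of the target view; the three Bool names are only illustrative.

```swift
// Sketch only: the string and index come from the example earlier in the thread.
let str = "言"  // one Character, one Unicode scalar, three UTF-8 code units
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)

// samePosition(in:) returns nil when the index is not on a boundary
// of the target view, so a nil check doubles as a boundary test.
let onScalarBoundary = oneUnitIn.samePosition(in: str.unicodeScalars) != nil
let onCharacterBoundary = oneUnitIn.samePosition(in: str) != nil

// The start of the string is a boundary in every view.
let startOnCharacterBoundary = str.utf8.startIndex.samePosition(in: str) != nil

print(onScalarBoundary, onCharacterBoundary, startOnCharacterBoundary)
```

Here oneUnitIn sits inside the only scalar of the string, so both checks on it come back negative, while the start index passes.
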
>> 
>> I’m also concerned about putting “UTF-16” in the documentation for
>> encodedOffset. Either it’s a ‘utf16Offset’ or it isn’t
> 
> It is today; hopefully it won't be someday
> 
>> ; if it’s an opaque value then it should be treated as such. 
> 
> Today a String has underlying UTF-16-compatible storage and that's
> documented as such, but we intend to lift that restriction and don't
> want the names to lock us into semantics.

I don’t think you should promise that about new APIs, then, or someone will 
start relying on it.


> 
>> (It’s also a little disturbing that round-tripping through
>> encodedOffset isn’t guaranteed to give you the same index back.)
> 
> Define “same.”  
> 
> The encodedOffset is not the full value of an *arbitrary* index, and
> doesn't claim to be.  The indices that can be serialized and
> reconstructed exactly using encodedOffset are those that fall on code
> unit boundaries.  Today, that means everything but UTF-8 indices.  We
> could consider exposing the transcodedOffset (offset within the UTF8
> encoding of the scalar) as well, but I want to be conservative.

I’m not sure it’s clear from the name “encodedOffset” that this is a lossy 
conversion. I’d say it should be an optional property, but that’s probably too 
annoying in the invalid case. Maybe it should trap.
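
To make the optional variant concrete, here is a rough sketch of what a non-lossy accessor could look like. The helper exactUTF16Offset(of:in:) is hypothetical and not part of the pitch, and it is written against utf16Offset(in:), the accessor that later Swift releases (Swift 5) provide in place of encodedOffset, so treat it as an illustration of the idea rather than the proposed API.

```swift
// Hypothetical helper, not part of the pitch: returns nil instead of a lossy offset.
func exactUTF16Offset(of index: String.Index, in string: String) -> Int? {
    // samePosition(in:) yields nil when the index does not fall on a
    // boundary that the UTF-16 view can represent.
    guard let aligned = index.samePosition(in: string.utf16) else { return nil }
    // utf16Offset(in:) is the Swift 5 replacement for encodedOffset.
    return aligned.utf16Offset(in: string)
}

let text = "言"
let midScalar = text.utf8.index(after: text.utf8.startIndex)

let startOffset = exactUTF16Offset(of: text.startIndex, in: text)
let endOffset = exactUTF16Offset(of: text.endIndex, in: text)
let midOffset = exactUTF16Offset(of: midScalar, in: text)  // index inside the scalar

print(startOffset as Any, endOffset as Any, midOffset as Any)
```

The caller has to handle nil for the mid-scalar index, which is exactly the annoyance mentioned above; the trapping variant would replace the guard's early return with a precondition failure.
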

Jordan
