> On May 30, 2017, at 16:13, Dave Abrahams <dabrah...@apple.com> wrote:
>
> on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:
>
>>> On May 30, 2017, at 14:53, Dave Abrahams <dabrah...@apple.com> wrote:
>>>
>>> on Tue May 30 2017, Jordan Rose <jordan_rose-AT-apple.com> wrote:
>>>
>>>> My knee-jerk reaction is to say it's too late in Swift 4 for this kind
>>>> of change, but with that out of the way, I'm most concerned about what
>>>> it means to have, say, a UTF-8 index that's not on a UTF-16 boundary.
>>>>
>>>> let str = "言"
>>>> let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
>>>> let trailingBytes = str.utf8[oneUnitIn...]
>>>
>>> This is not new; it exists today.
>>
>> Yes, I think that’s valuable. What’s different is that it’s not a
>> String.Index.
>>
>>>> What can I do with 'oneUnitIn'?
>>>
>>> All the usual stuff; we're not proposing to change what you can do with
>>> it.
>>
>> By changing the type, you have increased the scope of where an index
>> can be used. What happens when I use it in one of the other views and
>> it’s not on a boundary?
>>
>> (I suspect the answer is “it traps” but the proposal should spell that
>> out explicitly.)
>
> Sorry, I mistakenly limited the “rounding down” behavior to slicing and
> range replacement. The index would be rounded down to the previous
> boundary, and then used as ever.
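Jordan's question (what happens when a UTF-8 index that is not on a boundary is used in another view) and Dave's "round down to the previous boundary" answer can be explored in current Swift. This is a sketch assuming a Swift 5 toolchain: `boundaryKinds(of:in:)` relies on the standard library's failable `String.Index.samePosition(in:)` conversions, which return `nil` when the index is not a valid position in the target view, while `roundDownToScalarBoundary(_:in:)` is a hypothetical helper, not standard library API, that models rounding down at the Unicode scalar level only. Rounding to a Character boundary would need grapheme-cluster segmentation and is not attempted here.

```swift
/// Reports whether `i` sits on a Character boundary and/or a Unicode scalar
/// boundary of `s`. Each failable samePosition(in:) conversion returns nil
/// when `i` is not a valid position in the target view.
func boundaryKinds(of i: String.Index, in s: String) -> (character: Bool, scalar: Bool) {
    return (i.samePosition(in: s) != nil,
            i.samePosition(in: s.unicodeScalars) != nil)
}

/// Hypothetical helper (not standard library API): steps backwards through
/// the UTF-8 view until reaching a Unicode scalar boundary, modeling the
/// "round down to the previous boundary" rule at the scalar level only.
func roundDownToScalarBoundary(_ i: String.Index, in s: String) -> String.Index {
    var j = i
    while j > s.startIndex && j.samePosition(in: s.unicodeScalars) == nil {
        j = s.utf8.index(before: j)
    }
    return j
}

let str = "言"  // one scalar, U+8A00, encoded as three UTF-8 bytes
let oneUnitIn = str.utf8.index(after: str.utf8.startIndex)
print(boundaryKinds(of: oneUnitIn, in: str))                          // (character: false, scalar: false)
print(roundDownToScalarBoundary(oneUnitIn, in: str) == str.startIndex) // true

let decomposed = "e\u{301}"  // "e" followed by U+0301 COMBINING ACUTE ACCENT
let afterE = decomposed.unicodeScalars.index(after: decomposed.unicodeScalars.startIndex)
print(boundaryKinds(of: afterE, in: decomposed))                      // (character: false, scalar: true)
```

The decomposed string shows why the two checks differ: the index after "e" starts a Unicode scalar but falls inside a single Character, so it is already a scalar boundary and rounding down leaves it unchanged.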
Makes sense!

>>>> How do I test to see if it's on a Character boundary or a
>>>> UnicodeScalar boundary?
>>>
>>> as noted,
>>>
>>>   Replacing the failable APIs listed [above](#motivation) that detect
>>>   whether an index represents a valid position in a given view, and
>>>   enhancements that explicitly round index positions to nearby boundaries
>>>   in a given view, are left to a later proposal. For now, we do not
>>>   propose to remove the existing index conversion APIs.
>>>
>>> That means you can use oneUnitIn.samePosition(in: str) or
>>> oneUnitIn.samePosition(in: str.unicodeScalars) to find out if it's on a
>>> character or unicode scalar boundary.
>>
>> I’m sorry, I completely missed that. This part of the question is withdrawn.
>>
>> I’m also concerned about putting “UTF-16” in the documentation for
>> encodedOffset. Either it’s a ‘utf16Offset’ or it isn’t
>
> It is today; hopefully it won't be someday
>
>> ; if it’s an opaque value then it should be treated as such.
>
> Today a String has underlying UTF-16-compatible storage and that's
> documented as such, but we intend to lift that restriction and don't
> want the names to lock us into semantics.

I don’t think you should promise that about new APIs, then, or someone will start relying on it.

>> (It’s also a little disturbing that round-tripping through
>> encodedOffset isn’t guaranteed to give you the same index back.)
>
> Define “same.”
>
> The encodedOffset is not the full value of an *arbitrary* index, and
> doesn't claim to be. The indices that can be serialized and
> reconstructed exactly using encodedOffset are those that fall on code
> unit boundaries. Today, that means everything but UTF-8 indices. We
> could consider exposing the transcodedOffset (offset within the UTF8
> encoding of the scalar) as well, but I want to be conservative.

I’m not sure it’s clear from the name “encodedOffset” that this is a lossy conversion.
I’d say it should be an optional property, but that’s probably too annoying in the invalid case. Maybe it should trap.

Jordan
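Dave's description of encodedOffset as only part of an index's value, and Jordan's suggestion of an optional (or trapping) property, can be illustrated with a toy model. Everything in this sketch is hypothetical: `ModelIndex`, `roundTrip`, and `exactEncodedOffset` are illustrative names rather than standard library API, and the model assumes, as described in the thread, that an index is an offset in the underlying encoding plus a transcodedOffset within a scalar's UTF-8 encoding.

```swift
/// Hypothetical model (not the standard library's String.Index): an offset in
/// the underlying encoding plus, for UTF-8 positions inside a transcoded
/// scalar, an offset within that scalar's UTF-8 encoding.
struct ModelIndex: Equatable {
    var encodedOffset: Int
    var transcodedOffset: Int

    init(encodedOffset: Int, transcodedOffset: Int = 0) {
        self.encodedOffset = encodedOffset
        self.transcodedOffset = transcodedOffset
    }

    /// Jordan's suggested alternative: an optional accessor that returns nil
    /// for indices that encodedOffset alone cannot reconstruct exactly.
    var exactEncodedOffset: Int? {
        return transcodedOffset == 0 ? encodedOffset : nil
    }
}

/// Hypothetical round trip through encodedOffset alone: transcodedOffset is dropped.
func roundTrip(_ i: ModelIndex) -> ModelIndex {
    return ModelIndex(encodedOffset: i.encodedOffset)
}

let onBoundary = ModelIndex(encodedOffset: 0)
// Models `oneUnitIn` from the thread under UTF-16 storage: "言" is one UTF-16
// code unit, so the UTF-8 index one byte in has encodedOffset 0, transcodedOffset 1.
let midScalar = ModelIndex(encodedOffset: 0, transcodedOffset: 1)

print(roundTrip(onBoundary) == onBoundary)  // true
print(roundTrip(midScalar) == midScalar)    // false: the conversion is lossy
print(midScalar.exactEncodedOffset as Any)  // nil
```

A trapping variant, the other option Jordan raises, would replace the `nil` result with a `precondition(transcodedOffset == 0)` failure.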
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution