on Wed Jun 14 2017, Xiaodi Wu <xiaodi.wu-AT-gmail.com> wrote:

> On Wed, Jun 14, 2017 at 12:01 PM, Dave Abrahams <dabrah...@apple.com> wrote:
>
>> on Wed Jun 14 2017, Xiaodi Wu <xiaodi.wu-AT-gmail.com> wrote:
>>
>> > On Wed, Jun 14, 2017 at 09:26 Xiaodi Wu <xiaodi...@gmail.com> wrote:
>> >
>> >> If we leave aside for a moment the nomenclature issue where everything
>> >> in Foundation referring to a character is really referring to a Unicode
>> >> scalar, Kevin’s example illustrates the whole problem in a nutshell,
>> >> doesn’t it? In that example, we have a straightforward attempt to slice
>> >> with a misaligned index. The totality of options here are:
>> >>
>> >> * return nil, an option the rejection of which is the premise of your
>> >>   proposal
>> >> * return a partial character (i.e., \u{301}), an option which we haven’t
>> >>   yet talked about in this thread–seems like this could have simpler
>> >>   semantics, potentially yields garbage if the index is garbage but in
>> >>   the case of Kevin’s example actually behaves as the user might expect
>>
>> I think that's exactly what I was proposing in
>> https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170612/037466.html
>>
>> >> * return a whole character after “rounding down”–difficult semantics
>> >>   to define and explain, always results in a whole character but in the
>> >>   case of Kevin’s example gives an unexpected answer
>> >> * return a whole character after “rounding up”–difficult semantics to
>> >>   define and explain, always results in a whole character but when the
>> >>   index is misaligned would result in a character or range of characters
>> >>   in which the index is not found
>> >> * trap–simple semantics, never returns garbage, obvious disadvantage
>> >>   that execution will not proceed
>> >>
>> >> No clearly perfect answer here.
>> >> However, _if_ we hew strictly to the stated premise of your proposal
>> >> that failable APIs are awkward enough to justify a change, and moreover
>> >> that the awkwardness is truly “needless” because of the rarity of
>> >> misaligned index usage, then at face value trapping should be a
>> >> perfectly acceptable solution.
>> >>
>> >> That Kevin’s example raises the specter of trapping being a realistic
>> >> occurrence in currently working code actually suggests a challenge to
>> >> your stated premise. If we accept that this challenge is a substantial
>> >> one, then it’s not clear to me that abandoning failable APIs should be
>> >> ruled out from the outset.
>> >>
>> >> However, if this desire to remove failable APIs remains strong then I
>> >> wonder if the undiscussed second option above is worth at least some
>> >> consideration.
>> >>
>> >
>> > Having digested your revised proposed behavior a little better I see
>> > you’re kind of getting at this exact issue, but I’m uncomfortable with
>> > how it’s so tied to the underlying encoding, which is not guaranteed to
>> > be UTF-16 but is assumed to be for the purposes of slicing.
>>
>> I think there's some confusion here; probably I have failed to explain
>> myself. Today a String happens to always be UTF-16, but there's no
>> intention to assume that it is UTF-16 for the purposes of slicing in the
>> future. Any place you see something like s.utf16 in an example I've
>> used to illustrate semantics should be interpreted as s.codeUnits,
>> where codeUnits is a collection of code units for whatever the
>> underlying encoding is.
>>
>> Tying this to underlying encoding actually reflects the true nature of
>> String, which is exposed by the semantics of concatenation and range
>> replacement (where multiple elements may merge into one element).
>> As stated in
>> https://github.com/apple/swift/blob/master/docs/StringManifesto.md#string-should-be-a-collection-of-characters-again
>> the elements of a String (or any of its views other than native code
>> units) are an emergent property. To anyone operating at Unicode scalar
>> granularity (which can result in misalignment with respect to
>> characters) or at the higher granularity of code units (native or
>> transcoded, which can result in misalignment with all other views), I
>> think this is actually unsurprising.
>>
>
> That's fair. If this is critical to the semantics, though, and you expect
> that some people will operate at that granularity, it seems incongruous
> that s.codeUnits isn't actually exposed to the user even if it'd be as a
> type-erased AnyCollection.

I agree. Exposing .codeUnits is part of the longer-term plan, but I'm
trying to keep mostly-orthogonal issues out of this proposal.

>> > I’d like to propose an alternative that attempts to deliver on what
>> > I’ve called the second option above–somewhat similar:
>> >
>> > A string index will notionally or actually keep track of the view
>> > in which it was originally aligned, be it utf8, utf16,
>> > unicodeScalars, or characters. A slicing operation str.xxx[idx]
>> > will behave as expected if idx is not misaligned with respect to
>> > str.xxx. If it is misaligned, the operation would instead be
>> > notionally String(str.yyy[idx...]).xxx.first!, where yyy is the
>> > original view in which idx was known aligned–if idx is not also
>> > misaligned with respect to str.yyy (as might be the case if idx was
>> > returned from an operation on a different string). If it is still
>> > misaligned, trap.
>>
>> That seems much more complicated than what I'm proposing, but maybe
>> that's because I haven't yet explained myself clearly enough.
>>
>
> I think I catch your drift, and I'm converging on your way of thinking
> here. :-)

--
-Dave
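
For readers following the thread, here is a minimal Swift sketch of two points discussed above: elements merging under concatenation, and an index that is aligned in unicodeScalars but misaligned with respect to characters. The string literals and variable names are illustrative assumptions; Kevin's example is not reproduced in this message. The sketch uses only the failable String.Index(_:within:) conversion and the scalar view, and deliberately does not subscript the character view with the misaligned index, since that behavior is what the thread is debating.

```swift
// Illustrative values only; not taken from the thread's examples.
let base = "e"
let accent = "\u{301}"          // COMBINING ACUTE ACCENT
let joined = base + accent      // concatenation merges two elements into one

// Each piece is one Character on its own, yet the result is also one.
assert(base.count == 1)
assert(accent.count == 1)
assert(joined.count == 1)
assert(joined.unicodeScalars.count == 2)

// An index that is aligned in the scalar view but falls inside a Character.
let scalars = joined.unicodeScalars
let mid = scalars.index(after: scalars.startIndex)
assert(scalars[mid].value == 0x301)

// The failable conversion reports the misalignment by returning nil.
let aligned = String.Index(mid, within: joined)
assert(aligned == nil)

// "Return a partial character": rebuild a String from the scalars at and
// after the misaligned position, which yields the lone combining mark.
var tail = String.UnicodeScalarView()
tail.append(contentsOf: scalars[mid...])
let partial = String(tail)
assert(partial == "\u{301}")
```

The nil result is the "failable API" option from the list of alternatives; the last step approximates the "partial character" option by going through the view in which the index is known to be aligned.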