I like a "view"-based system for looking at a Unicode string. It lets you pick the view of the string, which defines how it is indexed, based on your needs. A view could be indexed by human-facing glyph, by a particular Unicode encoding, by a decomposition form, etc.
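As a rough sketch of what I mean, here is how the same string looks through different views. This assumes Swift 5 or later, where String is itself a collection of Characters (in Swift 3 the glyph-level view was spelled `.characters`). The sample strings are just illustrations I picked, not anything from the proposal.

```swift
// Assumes Swift 5 or later. The sample strings are illustrative only.
let flag = "\u{1F1FA}\u{1F1F8}"   // US flag: one glyph built from two regional-indicator scalars
let name = "Fe\u{301}lix"         // "Félix" spelled with "e" plus a combining acute accent

// Each view counts (and indexes) the same string differently.
print(flag.count)                 // 1  Characters, i.e. human-facing glyphs
print(flag.unicodeScalars.count)  // 2  Unicode scalars
print(flag.utf16.count)           // 4  UTF-16 code units
print(flag.utf8.count)            // 8  UTF-8 code units

// Indexing goes through the view's own index type, not a bare Int,
// so the linear walk over variable-length elements stays visible.
let secondGlyph = name[name.index(name.startIndex, offsetBy: 1)]
let scalars = name.unicodeScalars
let accent = scalars[scalars.index(scalars.startIndex, offsetBy: 2)]

print(secondGlyph.unicodeScalars.count)  // 2  "e" and the combining accent form one glyph
print(accent.value == 0x301)             // true

// Comparison happens at the glyph level, so the decomposed and
// precomposed spellings of the name compare equal.
print(name == "F\u{E9}lix")              // true
```

If all you do is compare, concatenate, or print strings, you never touch a view at all. You only pick one when you actually need to index or count.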
I think that is powerful, useful, and exposes the real complexity in a manageable and functional way. In many domains you would never need to care about indexing across a view or even using a view to work with a string.

On Wed, Aug 17, 2016 at 12:27 PM Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

> > On Aug 17, 2016, at 12:20 PM, Shawn Erickson <shaw...@gmail.com> wrote:
> >
> > As stated earlier it is 2016
>
> I don’t like the tone attached to this statement.
>
> > I think the baseline should be robust Unicode support
>
> I don’t understand how anything I have pushed for would compromise robust Unicode support.
>
> > and what we have in Swift is actually a fairly good way of dealing with it IMHO. I think new to development folks should have this as their baseline as well…
>
> > not that we shouldn't make it as easy to work with as possible.
>
> Regardless of internal representation, wouldn’t this be a glyph-based indexing system?
>
> -Kenny
>
> > -Shawn
> >
> > On Wed, Aug 17, 2016 at 12:15 PM Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:
> > It seems to me that UTF-8 is the best choice to encode strings in English and English-like character sets for storage, but it’s not clear that it is the most useful or performant internal representation for working with strings. In my opinion, conflating the preferred storage format and the best internal representation is not the proper thing to do. Picking the right internal storage format should be evaluated based on its own criteria. Even as an experienced programmer, I assert that the most useful indexing system is glyph based.
> >
> > In Félix’s case, I would expect to have to ask for a mail-friendly representation of his name, just like you have to ask for a filesystem-friendly representation of a filename regardless of what the internal representation is. Just because you are using UTF-8 as the internal format, it does not mean that universal support is guaranteed.
> >
> > In response to this statement: “Optimizing developer experience for beginning developers is just going to lead to software that screws…”, the current system trips up not only beginning developers, but is different from pretty much every programming language in my experience.
> >
> > -Kenny
> >
> > > On Aug 17, 2016, at 11:48 AM, Zach Waldowski via swift-evolution <swift-evolution@swift.org> wrote:
> > >
> > > It's 2016, "the thing people would most commonly expect" impossible-to-screw-up Unicode support that's performance. Optimizing developer experience for beginning developers is just going to lead to software that screws up in situations the developer doesn't anticipate, as Félix notes above.
> > >
> > > Zachary
> > >
> > > On Wed, Aug 17, 2016, at 09:40 AM, Kenny Leung via swift-evolution wrote:
> > >> I understand that the most friendly approach may not be the most efficient, but that’s not what I’m pushing for. I’m pushing for "does the thing people would most commonly expect”. Take a first-time programmer who reads any (human) language, and that is what they would expect.
> > >>
> > >> Why couldn’t String’s internal storage format be glyph-based? If I were, say, writing a text editor, it would certainly be the easiest and most efficient format to work in.
> > >>
> > >> -Kenny
> > >>
> > >>> On Aug 15, 2016, at 9:20 PM, Félix Cloutier <felix...@yahoo.ca> wrote:
> > >>>
> > >>> The major problem with this approach is that visual glyphs themselves have one level of variable-length encoding, and they sit on top of another variable-length encoding used to represent the Unicode characters (Swift-native Strings are currently encoded as UTF-8). For instance, the visual glyph 🇺🇸 is the result of putting side-by-side the Unicode characters 🇺 and 🇸 ("REGIONAL INDICATOR SYMBOL LETTER U" and "REGIONAL INDICATOR SYMBOL LETTER S"), which are themselves encoded as UTF-8 using 4 bytes each. A design in which you can "just write" string[4544] hides the fact that indexing is a linear-time operation that needs to recompose UTF-8 characters and then recompose visual glyphs on top of that.
> > >>>
> > >>> Generally speaking, I *think* that I agree that human-geared "long string" on which you probably won't need random access, and machine-geared smaller strings that encode a command, could benefit from not being considered the same fundamental thing. However, I'm also afraid that this will end with more applications and websites that think that first names only contain 7-bit-clean characters in the A-Z range. (I live in the US and I can attest that this is still very common.)
> > >>>
> > >>> You could make a point too that better facilities to parse strings would probably address this issue.
> > >>>
> > >>> Félix
> > >>>
> > >>>> On Aug 15, 2016, at 10:52:02, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:
> > >>>>
> > >>>> I agree with both points of view. I think we need to bring back subscripting on strings which does the thing people would most commonly expect.
> > >>>>
> > >>>> I would say that the subscript indexes should correspond to a visual glyph. This seems reasonable to me for most character sets like Roman, Cyrillic, Chinese. There is some doubt in my mind for things like subscripted Japanese or connected (ligatured?) languages like Arabic, Hindi or Thai.
> > >>>>
> > >>>> -Kenny
> > >>>>
> > >>>>> On Aug 15, 2016, at 10:42 AM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:
> > >>>>>
> > >>>>> On Sun, Aug 14, 2016 at 5:41 PM, Michael Savich via swift-evolution <swift-evolution@swift.org> wrote:
> > >>>>> Back in Swift 1.0, subscripting a String was easy, you could just use subscripting in a very Python like way. But now, things are a bit more complicated. I recognize why we need syntax like str.startIndex.advancedBy(x) but it has its downsides. Namely, it makes things hard on beginners. If one of Swift's goals is to make it a great first language, this syntax fights that. Imagine having to explain Unicode and character size to an 8 year old. This is doubly problematic because String manipulation is one of the first things new coders might want to do.
> > >>>>>
> > >>>>> What about having an InternalString subclass that only supports one encoding, allowing it to be subscripted with Ints? The idea is that an InternalString is for Strings that are more or less hard coded into the app. Dictionary keys, enum raw values, that kind of stuff. This also has the added benefit of forcing the programmer to think about what the String is being used for. Is it user facing? Or is it just for internal use? And of course, it makes code dealing with String manipulation much more concise and readable.
> > >>>>>
> > >>>>> It follows that something like this would need to be entered as a literal to make it as easy as using String. One way would be to make all String literals InternalStrings, but that sounds far too drastic. Maybe appending an exclamation point like "this"! Or even just wrapping the whole thing in exclamation marks like !"this"! Of course, we could go old school and write it like @"this" …That last one is a joke.
> > >>>>>
> > >>>>> I'll be the first to admit I'm way in over my head here, so I'm very open to suggestions and criticism. Thanks!
> > >>>>>
> > >>>>> I can sympathize, but this is tricky.
> > >>>>>
> > >>>>> Fundamentally, if it's going to be a learning and teaching issue, then this "easy" string should be the default. That is to say, if I write `var a = "Hello, world!"`, then `a` should be inferred to be of type InternalString or EasyString, whatever you want to call it.
> > >>>>>
> > >>>>> But, we also want Swift to support Unicode by default, and we want that support to do things The Right Way(TM) by default. In other words, a user should not have to reach for a special type in order to handle arbitrary strings correctly, and I should be able to reassign `a = "你好"` and have things work as expected. So, we also can't have the "easy" string type be the default...
> > >>>>>
> > >>>>> I can't think of a way to square that circle.
> > >>>>>
> > >>>>> Sent from my iPad
> > >>>>>
> > >>>>> _______________________________________________
> > >>>>> swift-evolution mailing list
> > >>>>> swift-evolution@swift.org
> > >>>>> https://lists.swift.org/mailman/listinfo/swift-evolution