> 
> On Aug 17, 2016, at 12:20 PM, Shawn Erickson <shaw...@gmail.com> wrote:
> 
> As stated earlier it is 2016

I don’t like the tone attached to this statement.

> I think the baseline should be robust Unicode support

I don’t understand how anything I have pushed for would compromise robust 
Unicode support.

> and what we have in Swift is actually a fairly good way of dealing with it 
> IMHO. I think new to development folks should have this as their baseline as 
> well…

> not that we shouldn't make it as easy to work with as possible.

Regardless of internal representation, wouldn’t this be a glyph-based indexing 
system?
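
A minimal sketch of glyph-based indexing on top of String's Character view,
in current Swift syntax. The helper name `glyph(_:at:)` is hypothetical and
not part of the standard library; it only illustrates that the offset counts
user-visible glyphs (extended grapheme clusters), whatever the internal
representation is, and that reaching the offset is a linear walk.

```swift
// "café" spelled as "e" followed by COMBINING ACUTE ACCENT (two scalars, one glyph).
let word = "cafe\u{301}"

// Hypothetical helper: returns the glyph at `offset`, or nil if out of range.
// The walk from startIndex is O(offset); it is not constant-time random access.
func glyph(_ s: String, at offset: Int) -> Character? {
    guard offset >= 0,
          let i = s.index(s.startIndex, offsetBy: offset, limitedBy: s.endIndex),
          i < s.endIndex
    else { return nil }
    return s[i]
}

print(word.count)                 // 4 glyphs, although there are 5 Unicode scalars
print(glyph(word, at: 3) as Any)  // the whole "é", accent included
print(glyph(word, at: 4) as Any)  // nil: past the last glyph
```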

-Kenny


> 
> -Shawn
> 
> On Wed, Aug 17, 2016 at 12:15 PM Kenny Leung via swift-evolution 
> <swift-evolution@swift.org> wrote:
> It seems to me that UTF-8 is the best choice to encode strings in English and 
> English-like character sets for storage, but it’s not clear that it is the 
> most useful or performant internal representation for working with strings. 
> In my opinion, conflating the preferred storage format and the best internal 
> representation is not the proper thing to do. Picking the right internal 
> storage format should be evaluated based on its own criteria. Even as an 
> experienced programmer, I assert that the most useful indexing system is 
> glyph based.
> 
> In Félix’s case, I would expect to have to ask for a mail-friendly 
> representation of his name, just like you have to ask for a 
> filesystem-friendly representation of a filename regardless of what the 
> internal representation is. Just because you are using UTF-8 as the internal 
> format, it does not mean that universal support is guaranteed.
> 
> In response to this statement: “Optimizing developer experience for beginning 
> developers is just going to lead to software that screws…”, the current 
> system trips up not only beginning developers, but is different from pretty 
> much every programming language in my experience.
> 
> -Kenny
> 
> 
> > On Aug 17, 2016, at 11:48 AM, Zach Waldowski via swift-evolution 
> > <swift-evolution@swift.org> wrote:
> >
> > It's 2016, "the thing people would most commonly expect" is
> > impossible-to-screw-up Unicode support that's performant. Optimizing
> > developer experience for beginning developers is just going to lead to
> > software that screws up in situations the developer doesn't anticipate,
> > as Félix notes above.
> >
> > Zachary
> >
> > On Wed, Aug 17, 2016, at 09:40 AM, Kenny Leung via swift-evolution
> > wrote:
> >> I understand that the most friendly approach may not be the most
> >> efficient, but that’s not what I’m pushing for. I’m pushing for “does the
> >> thing people would most commonly expect”. Take a first-time programmer
> >> who reads any (human) language, and that is what they would expect.
> >>
> >> Why couldn’t String’s internal storage format be glyph-based? If I were,
> >> say, writing a text editor, it would certainly be the easiest and most
> >> efficient format to work in.
> >>
> >> -Kenny
> >>
> >>
> >>> On Aug 15, 2016, at 9:20 PM, Félix Cloutier <felix...@yahoo.ca> wrote:
> >>>
> >>> The major problem with this approach is that visual glyphs themselves 
> >>> have one level of variable-length encoding, and they sit on top of 
> >>> another variable-length encoding used to represent the Unicode characters 
> >>> (Swift-native Strings are currently encoded as UTF-8). For instance, the 
> >>> visual glyph 🇺🇸 is the result of putting side-by-side the Unicode 
> >>> characters 🇺 and 🇸 ("REGIONAL INDICATOR SYMBOL LETTER U" and "REGIONAL 
> >>> INDICATOR SYMBOL LETTER S"), which are themselves encoded as UTF-8 using 
> >>> 4 bytes each. A design in which you can "just write" string[4544] hides 
> >>> the fact that indexing is a linear-time operation that needs to recompose 
> >>> UTF-8 characters and then recompose visual glyphs on top of that.
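
The two layers described above can be seen by counting the same string
through its different views. This is a sketch in current Swift syntax (in
2016 the first line would have been spelled `flag.characters.count`); the
variable names are only illustrative.

```swift
// REGIONAL INDICATOR SYMBOL LETTER U + REGIONAL INDICATOR SYMBOL LETTER S
let flag = "\u{1F1FA}\u{1F1F8}"

// Each view counts a different unit of the same text.
let glyphCount = flag.count                  // extended grapheme clusters
let scalarCount = flag.unicodeScalars.count  // Unicode scalars
let utf16Count = flag.utf16.count            // UTF-16 code units
let utf8Count = flag.utf8.count              // UTF-8 bytes, 4 per scalar

print(glyphCount, scalarCount, utf16Count, utf8Count)  // 1 2 4 8
```

An integer offset means something different in each view, so a plain
`string[4544]` would have to pick one of them and walk the string to get
there.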
> >>>
> >>> Generally speaking, I *think* that I agree that human-geared "long 
> >>> strings", on which you probably won't need random access, and 
> >>> machine-geared smaller strings that encode a command, could benefit from 
> >>> not being considered the same fundamental thing. However, I'm also afraid 
> >>> that this will end with more applications and websites that think that 
> >>> first names only contain 7-bit-clean characters in the A-Z range. (I live 
> >>> in the US and I can attest that this is still very common.)
> >>>
> >>> You could make a point too that better facilities to parse strings would 
> >>> probably address this issue.
> >>>
> >>> Félix
> >>>
> >>>> Le 15 août 2016 à 10:52:02, Kenny Leung via swift-evolution 
> >>>> <swift-evolution@swift.org> a écrit :
> >>>>
> >>>> I agree with both points of view. I think we need to bring back 
> >>>> subscripting on strings which does the thing people would most commonly 
> >>>> expect.
> >>>>
> >>>> I would say that the subscripts indexes should correspond to a visual 
> >>>> glyph. This seems reasonable to me for most character sets like Roman, 
> >>>> Cyrillic, Chinese. There is some doubt in my mind for things like 
> >>>> subscripted Japanese or connected (ligatured?) languages like Arabic, 
> >>>> Hindi or Thai.
> >>>>
> >>>> -Kenny
> >>>>
> >>>>
> >>>>> On Aug 15, 2016, at 10:42 AM, Xiaodi Wu via swift-evolution 
> >>>>> <swift-evolution@swift.org> wrote:
> >>>>>
> >>>>> On Sun, Aug 14, 2016 at 5:41 PM, Michael Savich via swift-evolution 
> >>>>> <swift-evolution@swift.org> wrote:
> >>>>> Back in Swift 1.0, subscripting a String was easy: you could just use 
> >>>>> subscripting in a very Python-like way. But now, things are a bit more 
> >>>>> complicated. I recognize why we need syntax like 
> >>>>> str.startIndex.advancedBy(x) but it has its downsides. Namely, it makes 
> >>>>> things hard on beginners. If one of Swift's goals is to make it a great 
> >>>>> first language, this syntax fights that. Imagine having to explain 
> >>>>> Unicode and character size to an 8 year old. This is doubly problematic 
> >>>>> because String manipulation is one of the first things new coders might 
> >>>>> want to do.
> >>>>>
> >>>>> What about having an InternalString subclass that only supports one 
> >>>>> encoding, allowing it to be subscripted with Ints? The idea is that an 
> >>>>> InternalString is for Strings that are more or less hard coded into the 
> >>>>> app. Dictionary keys, enum raw values, that kind of stuff. This also 
> >>>>> has the added benefit of forcing the programmer to think about what the 
> >>>>> String is being used for. Is it user facing? Or is it just for internal 
> >>>>> use? And of course, it makes code dealing with String manipulation much 
> >>>>> more concise and readable.
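
One way to picture the idea, as a rough sketch only: the thread names
"InternalString" but specifies nothing else, so everything below is an
assumption. It is written as a wrapper struct rather than a subclass (String
is a struct and cannot be subclassed), and it assumes the single supported
encoding is ASCII, which is what makes a constant-time Int subscript possible.

```swift
// Hypothetical type; only the name comes from the proposal above.
struct InternalString: ExpressibleByStringLiteral, CustomStringConvertible {
    private let bytes: [UInt8]  // ASCII only, so one byte per element

    // Returns nil if the text contains anything outside ASCII.
    init?(_ text: String) {
        guard text.unicodeScalars.allSatisfy({ $0.isASCII }) else { return nil }
        bytes = Array(text.utf8)
    }

    // Literals are trusted to be ASCII; a non-ASCII literal traps at run time.
    init(stringLiteral value: String) {
        precondition(value.unicodeScalars.allSatisfy { $0.isASCII }, "ASCII only")
        bytes = Array(value.utf8)
    }

    var count: Int { bytes.count }

    // Constant-time lookup by byte offset.
    subscript(i: Int) -> Character {
        Character(UnicodeScalar(bytes[i]))
    }

    var description: String { String(decoding: bytes, as: UTF8.self) }
}

let key: InternalString = "userID"
print(key[4])                          // I
print(InternalString("你好") == nil)    // true: rejected as non-ASCII
```

Note that this indexes by byte, so a CR LF pair counts as two elements here,
whereas String treats it as a single Character.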
> >>>>>
> >>>>> It follows that something like this would need to be entered as a 
> >>>>> literal to make it as easy as using String. One way would be to make 
> >>>>> all String literals InternalStrings, but that sounds far too drastic. 
> >>>>> Maybe appending an exclamation point like "this"! Or even just wrapping 
> >>>>> the whole thing in exclamation marks like !"this"! Of course, we could 
> >>>>> go old school and write it like @"this" …That last one is a joke.
> >>>>>
> >>>>> I'll be the first to admit I'm way in over my head here, so I'm very 
> >>>>> open to suggestions and criticism. Thanks!
> >>>>>
> >>>>> I can sympathize, but this is tricky.
> >>>>>
> >>>>> Fundamentally, if it's going to be a learning and teaching issue, then 
> >>>>> this "easy" string should be the default. That is to say, if I write 
> >>>>> `var a = "Hello, world!"`, then `a` should be inferred to be of type 
> >>>>> InternalString or EasyString, whatever you want to call it.
> >>>>>
> >>>>> But, we also want Swift to support Unicode by default, and we want that 
> >>>>> support to do things The Right Way(TM) by default. In other words, a 
> >>>>> user should not have to reach for a special type in order to handle 
> >>>>> arbitrary strings correctly, and I should be able to reassign `a = 
> >>>>> "你好"` and have things work as expected. So, we also can't have the 
> >>>>> "easy" string type be the default...
> >>>>>
> >>>>> I can't think of a way to square that circle.
> >>>>>
> >>>>>
> >>>>> Sent from my iPad
> >>>>>
> >>>>> _______________________________________________
> >>>>> swift-evolution mailing list
> >>>>> swift-evolution@swift.org
> >>>>> https://lists.swift.org/mailman/listinfo/swift-evolution
> >>>>>
> >>>>
> >>>
> >>
> 