The major problem with this approach is that visual glyphs themselves have one level of variable-length encoding, and they sit on top of another variable-length encoding used to represent the Unicode characters (Swift-native Strings are currently encoded as UTF-8). For instance, the visual glyph 🇺🇸 is the result of putting the Unicode characters 🇺 and 🇸 ("REGIONAL INDICATOR SYMBOL LETTER U" and "REGIONAL INDICATOR SYMBOL LETTER S") side by side, and each of those is itself encoded as four bytes of UTF-8. A design in which you can "just write" string[4544] hides the fact that indexing is a linear-time operation: it has to decode UTF-8 byte sequences into Unicode characters and then compose visual glyphs on top of them.
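To make the layering concrete, here's a quick sketch (written against current Swift API names; treat it as illustrative, not gospel):

```swift
let flag = "🇺🇸"

// One visual glyph (what Swift calls an extended grapheme cluster)...
print(flag.count)                 // 1

// ...composed of two Unicode scalars...
print(flag.unicodeScalars.count)  // 2 (U+1F1FA, U+1F1F8)

// ...stored as 8 UTF-8 bytes (4 per scalar).
print(flag.utf8.count)            // 8

// There is no O(1) string[4544]; reaching position 6 means walking the
// string from the start, decoding bytes and grouping glyphs along the way.
let text = "flag: 🇺🇸!"
let i = text.index(text.startIndex, offsetBy: 6)  // linear-time traversal
print(text[i])                    // 🇺🇸
```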
Generally speaking, I *think* I agree that human-geared "long strings", on which you probably won't need random access, and machine-geared smaller strings that encode a command could benefit from not being considered the same fundamental thing. However, I'm also afraid that this will end with more applications and websites assuming that first names only contain 7-bit-clean characters in the A-Z range. (I live in the US, and I can attest that this is still very common.) You could also make the point that better facilities for parsing strings would address this issue. (A rough sketch of what such a restricted string type might look like follows the quoted messages below.)

Félix

> On Aug 15, 2016, at 10:52:02, Kenny Leung via swift-evolution
> <swift-evolution@swift.org> wrote:
>
> I agree with both points of view. I think we need to bring back subscripting
> on strings which does the thing people would most commonly expect.
>
> I would say that the subscript indexes should correspond to a visual glyph.
> This seems reasonable to me for most scripts, like Roman, Cyrillic, or
> Chinese. There is some doubt in my mind for things like subscripted Japanese
> or connected (ligatured?) languages like Arabic, Hindi, or Thai.
>
> -Kenny
>
>> On Aug 15, 2016, at 10:42 AM, Xiaodi Wu via swift-evolution
>> <swift-evolution@swift.org> wrote:
>>
>> On Sun, Aug 14, 2016 at 5:41 PM, Michael Savich via swift-evolution
>> <swift-evolution@swift.org> wrote:
>>
>> Back in Swift 1.0, subscripting a String was easy: you could just use
>> subscripting in a very Python-like way. But now, things are a bit more
>> complicated. I recognize why we need syntax like
>> str.startIndex.advancedBy(x), but it has its downsides. Namely, it makes
>> things hard on beginners. If one of Swift's goals is to make it a great
>> first language, this syntax fights that. Imagine having to explain Unicode
>> and character size to an 8-year-old. This is doubly problematic because
>> String manipulation is one of the first things new coders might want to do.
>>
>> What about having an InternalString subclass that only supports one
>> encoding, allowing it to be subscripted with Ints? The idea is that an
>> InternalString is for Strings that are more or less hard-coded into the
>> app: dictionary keys, enum raw values, that kind of stuff. This also has
>> the added benefit of forcing the programmer to think about what the String
>> is being used for. Is it user-facing? Or is it just for internal use? And
>> of course, it makes code dealing with String manipulation much more
>> concise and readable.
>>
>> It follows that something like this would need to be entered as a literal
>> to make it as easy as using String. One way would be to make all String
>> literals InternalStrings, but that sounds far too drastic. Maybe appending
>> an exclamation point, like "this"! Or even just wrapping the whole thing
>> in exclamation marks, like !"this"! Of course, we could go old school and
>> write it like @"this" … That last one is a joke.
>>
>> I'll be the first to admit I'm way in over my head here, so I'm very open
>> to suggestions and criticism. Thanks!
>>
>> I can sympathize, but this is tricky.
>>
>> Fundamentally, if it's going to be a learning and teaching issue, then this
>> "easy" string should be the default. That is to say, if I write `var a =
>> "Hello, world!"`, then `a` should be inferred to be of type InternalString
>> or EasyString, whatever you want to call it.
>>
>> But, we also want Swift to support Unicode by default, and we want that
>> support to do things The Right Way(TM) by default.
>> In other words, a user should not have to reach for a special type in
>> order to handle arbitrary strings correctly, and I should be able to
>> reassign `a = "你好"` and have things work as expected. So, we also can't
>> have the "easy" string type be the default...
>>
>> I can't think of a way to square that circle.
>>
>> Sent from my iPad
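For what it's worth, here is roughly what the restricted type mentioned above could look like. The name `ASCIIString` and every detail here are mine, purely illustrative, not a concrete proposal:

```swift
// Hypothetical sketch of an ASCII-only, Int-subscriptable string.
struct ASCIIString {
    private let bytes: [UInt8]

    // Fails if the input contains anything outside the 7-bit ASCII range.
    init?(_ string: String) {
        guard string.unicodeScalars.allSatisfy({ $0.isASCII }) else { return nil }
        self.bytes = Array(string.utf8)
    }

    // Fixed-width elements are what make an Int subscript honest O(1).
    subscript(i: Int) -> Character {
        return Character(UnicodeScalar(bytes[i]))
    }

    var count: Int { return bytes.count }
}

// Usage: machine-geared command strings, not user-facing text.
if let command = ASCIIString("GET /index.html") {
    print(command[0], command.count)  // G 15
}
```

Note that this sidesteps rather than solves the default-type problem Xiaodi raises: the failable initializer is exactly the "special type" a user would still have to reach for.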