Re: [swift-evolution] [swift-evolution-announce] [Revised and review extended] SE-0180 - String Index Overhaul

Drew Crawford via swift-evolution Tue, 27 Jun 2017 10:59:09 -0700


On June 26, 2017 at 5:43:42 PM, Karl Wagner via swift-evolution 
(swift-evolution@swift.org) wrote:

I would support a definition of encodedOffset that removed mention of UTF-16 
and phrased things in terms of String.Encoding and code-units. For example, I 
would like to be able to construct new String indices from a known index plus a 
quantity of code-units known to represent a sequence of characters:

var stringOne = "Hello,"
let stringTwo = " world"

var idx = stringOne.endIndex
stringOne.append(contentsOf: stringTwo)
// `codeUnits` is hypothetical: the string's code units in its own encoding
idx = String.Index(encodedOffset: idx.encodedOffset + stringTwo.codeUnits.count)
assert(idx == stringOne.endIndex)


I second this concern.  We currently use a non-Foundation library that prefers 
UTF8 encoding, and I think UTF8-backed strings are important.
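For comparison, the round trip in the quoted example can be sketched against the UTF-8 view alone, assuming a Swift 5 toolchain where every view shares `String.Index`. This picks a concrete view in place of the hypothetical `codeUnits` property, and it records the original length as an integer rather than holding an index across the mutation.

```swift
// Build the combined string in two steps, as in the quoted example.
var stringOne = "Hello,"
let stringTwo = " world"

// Record the first part's length as a plain UTF-8 code-unit count; an
// integer stays meaningful after the append below.
let prefixLength = stringOne.utf8.count
stringOne.append(contentsOf: stringTwo)

// Step from the start of the UTF-8 view by the old length plus the number
// of UTF-8 code units that were appended.
let view = stringOne.utf8
let idx = view.index(view.startIndex, offsetBy: prefixLength + stringTwo.utf8.count)

// The computed position is the end of the combined string.
precondition(idx == stringOne.endIndex)
```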

The choice of UTF16 as string storage in Swift makes historical sense (e.g. 
runtime interop with ObjC-backed strings), but as Swift moves forward it makes 
less sense.  We need a string system that behaves more like a lightweight 
accessor for the underlying storage (e.g. if you like your input's encoding, you 
can keep it) unless you do something (like peruse a view) that requires 
promotion to a new format.  That's a different proposal, but that's the 
direction I'd like to see us head in.
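To make the "keep your input's encoding until a view demands promotion" idea concrete, here is a toy sketch. `LazyUTF8String` and its members are hypothetical names invented for illustration, not a real or proposed standard-library type, and a real design would also have to deal with mutation, sharing and bridging.

```swift
// Hypothetical toy wrapper: keeps the UTF-8 it was given and only builds a
// UTF-16 copy the first time UTF-16 code units are requested.
struct LazyUTF8String {
    // The original input, stored as-is.
    private(set) var utf8Storage: [UInt8]
    // Filled in on first UTF-16 access ("promotion"), then reused.
    private var utf16Cache: [UInt16]?

    init(utf8 bytes: [UInt8]) {
        self.utf8Storage = bytes
        self.utf16Cache = nil
    }

    // True once the UTF-16 copy has been built.
    var isPromoted: Bool { return utf16Cache != nil }

    // Reading the UTF-8 length never triggers promotion.
    var utf8Count: Int { return utf8Storage.count }

    // Reading UTF-16 transcodes once and caches the result.
    mutating func utf16Units() -> [UInt16] {
        if let cached = utf16Cache { return cached }
        // String(decoding:as:) replaces invalid UTF-8 with U+FFFD.
        let units = Array(String(decoding: utf8Storage, as: UTF8.self).utf16)
        utf16Cache = units
        return units
    }
}

// "héllo": é takes 2 UTF-8 code units but only 1 UTF-16 code unit.
var lazyText = LazyUTF8String(utf8: Array("h\u{E9}llo".utf8))
```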

This proposal is in many ways the opposite of that: it specifies that we 
standardize on UTF16, and in particular it has file archiving in view, where 
long-term unarchival guarantees would complicate backing this out later.  This 
feels like a kludge to support Foundation.  In the archive context, the offset 
should either be "whatever the string is" (which you would have to know anyway 
to archive/unarchive that string) or a full-fledged offset type that specifies 
the encoding, such as

let i = String.Index(
    encoding: .utf16,  // proposed initializer, hypothetical
    offset: 36
)

the latter of which could also be used to port an Index between string 
representations, if that's a useful feature.
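The encoding-carrying offset suggested above could look roughly like the following sketch. `TaggedOffset`, `index(in:)` and `make(_:in:encoding:)` are hypothetical names, not standard-library API; only the `utf8`/`utf16` view calls are real. It shows resolving a tagged offset against a string and porting a position from one encoding to another.

```swift
// Hypothetical offset type that records which encoding its count is in.
struct TaggedOffset: Equatable {
    enum Encoding: Equatable { case utf8, utf16 }

    let encoding: Encoding
    let offset: Int

    // Resolve against a particular string. This is a dynamic check: it
    // returns nil for negative or out-of-range offsets.
    func index(in string: String) -> String.Index? {
        guard offset >= 0 else { return nil }
        switch encoding {
        case .utf8:
            let view = string.utf8
            return view.index(view.startIndex, offsetBy: offset, limitedBy: view.endIndex)
        case .utf16:
            let view = string.utf16
            return view.index(view.startIndex, offsetBy: offset, limitedBy: view.endIndex)
        }
    }

    // "Port" a position of `string` into an offset in the requested encoding.
    static func make(_ index: String.Index, in string: String,
                     encoding: Encoding) -> TaggedOffset {
        let distance: Int
        switch encoding {
        case .utf8:
            distance = string.utf8.distance(from: string.utf8.startIndex, to: index)
        case .utf16:
            distance = string.utf16.distance(from: string.utf16.startIndex, to: index)
        }
        return TaggedOffset(encoding: encoding, offset: distance)
    }
}

// "héllo": é is 2 UTF-8 code units but 1 UTF-16 code unit.
let text = "h\u{E9}llo"
let fromUTF16 = TaggedOffset(encoding: .utf16, offset: 2)
let position = fromUTF16.index(in: text)!   // dynamic lookup, hence the optional
let asUTF8 = TaggedOffset.make(position, in: text, encoding: .utf8)
```

Because resolution is dynamic, every lookup through such a type yields an optional, which is the trade-off weighed below against statically typed view indices.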

More broadly, though, I disagree with the motivation of the proposal, 
specifically:

The result is a great deal of API surface area for apparently little gain in 
ordinary code

In ordinary code, we work with a single string representation (e.g. in Cocoa 
it's UTF16), and there is a correspondence between our UTF16 offset and our 
UTF16 string such that index lookups will succeed.  When we collapse indexes, 
we lose the information needed to make this correspondence, which was 
previously encoded into the type system.  So the "gain in ordinary code" of the 
current design is that programmers do not have to sprinkle `!` over the common 
case of string index lookups, because the type correspondence lets us infer at 
compile time that the check is unnecessary.

Under this proposal, they will have to sprinkle `!` around, which adds friction 
and a performance cost (`!` is a runtime check, and UTF16 promotions are 
expensive).  I don't believe the simplicity of implementing archival (which one 
only has to write once) is worth the hassle of complicating all string index 
lookups.
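In current Swift, where every view shares `String.Index`, the runtime check described here surfaces in failable conversions such as `String.Index(_:within:)`. This sketch (variable names are illustrative) takes two indices from the Unicode scalar view: one converts to a `Character` position and one does not, which is where the `!` tends to end up in practice.

```swift
// "e" followed by a combining acute accent, then "x": two Characters but
// three Unicode scalars.
let word = "e\u{301}x"
let scalars = word.unicodeScalars

// Every view hands out the same String.Index type...
let accent = scalars.index(after: scalars.startIndex)  // the combining mark
let xIndex = scalars.index(after: accent)              // the "x"

// ...so whether an index is a valid Character position is only known at
// run time, and the conversion is failable.
let accentAsCharacter = String.Index(accent, within: word)  // nil: mid-grapheme
let xAsCharacter = String.Index(xIndex, within: word)!      // succeeds
```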

Does this proposal fit well with the feel and direction of Swift?

To me, one of Swift's greatest strengths is the type system.  We can encode 
information into the type system and find our bugs at compile time instead of 
at run time.

Here, we are proposing to partially erase a type because it's annoying to write 
code that deals with string encodings.  But our code will deal with string 
encodings somehow, whether `utf16` appears in our source code or not.

When we erase the type of our offset, we lose a powerful tool for proving the 
correctness of our string encodings: the compiler's ability to check that our 
utf16 offset is used with a utf16 string.  Without that tool, we either have to 
check that dynamically or, in the worst case, our program has bugs.
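One way to get the compile-time check described here is with phantom type parameters, sketched below. `CodeUnitOffset`, `UTF8Tag`, `UTF16Tag` and `index(at:)` are hypothetical names for illustration only. The tag guarantees that an offset is used with the view of the matching encoding; bounds are still checked at run time.

```swift
// Hypothetical empty enums used only as compile-time tags ("phantom types").
enum UTF8Tag {}
enum UTF16Tag {}

// Hypothetical code-unit offset whose encoding is part of its static type.
struct CodeUnitOffset<Tag> {
    let value: Int
}

extension String.UTF8View {
    // Only accepts UTF-8 offsets; a CodeUnitOffset<UTF16Tag> is a type error.
    func index(at offset: CodeUnitOffset<UTF8Tag>) -> String.Index? {
        guard offset.value >= 0 else { return nil }
        return index(startIndex, offsetBy: offset.value, limitedBy: endIndex)
    }
}

extension String.UTF16View {
    // Only accepts UTF-16 offsets.
    func index(at offset: CodeUnitOffset<UTF16Tag>) -> String.Index? {
        guard offset.value >= 0 else { return nil }
        return index(startIndex, offsetBy: offset.value, limitedBy: endIndex)
    }
}

let sample = "h\u{E9}llo"
let utf16Offset = CodeUnitOffset<UTF16Tag>(value: 2)
let viaUTF16 = sample.utf16.index(at: utf16Offset)
// sample.utf8.index(at: utf16Offset)   // rejected at compile time
```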

Under this proposal we would encourage the use of bare-integer offsets for 
string lookup.  That does not seem Swifty to me.  A Swifty solution would be to 
add a dynamically checked, type-erased String.Index alongside the existing 
statically checked, fully typed String.UTF8/16View.Index, so that the 
programmer can choose the abstraction with the performance/simplicity tradeoff 
appropriate for their problem.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution
