Very nice improvements overall!

> To ease the pain of type mismatches, Substring should be a subtype of String 
> in the same way that Int is a subtype of Optional<Int>. This would give users 
> an implicit conversion from Substring to String, as well as the usual 
> implicit conversions such as [Substring] to [String] that other subtype 
> relationships receive.

As others have said, it would be nice for this to be more general. Perhaps we 
can have a special type or protocol, something like RecursiveSlice?

> A Substring passed where String is expected will be implicitly copied. When 
> compared to the “same type, copied storage” model, we have effectively 
> deferred the cost of copying from the point where a substring is created 
> until it must be converted to String for use with an API.

Could noescape parameters/new memory model with borrowing make this more 
general? Again it seems very useful for all kinds of Collections.
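
To make the deferred copy concrete, here is a rough sketch of where it would land. The `render` function and the sample string are made up for illustration, and the conversion is written out explicitly as `String(value)`; the quoted proposal would make that step implicit.

```swift
// Hypothetical API that, like most existing APIs, expects a String.
func render(_ text: String) -> String {
    return "<" + text + ">"
}

let line = "key=value"
let separator = line.firstIndex(of: "=")!

// Slicing yields a Substring that shares storage with `line`;
// no character data is copied at this point.
let value = line[line.index(after: separator)...]

// The copy happens here, at the API boundary, when the Substring
// is converted to a String.
let rendered = render(String(value))
print(rendered)
```

The same pattern (slice cheaply, copy only at the boundary) is what I would like to see for other Collections and their slices.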

> The “Empty Subscript”

Empty subscript seems weird. IMO, it’s because of the asymmetry between 
subscripts and computed properties. I would favour a model which unifies 
computed properties and subscripts (e.g. computed properties could return 
“addressors” for in-place mutation).
Maybe this could be an “entireCollection”/“entireSlice” computed property?
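
As a minimal sketch of the computed-property spelling: `entireSlice` is just the name floated above, not an existing standard library member, and this version is read-only, so it does not cover the in-place mutation case.

```swift
extension Collection {
    /// Hypothetical property: a slice covering the whole collection.
    var entireSlice: SubSequence {
        return self[startIndex..<endIndex]
    }
}

let numbers = [1, 2, 3, 4]
let allNumbers = numbers.entireSlice   // ArraySlice<Int>

let greeting = "hello"
let wholeGreeting = greeting.entireSlice   // Substring

print(Array(allNumbers), String(wholeGreeting))
```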


> The goal is that Unicode exposes the underlying encoding and code units in 
> such a way that for types with a known representation (e.g. a 
> high-performance UTF8String) that information can be known at compile-time 
> and can be used to generate a single path, while still allowing types like 
> String that admit multiple representations to use runtime queries and 
> branches to fast path specializations.

Typo: “unicodeScalars” is in the protocol twice.

If I understand it, CodeUnits is the thing which should always be defined by 
conformers to Unicode, and UnicodeScalars and ExtendedASCII could have default 
implementations (for example, UTF8String/UTF16String/3rd party conformers will 
use those), and String might decide to return its native buffer (e.g. if 
Encoding.CodeUnit == UnicodeScalar).
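
Roughly what I have in mind, as a heavily simplified sketch: `UnicodeText` and `UTF8Buffer` are hypothetical names, and the encoding is pinned to UTF-8 instead of being an associated type as in the manifesto. The only requirement is `codeUnits`; `unicodeScalars` comes from a default implementation that a conformer could override with something cheaper.

```swift
// Hypothetical stand-in for the manifesto's Unicode protocol,
// restricted to UTF-8 to keep the sketch short.
protocol UnicodeText {
    associatedtype CodeUnits: Collection where CodeUnits.Element == UInt8
    var codeUnits: CodeUnits { get }
}

extension UnicodeText {
    // Default implementation: decode the UTF-8 code units.
    // This allocates; a conformer with scalar storage could return it directly.
    var unicodeScalars: [Unicode.Scalar] {
        return Array(String(decoding: codeUnits, as: UTF8.self).unicodeScalars)
    }
}

// A third-party conformer only has to expose its backing buffer.
struct UTF8Buffer: UnicodeText {
    let codeUnits: [UInt8]
}

let buffer = UTF8Buffer(codeUnits: Array("h\u{E9}llo".utf8))
print(buffer.codeUnits.count, buffer.unicodeScalars.count)
```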

I’m just wondering how difficult it would be for a 3rd-party type to conform to 
Unicode. If you’re developing a text editor, for example, it’s possible that 
you may need to implement your own String-like type with some optimised storage 
model and it would be nice to be able to use generic algorithms with them. I’m 
thinking that you will have some kind of backing buffer, and you will want to 
expose regions of that to clients as Strings so that they can render them for 
UI or search through them, etc, without introducing a copy just for the 
semantic understanding that this data region contains some text content.

I’ll need to examine the generic String idea more, but it’s certainly very 
interesting...

> Indexes


One thing which I think is critical is the ability to advance an index by a 
given number of codeUnits. I was writing some code which interfaced with the 
Cocoa NSTextStorage class, tagging parts of a string that a user was editing. 
If this was an Array, when the user inserts some elements before your stored 
indexes, those indexes become invalid but you can easily advance by the 
difference to efficiently have your indexes pointing to the same characters.

Currently, that’s impossible with String. If the user inserts a string at a 
given index, your old indexes may not even point to the start of a grapheme 
cluster any more, and advancing the index is needlessly costly. For example:

var characters = "This is a test".characters
assert(characters.count == 14)

// Store an index to something.
let endBeforePrepending = characters.endIndex

// Insert some characters somewhere.
let insertedCharacters = "[PREPENDED]".characters
assert(insertedCharacters.count == 11)
characters.replaceSubrange(characters.startIndex..<characters.startIndex, with: insertedCharacters)

// This isn’t really correct.
let endAfterPrepending = characters.index(endBeforePrepending, offsetBy: insertedCharacters.count)
assert(endAfterPrepending == characters.endIndex) // Fails anyway. 24 != 25


The manifesto is correct to emphasise machine processing of Strings, but it 
should also ensure that machine processing of mutable Strings is efficient. 
That way we can tag backing-Strings inside user-interface components and 
maintain those indices in a Unicode-safe way.

The way to solve this would be that, when replacing or removing a portion of a 
String, you learn how many CodeUnits in the receiver’s encoding were 
inserted/removed so you can shift your indexes accordingly.
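
A minimal sketch of that idea, with several assumptions: `replaceSubrangeReportingDelta` is a hypothetical helper, not an existing API; it uses UTF-16 as the code unit view; it needs a Swift version that provides `String.Index(utf16Offset:in:)`; and it measures the count before and after the edit, which a real implementation would presumably report directly and more cheaply.

```swift
extension String {
    /// Hypothetical helper: replaces `range` with `replacement` and returns
    /// the change in the number of UTF-16 code units.
    @discardableResult
    mutating func replaceSubrangeReportingDelta(
        _ range: Range<String.Index>,
        with replacement: String
    ) -> Int {
        let before = utf16.count
        replaceSubrange(range, with: replacement)
        return utf16.count - before
    }
}

var text = "This is a test"

// Store the position as a code unit offset instead of a String.Index.
var storedOffset = text.utf16.count

// Insert at the start, i.e. before the stored position.
let delta = text.replaceSubrangeReportingDelta(
    text.startIndex..<text.startIndex,
    with: "[PREPENDED]"
)

// Shift the stored position by the number of inserted code units.
storedOffset += delta

// Rebuild an index from the shifted offset; it points at the end again.
let shifted = String.Index(utf16Offset: storedOffset, in: text)
assert(text[shifted...].isEmpty)
```

The shift is only valid for positions at or after the edited range; positions before it stay where they are.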


