> On May 27, 2017, at 10:40 AM, Dave Abrahams via swift-evolution
> <swift-evolution@swift.org> wrote:
>
> Pretty version:
> https://github.com/dabrahams/swift-evolution/blob/string-index-overhaul/proposals/NNNN-string-index-overhaul.md
>
> ----
>
> # String Index Overhaul
>
> * Proposal: [SE-NNNN](NNNN-string-index-overhaul.md)
> * Authors: [Dave Abrahams](https://github.com/dabrahams)
> * Review Manager: TBD
> * Status: **Awaiting review**
> * Pull Request Implementing This Proposal:
>   https://github.com/apple/swift/pull/9806
>
> *During the review process, add the following fields as needed:*
>
> ## Introduction
>
> Today `String` shares an `Index` type with its `CharacterView` but not
> with its `UTF8View`, `UTF16View`, or `UnicodeScalarView`. This
> proposal redefines `String.UTF8View.Index`, `String.UTF16View.Index`,
> and `String.UnicodeScalarView.Index` as typealiases for `String.Index`,
> and exposes a public `encodedOffset` property and initializer that can
> be used to serialize and deserialize positions in a `String` or
> `Substring`.
>
> Swift-evolution thread: [Discussion thread topic for that
> proposal](https://lists.swift.org/pipermail/swift-evolution/)
>
> ## Motivation
>
> The different index types are supported by a set of `Index`
> initializers, which are failable whenever the source index might not
> correspond to a position in the target view:
>
> ```swift
> if let j = String.UnicodeScalarView.Index(
>   someUTF16Position, within: s.unicodeScalars) {
>   ...
> }
> ```
>
> The current API is as follows:
>
> ```swift
> public extension String.Index {
>   init?(_: String.UnicodeScalarIndex, within: String)
>   init?(_: String.UTF16Index, within: String)
>   init?(_: String.UTF8Index, within: String)
> }
>
> public extension String.UTF16View.Index {
>   init?(_: String.UTF8Index, within: String.UTF16View)
>   init(_: String.UnicodeScalarIndex, within: String.UTF16View)
>   init(_: String.Index, within: String.UTF16View)
> }
>
> public extension String.UTF8View.Index {
>   init?(_: String.UTF16Index, within: String.UTF8View)
>   init(_: String.UnicodeScalarIndex, within: String.UTF8View)
>   init(_: String.Index, within: String.UTF8View)
> }
>
> public extension String.UnicodeScalarView.Index {
>   init?(_: String.UTF16Index, within: String.UnicodeScalarView)
>   init?(_: String.UTF8Index, within: String.UnicodeScalarView)
>   init(_: String.Index, within: String.UnicodeScalarView)
> }
> ```
>
> These initializers are supplemented by a corresponding set of
> convenience conversion methods:
>
> ```swift
> if let j = someUTF16Position.samePosition(in: s.unicodeScalars) {
>   ...
> }
> ```
>
> with the following API:
>
> ```swift
> public extension String.Index {
>   func samePosition(in: String.UTF8View) -> String.UTF8View.Index
>   func samePosition(in: String.UTF16View) -> String.UTF16View.Index
>   func samePosition(
>     in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index
> }
>
> public extension String.UTF16View.Index {
>   func samePosition(in: String) -> String.Index?
>   func samePosition(in: String.UTF8View) -> String.UTF8View.Index?
>   func samePosition(
>     in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
> }
>
> public extension String.UTF8View.Index {
>   func samePosition(in: String) -> String.Index?
>   func samePosition(in: String.UTF16View) -> String.UTF16View.Index?
>   func samePosition(
>     in: String.UnicodeScalarView) -> String.UnicodeScalarView.Index?
> }
>
> public extension String.UnicodeScalarView.Index {
>   func samePosition(in: String) -> String.Index?
>   func samePosition(in: String.UTF8View) -> String.UTF8View.Index
>   func samePosition(in: String.UTF16View) -> String.UTF16View.Index
> }
> ```
>
> The result is a great deal of API surface area for apparently little
> gain in ordinary code, which normally only interchanges indices among
> views when the positions match up exactly (i.e. when the conversion is
> going to succeed). Also, the resulting code is needlessly awkward.
>
> Finally, the opacity of these index types makes it difficult to record
> `String` or `Substring` positions in files or other archival forms,
> and reconstruct the original positions with respect to a deserialized
> `String` or `Substring`.
>
> ## Proposed solution
>
> All `String` views will use a single index type (`String.Index`), so
> that positions can be interchanged without awkward explicit
> conversions:
>
> ```swift
> let html: String = "See <a href=\"http://swift.org\">swift.org</a>"
>
> // Search the UTF16, instead of characters, for performance reasons:
> let open = "<".utf16.first!, close = ">".utf16.first!
> let tagStart = html.utf16.index(of: open)!
> let tagEnd = html.utf16[tagStart...].index(of: close)!
>
> // Slice the String with the UTF-16 indices to retrieve the tag.
> let tag = html[tagStart...tagEnd]
> ```
>
> A property and an initializer will be added to `String.Index`, exposing
> the offset of the index in code units (currently only UTF-16) from the
> beginning of the string:
>
> ```swift
> let n: Int = html.endIndex.encodedOffset
> let end = String.Index(encodedOffset: n)
> assert(end == html.endIndex)
> ```
>
> ### Comparison and Slicing Semantics
>
> When two indices being compared correspond to positions that are valid
> in any single `String` view, comparison semantics are already fully
> specified by the `Collection` requirements.
> Where no single `String` view contains both index values, the indices
> compare unequal and ordering is determined by comparison of their
> `encodedOffset` values. These index values are not totally ordered but
> do satisfy strict weak ordering requirements, which is sufficient for
> algorithms such as `sort` to exhibit sensible behavior. We might
> consider loosening the specified requirements on these algorithms and
> on `Comparable` to support strict weak ordering, but for now we can
> treat such index pairs as being outside the domain of comparison, like
> any other indices from completely distinct collections.
>
> An index that does not fall on an exact boundary in a given `String`
> or `Substring` view will be “rounded down” to the nearest boundary
> when used for slicing or range replacement. So, for example,
>
What about the normal subscript? I.e., what would the following print?

  print(s[s.unicodeScalars.indices.dropFirst().first!]) // “é”, or just the combining scalar?

Would unifying under the same type require that indices be less stateful than they currently are?

> ```swift
> let s = "e\u{301}galite\u{301}" // "égalité"
> print(s[s.unicodeScalars.indices.dropFirst().first!...]) // "égalité"
> print(s[..<s.unicodeScalars.indices.last!]) // "égalit"
> ```
>
> Replacing the failable APIs listed [above](#motivation) that detect
> whether an index represents a valid position in a given view, and
> enhancements that explicitly round index positions to nearby boundaries
> in a given view, are left to a later proposal. For now, we do not
> propose to remove the existing index conversion APIs.
>
> ## Detailed design
>
> `String.Index` acquires an `encodedOffset` property and initializer:
>
> ```swift
> public extension String.Index {
>   /// Creates a position corresponding to the given offset in a
>   /// `String`'s underlying (UTF-16) code units.
>   init(encodedOffset: Int)
>
>   /// The position of this index expressed as an offset from the
>   /// beginning of the `String`'s underlying (UTF-16) code units.
>   var encodedOffset: Int
> }
> ```
>
> `Index` types of `String.UTF8View`, `String.UTF16View`, and
> `String.UnicodeScalarView` are replaced by `String.Index`:
>
> ```swift
> public extension String.UTF8View {
>   typealias Index = String.Index
> }
> public extension String.UTF16View {
>   typealias Index = String.Index
> }
> public extension String.UnicodeScalarView {
>   typealias Index = String.Index
> }
> ```
>
> Because the index types are collapsing, index conversion methods and
> initializers are reduced to the following:
>
> ```swift
> public extension String.Index {
>   init?(_: String.Index, within: String)
>   init?(_: String.Index, within: String.UTF8View)
>   init?(_: String.Index, within: String.UTF16View)
>   init?(_: String.Index, within: String.UnicodeScalarView)
>
>   func samePosition(in: String) -> String.Index?
>   func samePosition(in: String.UTF8View) -> String.Index?
>   func samePosition(in: String.UTF16View) -> String.Index?
>   func samePosition(in: String.UnicodeScalarView) -> String.Index?
> }
> ```
>
> ## Source compatibility
>
> Because of the collapse of index types,
> [existing non-failable APIs](#motivation) become failable. To
> avoid breaking Swift 3 code, the following overloads of existing
> functions are added, allowing the resulting optional indices to be
> used where previously non-optional indices were used. These overloads
> were driven by making the new APIs work with existing code, including
> the Swift source compatibility test suite, and should be viewed as
> migration aids only, rather than additions to the Swift 3 API.
>
> ```swift
> extension Optional where Wrapped == String.Index {
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>   public static func ..<(
>     lhs: String.Index?, rhs: String.Index?
>   ) -> Range<String.Index> {
>     return lhs! ..< rhs!
>   }
>
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>   public static func ...(
>     lhs: String.Index?, rhs: String.Index?
>   ) -> ClosedRange<String.Index> {
>     return lhs! ... rhs!
>   }
> }
>
> // backward compatibility for index interchange.
> extension String.UTF16View {
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>   public func index(after i: Index?) -> Index {
>     return index(after: i!)
>   }
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>   public func index(
>     _ i: Index?, offsetBy n: IndexDistance) -> Index {
>     return index(i!, offsetBy: n)
>   }
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>   public func distance(from i: Index?, to j: Index?) -> IndexDistance {
>     return distance(from: i!, to: j!)
>   }
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>   public subscript(i: Index?) -> Unicode.UTF16.CodeUnit {
>     return self[i!]
>   }
> }
>
> extension String.UTF8View {
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>   public func index(after i: Index?) -> Index {
>     return index(after: i!)
>   }
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>   public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
>     return index(i!, offsetBy: n)
>   }
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>   public func distance(
>     from i: Index?, to j: Index?) -> IndexDistance {
>     return distance(from: i!, to: j!)
>   }
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>   public subscript(i: Index?) -> Unicode.UTF8.CodeUnit {
>     return self[i!]
>   }
> }
>
> // backward compatibility for index interchange.
> extension String.UnicodeScalarView {
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>   public func index(after i: Index?) -> Index {
>     return index(after: i!)
>   }
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>   public func index(_ i: Index?, offsetBy n: IndexDistance) -> Index {
>     return index(i!, offsetBy: n)
>   }
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional indices")
>   public func distance(from i: Index?, to j: Index?) -> IndexDistance {
>     return distance(from: i!, to: j!)
>   }
>   @available(
>     swift, deprecated: 3.2, obsoleted: 4.0,
>     message: "Any String view index conversion can fail in Swift 4; please unwrap the optional index")
>   public subscript(i: Index?) -> Unicode.Scalar {
>     return self[i!]
>   }
> }
> ```
>
> - **Q**: Will existing correct Swift 3 applications stop compiling due
>   to this change?
>
>   **A**: It is possible but unlikely. The existing index conversion
>   APIs are relatively rarely used, and the overloads listed above
>   handle the common cases in Swift 3 compatibility mode.
>
> - **Q**: Will applications still compile but produce
>   different behavior than they used to?
>
>   **A**: No.
>
> - **Q**: Is it possible to automatically migrate from the old syntax
>   to the new syntax?
>
>   **A**: Yes, although usages of these APIs may be rare enough that it
>   isn't worth the trouble.
>
> - **Q**: Can Swift applications be written in a common subset that works
>   both with Swift 3 and Swift 4 to aid in migration?
>
>   **A**: Yes, the Swift 4 APIs will all be available in Swift 3 mode.
>
> ## Effect on ABI stability
>
> This proposal changes the ABI of the standard library.
>
> ## Effect on API resilience
>
> This proposal makes no changes to the resilience of any APIs.
>
> ## Alternatives considered
>
> The only alternative considered was no action.
>
>
> --
> -Dave
>
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution