on Wed Jun 28 2017, Ben Cohen <swift-evolution@swift.org> wrote: > Hi swift-evolution, > > Below is a short pitch for some performance improvements to be made to String > to accommodate > Substrings. > > As outlined in SE-0163, the more general question of implicit conversion from > Substring to String > was deferred pending feedback on the initial implementation. To date, the > feedback we’ve received > hasn’t suggested that such an implicit conversion is necessary – that > migrations from 3.2 to 4.0 > haven’t led to an extensive need to perform Substring->String conversion. Any > further input, either > on or off list, about this particular aspect of migration would be very > gratefully received. > > # Substring performance affordances > > * Proposal: [SE-NNNN](NNNN-substring-affordances.md) > * Authors: [Ben Cohen](https://github.com/airspeedswift > <https://github.com/airspeedswift>) > * Review Manager: TBD > * Status: **Awaiting review** > > ## Introduction > > This proposal modifies a small number of methods in the standard library that > are commonly used with the `Substring` type: > > - Modify the `init` on floating point and integer types, to construct them > from `StringProtocol` rather than `String`.
+1. This just makes good sense. > > - Change `join` to be an extension `where Element: StringProtocol` I don't see how this is implementable unless we ignore the May 10 core team resolution that RangeReplaceableCollection conformance be removed from StringProtocol, or we add RangeReplaceableCollection to the constraints on Element. We had good reasons for the resolution, so I think we'd need to do the latter. That said, I presume this is being called "joined()" per https://github.com/apple/swift/pull/10734... I'm concerned that we haven't sufficiently considered the forest of joined() overloads we're expanding. For all other sequences, joined() produces a lazy result. I realize that joining strings to produce a string is a really important operation, but String(myStrings.joined()) is not so onerous as to make it an obvious choice to add this overload to StringProtocol, especially since the other ones will still appear on String thanks to its Sequence conformance. String's conformance to Collection gets confusing when these kinds of things happen. Instead, we should be considering dropping the eager overload from String for Swift 4. Or, if we think it's important enough to avoid the explicit conversion to string, maybe we should be thinking about making the other joined()s eager. > - Add extensions to `Dictionary where Key == String` and `Set where > Element == String` to test for presence of a `Substring`. Here's why I don't think we should do this: * As noted in your introduction, we deliberately avoided taking steps that would make Substrings transparently work where Strings are now accepted so that we'd find out whether the impedance mismatch was going to be an issue. We've had very little time to get information about String/Substring interop issues that people might face and doing this now would mute the signal. * The proposal's stated motivation is to work around a performance problem, and it does so by adding a very specific, special-purpose piece of API. But the performance issues are actually more general: - Sets and dictionaries offer no way to look up a key and, when the key isn't found, use the bucket information computed by the lookup for a subsequent insertion. extension Set { func toggle(_ k: Element) { let p = s.lookup(k) // <==== for example if p.found { s.remove(at: p.index) } else { s.insert(k, at: p) } } } - Substring offers no way to create a corresponding String that avoids a copy when necessary for performance reasons: for w: Substring in words { result += transform( String(maintainingExcessStorageBySharing: w)) // <== for example } If transform is an API that—as we recommend by default—operates on String rather than Substring or StringProtocol, and is known not to make long-term copies of its argument, users currently have no way to avoid the performance cost of a copy. It's not something people should use every day, but when you need it, you need it. Together these APIs can be used to solve the Substring performance problem with Set<String>/Dictionary<String,V> insertion, and a whole raft of others besides. let p = s.lookup(String(maintainingExcessStorageBySharing: w)) if !p.found { s.insert(String(w), at: p) * The proposed API adds overloads, which harms usability, and having these particular overloads is IMO not clearly the right answer in the long term. Removing APIs later is expensive in labor, harmful to users, and sometimes difficult technically. * Handling these cases transparently sets up an expectation that other such cases should be handled transparently as well. The proposal itself raises many such possibilities, and the implications seem to be lots of API complexity creep and things that magically work in some cases but not in others. Even if we refuse to extend this pattern to other APIs, failure to meet expectations and the resulting confusion is a bad user experience. In sum, we should be quite sure of this API before we add it, and I, at least, am not. If we *do* decide to add it, we should at the same time establish some general principles that drive its addition so that we have a consistent basis on which to make the same kind of decision about other APIs. Thanks for listening, Dave > ## Motivation > > Swift 4 introduced `Substring` as the slice type for `String`. Previously, > `String` had been its own slice type, but this leads to issues where string > buffers can be unexpectedly retained. This approach was adopted instead of the > alternative of having the slicing operation make a copy. A copying slicing > operation would have negative performance consequences, and would also > conflict > with the requirement that `Collection` be sliceable in constant time. In cases > where an API requires a `String`, the user must construct a new `String` from > a > `Substring`. This can be thought of as a "deferral" of the copy that was > avoided at the time of the slice. > > There are a few places in the standard library where it is notably inefficient > to force a copy of a substring in order to use it with a string: performing > lookups in hashed containers, joining substrings, and converting substrings to > integers. In particular, these operations are likely to be used inside a loop > over a number of substrings extracted from a string. For example, suppose you > had a string of key/value pairs, where the values were integers and you wanted > to sum them by key. You would be forced to convert both the `Substring` keys > and values to `String` to do this. > > ## Proposed solution > > Add the following to the standard library: > > ```swift > extension FixedWidthInteger { > public init?<S : StringProtocol>(_ text: S, radix: Int = 10) > } > > extension Float/Double/Float80 { > public init?<S : StringProtocol>(_ text: S, radix: Int = 10) > } > > extension Sequence where Element: StringProtocol { > public func joined(separator: String = "") -> String > } > > extension Dictionary where Key == String { > public subscript(key: Substring) -> Value? { get set } > public subscript(key: Substring, default defaultValue: @autoclosure () -> > Value) -> Value { get set > } > } > > extension Set where Element == String { > public func contains(_ member: Substring) -> Bool > public func index(of member: Substring) -> Index? > public mutating func insert(_ newMember: Substring) -> (inserted: Bool, > memberAfterInsert: Element) > public mutating func remove(_ member: Substring) -> Element? > } > ``` > > These additions are deliberately narrow in scope. They are _not_ intended to > solve a general problem of being able to interchange substrings for strings > (or > more generally slices for collections) generically in different APIs. See the > alternatives considered section for more on this. > > ## Source compatibility > > No impact, these are either additive (in case of hashed containers) or > generalize an existing API to a protocol (in case of numeric > conversion/joining). > > ## Effect on ABI stability > > The hashed container changes are additive so no impact. The switch from > conrete > to generic types for the numeric conversions needs to be made before ABI > stability. > > ## Alternatives considered > > While they have a convenience benefit as well, this is not the primary goal of > these additions, but a side-effect of helping avoid a performance problem. In > many other cases, the performance issues can be avoided via modified use e.g. > `Sequence.contains` of a `Substring` in a sequence of strings can be written > as > `sequence.contains { $0 == substring }` . > > These changes are limited in scope, and further additions could be considered > in the future. For example, should the `Dictionary.init(grouping:by:) where > Key > == String` operation be enhanced to similarly take a sequence of substrings? > There is a long tail of these cases, and the need to keep unnecessary > overloads > to a minimum, avoiding typechecker work and code bloat, must be weighed > against > the likelyhood that string copies will be a performance problems. > > There is a more general problem of interoperating between collections and > slices. In the future, there may be other affordances for converting/comparing > them. For example, it might be desirable to require equatable collections to > have equatable slices, and to automatically provide default implementations of > `==` that efficiently compare a collection to its default slice. These > enhancements rely on features such as conditional conformance, and so may be > worth considering in later versions of Swift but are not an option currently. > _______________________________________________ > swift-evolution mailing list > swift-evolution@swift.org > https://lists.swift.org/mailman/listinfo/swift-evolution > -- -Dave _______________________________________________ swift-evolution mailing list swift-evolution@swift.org https://lists.swift.org/mailman/listinfo/swift-evolution