(Replying out-of-order)

> Sadly, I don’t. I’m only an opinionated hobbyist in this domain, one who has 
> coded a lot of string processing over the years and understands at least some 
> of the tradeoffs.


Wouldn’t it be amazing to be able to write a high performance lexer with all 
the awesomeness of Swift and without String getting in your way? :-)

> ^ String should NOT be an Any!
<snip>
> Bridging is either “always eager” or “possibly lazy”, there is nothing 
> useful in between.  Lazy bridging means that every operation down to the 
> lowly ascii parsing operations needs to check for the slow path.  I understand 
> that this may not be possible to eliminate on apple platforms, but it would 
> be great to not add the existential generalization since that would affect 
> other platforms that don’t have the legacy api concern.

<snip>
> Yes, StringRef style algorithms are a big deal, as I mentioned in my previous 
> email, but it is also unclear if this will really be a win when shoehorned 
> into String.  The benefit of StringRef is that it is a completely trivial 
> type (both in the SIL sense but also in the implementation sense) and all the 
> primitive ops get inlined.  Given the “all things to all people” design of 
> String, I’m very much afraid that trying to shoehorn this into the String 
> currency type will fail to provide significant wins and thus lead to having a 
> separate StringRef style type anyway.  Providing a StringRef style projection 
> type that is trivial (in the Swift sense) that knows in its static type that 
> it never owns memory seems like the best path.
> 
> By point of comparison, C++ has std::string (yes, sure, with lots of issues) 
> but they still introduced StringRef nee std::string_view instead of wedging 
> it in.

I left the “UnmanagedString” section open for debate on SE, but for the purposes 
of talking about ABI design (i.e. what is possible to expose in the future), 
let’s just say we’ll eventually have an `UnsafeString<CodeUnit>`, which is a 
pointer, a length, and packed performance flags, and which provides both a 
random-access view of its code units and full-Unicode operations.
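To make that concrete, here is a rough sketch of the shape I have in mind; the 
name, the members, and the flag layout are illustrative rather than a settled 
design:

  // Sketch only: the name, members, and flag layout are illustrative,
  // not a settled design.
  struct UnsafeString<CodeUnit: FixedWidthInteger & UnsignedInteger> {
    var start: UnsafePointer<CodeUnit>
    var count: Int
    var flags: UInt8   // packed performance flags, e.g. isASCII, isCanonical

    var isASCII: Bool { return flags & 0x1 != 0 }

    // Random access to the raw code units; full-Unicode operations
    // (grapheme breaking, comparison, etc.) would layer on top of this.
    subscript(i: Int) -> CodeUnit {
      precondition(i >= 0 && i < count)
      return start[i]
    }

    var codeUnits: UnsafeBufferPointer<CodeUnit> {
      return UnsafeBufferPointer(start: start, count: count)
    }
  }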

Please bear with me as I try to reason through what String should be, given the 
current reality of source compatibility. If you have any design insight or 
historical context to fill in, please do so!

In one extreme corner of this multi-dimensional design space we have a 
proliferation of string types such as UTF8String, UTF16String, and even 
combinations of performance flags such as TrivialUTF8String, 
CanonicalSingleScalarGraphemeSmallString, etc. This proliferation encodes all 
these would-be-dynamic-branches in the type, which is great for performance. 
But it hits APIs hard: they either accept total chaos by taking concrete types, 
or try to generalize/erase over these myriad types.

On the other extreme end is AnyString, an existential that can be backed by 
anything that can vend code units in some fashion. The advantage of such a type 
is API unification: no matter what form someone has, they can always call your 
API without copying the underlying data. The downside is that everyone’s in the 
same slow boat.
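As a rough illustration of the two extremes (every type and function name below 
is hypothetical):

  // Extreme 1: proliferation. Every performance property is baked into a
  // concrete type, so APIs either overload on each one...
  struct TrivialUTF8String { var codeUnits: [UInt8] }
  struct CanonicalSmallString { var codeUnits: [UInt8] }
  func parse(_ input: TrivialUTF8String) {}
  func parse(_ input: CanonicalSmallString) {}
  // ...or try to be generic/erased over the whole zoo.

  // Extreme 2: unification. One existential that anything vending code
  // units can conform to; every read is a dynamically dispatched call.
  protocol AnyStringConvertible {
    var codeUnitCount: Int { get }
    func codeUnit(at i: Int) -> UInt32
  }
  func parse(_ input: AnyStringConvertible) {}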

Somewhere between these two extremes will be String. The question is, what 
details does String treat as dynamic properties? Source compatibility dictates 
that String abstracts away the underlying encoding, branching inside every API 
that performs a read of its code units. There will always be 
performance-critical use cases that wish to avoid these branches, i.e. branch 
at the top rather than throughout. A more appropriate type is needed for this, 
such as UTF16String, UTF8String, and UnsafeString<CodeUnit>. Treating encoding 
as a dynamic property is great for APIs, as it avoids an encoding schism.
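Roughly, the difference between branching inside every read and branching once 
at the top looks like this (the types and members below are illustrative 
stand-ins, not the real String internals):

  // Illustrative stand-ins only. An encoding-abstracted string branches
  // inside every code unit read...
  enum Backing {
    case utf8(ContiguousArray<UInt8>)
    case utf16(ContiguousArray<UInt16>)
  }
  struct EncodingAbstractedString {
    var backing: Backing
    func codeUnit(at i: Int) -> UInt32 {
      switch backing {                      // branch on every read
      case .utf8(let units):  return UInt32(units[i])
      case .utf16(let units): return UInt32(units[i])
      }
    }
  }

  // ...whereas branching at the top dispatches once and then runs a loop
  // specialized for a single, statically known encoding.
  func countNewlines(in s: EncodingAbstractedString) -> Int {
    var count = 0
    switch s.backing {                      // branch once, up front
    case .utf8(let units):
      for unit in units where unit == 0x0A { count += 1 }
    case .utf16(let units):
      for unit in units where unit == 0x0A { count += 1 }
    }
    return count
  }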

If we do small string optimizations for the String type, then every String 
could be either small (a value) or large (a reference). Every use of large 
strings will be slightly penalized. To a very small extent, even performance 
flags slightly penalize strings that don’t set them. These are worth doing 
nonetheless, as the benefit is so large when they apply and the penalty is 
paltry relative to the cost of the operations on this slow path.

(...continued after quote)

> Right.  It is actually a bit more complicated than that though.  Consider the 
> Swift 4 generation of strings and arrays: because of lazy bridging each 
> primitive operation includes the dynamic branching sequence.  Beyond the 
> dynamic performance impact of this, we end up having a really unfortunate set 
> of tradeoffs to choose between:
> 
> 1. We can leave the operations outlined in the standard library.  
> Particularly for Array which benefits from being specialized on its element 
> type, this is a huge perf loss.
> 
> 2. We can inline the operation, but we end up inlining both sides of the 
> operation, causing massive code bloat.
> 
> 3. We can partially inline just the presumed fastpath, and leave the slow 
> path out of line.
> 
> The problem with this when you bring it to string is that the “fastpath” 
> would naturally be the small string case.  If you make *it* have many levels 
> of branching underneath it, not only do you slow down the fast case, but you 
> also cause intense code bloat if you want to inline core operations into user 
> code.
> 
<snip>
>> The “high-order bit” for performance is whether a string is small-or-not, as 
>> avoiding an allocation and, just as importantly, managing the memory with 
>> ARC is skipped. Beyond that, we’re debating smaller performance deltas, 
>> which are still important. This reasoning also influences where we choose to 
>> stick our resilience barriers, which can be relaxed in the future.
> 
> You’re making an argument about aggregate use cases on average, but it is 
> unwise to ignore the common performance critical cases that do actually want 
> to look at the bytes. 

That is a very good point, and it brings up some of the tradeoffs we need to 
balance in a data-driven fashion. This reminds me of the classic throughput vs 
latency view of performance. We have many, many strings on our systems and 
anything that reduces overall resource usage is extremely valuable. On the 
other hand, some operations need to be as fast as possible and shouldn’t be 
heavily penalized for some aggregate win.

A String with small string optimizations has “value” forms and “reference” 
forms, from a runtime and ABI perspective. Evaluating whether to add more value 
forms involves answering these questions:

1. What is the potential gain for accommodating this additional value form? 
This is probably a “throughput” gain.
2. What is the cost to the fast path, and to the slow path, of accommodating 
this additional value form? That is, how does this hurt “latency”?

Let’s say “fast path” means the small UTF-8 form, and “slow path” is the 
out-of-line encoding-abstracted form. The cost to the fast path could be 
reduced if we recognize that one particular value form (e.g. small UTF-8 
string) is worth checking first and all other value forms can be handled by a 
call to some non-inlined function. For the fast path, it’s the difference 
between checking if the top bit is 1 vs the top n bits being 1, which is a 
couple extra instructions (on x86_64 at least).
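In sketch form, assuming a purely hypothetical bit layout, just to show the 
shape of the check:

  // Hypothetical discriminator in the first word of a String.
  let isSmallUTF8Mask: UInt64 = 1 << 63   // one bit for the presumed fast path

  @inline(__always)
  func utf8CodeUnitCount(_ rawBits: UInt64, slowPath: (UInt64) -> Int) -> Int {
    if rawBits & isSmallUTF8Mask != 0 {
      // Small UTF-8 form: the count lives in a few bits of the word itself.
      return Int((rawBits >> 56) & 0x0F)
    }
    // Every other value form and the large form share one outlined call.
    return slowPath(rawBits)
  }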

The cost to the slow path is checking and branching for other small forms. This 
may be significant or insignificant, depending on the operation. E.g. 
meaningless compared to general grapheme breaking, but could be significant for 
`endIndex`.

As to String being able to accommodate “Any”-like usage, it would not impact 
the fast path (not a value form). It would have a similar impact on the slow 
path as multiple value forms, and would need similar justification.

> Consider the http header parsing challenges the swift on server workgroup 
> faced that led them to use a C implementation, for example.

That’s very unfortunate. Highly-performance-sensitive use cases have largely 
been neglected by String, and it’s very important to consider their needs in 
the ABI. Do you know of a retrospective or post-mortem of the attempt?

HTTP header parsing probably wouldn’t want to use String, which dynamically 
branches on reads to accommodate both UTF-8 and UTF-16. Similarly, the headers 
wouldn’t be small, so no need for that extra branch around inlined code there 
either. A “UTF8String” type would be more appropriate. They further might be 
known-ASCII, which would imply trivially-encoded-and-canonical UTF8String 
(maybe even single-scalar-grapheme if they’re known not to have Windows-style 
newlines), so no need to check-and-branch—just execute the fast path. For 
parsed portions, they probably want UnsafeString<UInt8>, or perhaps some 
variant that encoded perf flags too.
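To make that concrete, a parser in that style might work directly over raw 
bytes. Here is a sketch using UnsafeBufferPointer<UInt8> as a stand-in for the 
UnsafeString<UInt8> idea; the helper and its shape are hypothetical:

  // Sketch: split "Name: value" without constructing a String or
  // branching on encoding; the bytes are assumed to be ASCII.
  func splitHeaderLine(_ line: UnsafeBufferPointer<UInt8>)
      -> (name: Range<Int>, value: Range<Int>)? {
    var colon = 0
    while colon < line.count && line[colon] != UInt8(ascii: ":") { colon += 1 }
    guard colon < line.count else { return nil }
    var valueStart = colon + 1
    while valueStart < line.count && line[valueStart] == UInt8(ascii: " ") {
      valueStart += 1
    }
    return (name: 0..<colon, value: valueStart..<line.count)
  }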

I think the future is bright if we get the right ABI.

> 
>>>  Given the ability to hold 15 digits of ascii, I don’t see why it would be 
>>> worthwhile to worry about a 5-bit representation for digits.  String should 
>>> not be an Any!
> 
> I still don’t understand why this would even be considered, do you have any 
> more rationale to explain that?
> 
<snip>
>> As to a 5-bit encoding, again, all details pending more experimentation. Its 
>> importance may be diminished by more efficient, low-level String 
>> construction interfaces. The difference between 15 and 24 characters could 
>> be big, considering that formatted addresses and Doubles fit in that range. 
>> We’ll see.
> 
> I still can’t imagine any workload that would care about this, and cannot see 
> why punishing the normal cases would be worth any cost.  Can you?

I do not at this time have detailed rationale. FWIW, my plan is to roll out the 
small UTF-8 representation first and only add the others if there’s sufficient 
data and justification (i.e. cost/benefit analysis). I’m especially interested 
in whether more efficient ways to construct and interpolate Strings would 
address the perceived need for this form.

>>>> Final details are still in exploration. If the construction and 
>>>> interpretation of small bit patterns can remain behind a resilience 
>>>> barrier, new forms could be added in the future. However, the performance 
>>>> impact of this would need to be carefully evaluated.
>>> 
>>> I’d love to see performance numbers on this.  Have you considered a design 
>>> where you have exactly one small string representation: a sequence of 15 
>>> UTF8 bytes?  This holds your 15 bytes of ascii, probably more non-ascii 
>>> characters on average than 7 UTF16 codepoints, and only needs one 
>>> determinator branch on the entry point of hot functions that want to touch 
>>> the bytes.  If you have a lot of algorithms that are sped up by knowing 
>>> they have ascii, you could go with two bits:  “is small” and “isascii”, 
>>> where isascii is set in both the small string and normal string cases.
>>> 
>> 
>> We have not yet done the investigation, so all details could change. This is 
>> my (perhaps flawed!) reasoning as to why, in general, it may be useful to 
>> have multiple small forms. Again, this will change as we learn more.
>> 
>> Strings are created and copied (i.e. a value-copy/retain) more often than 
>> they are read (and strings are read far more often than they are modified). 
> 
> If you believe it is highly biased towards this, then this strongly argues 
> for a single word representation...
> 

… depending on how far along the diminishing returns curve 7 vs 15 UTF-8 code 
units is. The killer feature of single-word String is that e.g. Array<String> 
is denser and since growth happens via move/take operations, grow operations 
are also quicker. However, an Array copy triggered by a write (copy-on-write) 
is still subject to the small-vs-large tradeoff. For Strings as arguments / 
return values, the win 
for single-word String is conditional on how many other arguments / return 
values there are and our calling convention (which IIRC has devoted many 
registers to arguments / return values). There’s also the existential inline 
buffer size, which I think is worth spinning off into a separate discussion.
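For concreteness, the density point is just element stride (the structs below 
are stand-ins, not real layouts, and the sizes assume a 64-bit platform):

  // Stand-ins for the two layouts under discussion, on a 64-bit platform.
  struct OneWordString { var bits: UInt64 }            // 8-byte stride
  struct TwoWordString { var bits: (UInt64, UInt64) }  // 16-byte stride

  print(MemoryLayout<OneWordString>.stride)   // 8
  print(MemoryLayout<TwoWordString>.stride)   // 16

Every grow or copy of an Array of the one-word layout moves half as many bytes.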

>> 4. The second word can cache useful information from large strings. 
>> `endIndex` is a very frequently requested computed property and it could be 
>> stored directly in-line rather than loaded from memory (though perhaps a 
>> load happens anyways in a subsequent read of the string). Alternatively, we 
>> could store the grapheme count or some other piece of information that we’d 
>> otherwise have to recompute. More experimentation needed here.
> 
> This seems weakly motivated: large strings can store end index in the heap 
> allocation.
> 

Right. I was hoping that large strings can use the extra bits to recover some 
of the (albeit small) regression from accommodating a small form, and endIndex 
has a higher relative hit. But, it would probably be better to use them for 
something else and not try to “anchor” ourselves to large-string micro 
benchmarks.

>> 5. (vague and hand-wavy) Two-words fits into a nice groove that 3-words 
>> doesn’t: 2 words is a rule-of-thumb size for very small buffers. It’s a 
>> common heap alignment, stack alignment, vector-width, double-word-load 
>> width, etc.. 1-word Strings may be under-utilizing available resources, that 
>> is the second word will often be there for use anyways. The main case where 
>> this is not true and 1-word shines is aggregates of String.
> 
> What is the expected existential inline buffer size going to wind up being?  
> We sized it to 3 words specifically to fit string and array.  It would be 
> great to shrink that to 2 or 1 words.
> 

(Bob replied to this).

Another angle is that Substring is currently String + a 2-word range. We think 
it should be shrunk to String + a 1-word index-mapping offset. 2-word String 
would let Substring fit in the existing inline buffer. The question is whether 
the inline buffer should shrink with String, or if there are cases such as 
Substring that are still important.
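In sketch form (again, stand-in layouts on a 64-bit platform, not the actual 
standard library definitions):

  // Stand-in layouts, not the actual stdlib definitions.
  struct TwoWordString { var bits: (UInt64, UInt64) }

  // Today: conceptually a String plus a two-word range of offsets.
  struct SubstringToday {
    var base: TwoWordString
    var bounds: (lower: Int, upper: Int)   // 4 words total
  }

  // Proposed: one word of index-mapping offset information, so a two-word
  // String keeps Substring within a three-word inline buffer.
  struct SubstringProposed {
    var base: TwoWordString
    var offsetInfo: UInt64                 // 3 words total
  }

  print(MemoryLayout<SubstringToday>.size)      // 32
  print(MemoryLayout<SubstringProposed>.size)   // 24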

>>> - array has exactly the same sort of concern, why is this a string-specific 
>>> problem?
>> 
>> I think it’s interesting as slices may emerge more often in practice on 
>> strings than arrays, e.g. for presenting parsing results. Also, applications 
>> that care about this level of performance may custom-allocate all their 
>> string data contiguously (e.g. llvm::BumpPtr). I believe that projects such 
>> as llbuild have done this, forcing themselves into an API schism between 
>> String and UnsafeBufferPointer<UInt8>, often duplicating functionality. In 
>> theory all of this could also apply to array, but it seems to happen more 
>> for strings.
> 
> I see String and Array as directly analogous in the space.  The *only* 
> difference IMO is when you get to algorithms that interpret the bytes because 
> of frickin’ unicode.
<snip>
> Yes, I think that your point about Unsafe is important, which is even more 
> reason not to wedge it into String.  I really do understand the tradeoff of 
> how you can’t pass a “Ref” to something that takes a String, but we already 
> extensively debated that topic and accepted it w.r.t. Slice types.  This 
> seems directly analogous.
> 

The analogy between String vs Substring and Array vs Slice is interesting and 
makes sense. I’ll have to think about this a bit more, and whether String vs 
Substring might differ in practice.

>>>> * Unification of String and Substring’s various Views.
>>>> * Some form of opaque string, such as an existential, which can be used as 
>>>> an extension point.
>>>> * More compact/performant indices.
>>> 
>>> What is the current theory on eager vs lazy bridging with Objective-C?  Can 
>>> we get to an eager bridging model?  That alone would dramatically simplify 
>>> and speed up every operation on a Swift string.
>>> 
>> 
>> I don’t think that will be feasible in general, although perhaps it could 
>> happen for most common occurrences.
> 
> 
>>> 
>>>   func hexToInt(_ c : UInt8) -> UInt8 {
>>>     switch c {
>>>     case charToByte("0")...charToByte("9"): return c - charToByte("0")
>>>     case charToByte("a")...charToByte("f"): return c - charToByte("a") + 10
>>>     case charToByte("A")...charToByte("F"): return c - charToByte("A") + 10
>>>     default: fatalError("invalid hexadecimal character")
>>>     }
>>>   }
>>> 
>> 
>> Yeah, trying to do ASCII value ranges is currently pretty rough. Here’s what 
>> I did recently when I wanted something similar, using unicode scalar literal 
>> ranges:
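
(That snippet got trimmed in the quoting; a sketch of the kind of thing I mean, 
matching on unicode scalar ranges, is below.)

  // Sketch of the unicode scalar range approach mentioned above.
  func hexToInt(_ c: UnicodeScalar) -> UInt8 {
    let zero: UnicodeScalar = "0", a: UnicodeScalar = "a", A: UnicodeScalar = "A"
    switch c {
    case "0"..."9": return UInt8(c.value - zero.value)
    case "a"..."f": return UInt8(c.value - a.value) + 10
    case "A"..."F": return UInt8(c.value - A.value) + 10
    default: fatalError("invalid hexadecimal character")
    }
  }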
> 
> We resisted adding single quoted strings for characters early on because we 
> felt that it was a marginal case and might want to use them for additional 
> quoting characters (like scripting languages like Perl that allow either 
> single or double quotes) or for multi-line strings.  However, now that 
> multiline strings are settled and now that it is clear that we’re not going 
> to add multiple redundant ways to quote strings, I think it is worth 
> considering whether we should introduce ExpressibleByCharacterLiteral types 
> and whether UInt8 should be one (along with Character of course) and tie it 
> to single quotes.
> 
> -Chris
