On Darwin, known-ASCII strings are sorted according to the lexicographical 
ordering of their code units, and all other strings are ordered based on the 
UCA[1]. On Linux, however, even known-ASCII strings are ordered based on the 
UCA. I propose to unify these by changing Linux’s string sort order to match 
Darwin’s in Swift 4.0.

Background

Swift’s default ordering for strings is appropriate for machine consumption 
(e.g. implementing sorted collections). It obeys Unicode canonical 
equivalence[2]; that is, canonically equivalent strings compare as equal 
regardless of how they are normalized. However, it is not meant to be 
sufficient for presenting a meaningful ordering to human consumers, as that 
requires incorporating reader-specific information (e.g. [3]). 
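
As a small illustration of canonical equivalence (the variable names here are 
mine, chosen only for the example), the two strings below are built from 
different scalars but should still compare as equal:

```swift
// "é" written as a single precomposed scalar, U+00E9.
let precomposed = "caf\u{E9}"
// "e" followed by the combining acute accent, U+0301.
let decomposed = "cafe\u{301}"

// The underlying scalar sequences differ in length and content...
let sameScalars = Array(precomposed.unicodeScalars) == Array(decomposed.unicodeScalars)
print(sameScalars)

// ...but String comparison treats canonically equivalent strings as equal,
// so neither string orders before the other.
print(precomposed == decomposed)
print(precomposed < decomposed || decomposed < precomposed)
```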

Known-ASCII strings are a trivial case for the described sort order semantics: 
pure ASCII is unaffected by normalization. Thus, lexicographical ordering of 
code units is a valid machine ordering for ASCII strings. On Darwin, this is 
used to order known-ASCII strings while Linux uses UCA even for known-ASCII 
strings.
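
To make the ASCII case concrete, here is a minimal sketch of code-unit 
ordering. The helpers isASCII and asciiPrecedes are hypothetical names for 
this example only; this is not the standard library's implementation, and it 
checks for ASCII by scanning rather than relying on the internal known-ASCII 
flag.

```swift
// Hypothetical helper: true if every UTF-8 code unit is in the ASCII range.
func isASCII(_ s: String) -> Bool {
    return !s.utf8.contains { $0 >= 0x80 }
}

// Hypothetical helper: orders two pure-ASCII strings by comparing their
// code units lexicographically. No normalization is needed because ASCII
// is unaffected by it.
func asciiPrecedes(_ lhs: String, _ rhs: String) -> Bool {
    precondition(isASCII(lhs) && isASCII(rhs), "both strings must be pure ASCII")
    return lhs.utf8.lexicographicallyPrecedes(rhs.utf8)
}

let words = ["banana", "Apple", "apple", "Zebra"]
let ordered = words.sorted(by: asciiPrecedes)
print(ordered)  // ["Apple", "Zebra", "apple", "banana"]
```

All uppercase ASCII letters have smaller code units than lowercase ones, so 
"Zebra" lands before "apple" here. A UCA-based ordering would generally not 
group words by case like this, which is the kind of difference the two 
platforms can currently exhibit.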

Long term, the plan is to switch String’s sort order to be the lexicographical 
ordering of normalized code units (or perhaps scalar values), as mentioned in 
the String Manifesto[4]. This is a more efficient ordering than that provided 
by UCA. However, this will not make it in time for Swift 4.0. 
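
For a rough idea of what such an ordering could look like, here is a sketch 
that compares scalar values after normalization. It leans on Foundation's 
decomposedStringWithCanonicalMapping for the normalization step, and both the 
choice of NFD and the helper name normalizedScalarsPrecede are my assumptions 
for the example, not something the Manifesto specifies.

```swift
import Foundation

// Hypothetical helper: orders strings by the lexicographical ordering of
// their scalar values after canonical decomposition (NFD, by assumption).
func normalizedScalarsPrecede(_ lhs: String, _ rhs: String) -> Bool {
    let l = lhs.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
    let r = rhs.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
    return l.lexicographicallyPrecedes(r)
}

// Canonically equivalent strings normalize to the same scalars, so neither
// one precedes the other.
let a = "caf\u{E9}"
let b = "cafe\u{301}"
print(normalizedScalarsPrecede(a, b), normalizedScalarsPrecede(b, a))

// For pure ASCII input this reduces to plain code-unit ordering.
print(normalizedScalarsPrecede("Apple", "apple"))
```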

Changes

I propose to change Linux’s sort order for known-ASCII strings to be the same 
as it is on Darwin. This will be accomplished by dropping the relevant #if 
guards in StringCompare.swift. An example implementation can be found at [5].

In addition to unifying sort order semantics across platforms, this will also 
deliver significant performance boosts when comparing pure ASCII strings on 
Linux, since they will no longer go through UCA.

[1] UTS #10: Unicode Collation Algorithm <http://unicode.org/reports/tr10/>
[2] Canonical Equivalence in Applications <http://unicode.org/notes/tn5/>
[3] UCA: Contextual Sensitivity 
<http://unicode.org/reports/tr10/#Contextual_Sensitivity>
[4] String Manifesto: Comparing and Hashing Strings 
<https://github.com/apple/swift/blob/master/docs/StringManifesto.md#comparing-and-hashing-strings>
[5] Unifying Linux/Darwin ASCII sort order semantics - github 
<https://github.com/milseman/swift/commit/5560e13198d5cc284f46bf190f59a2edf7ed747b>
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev
