Collation for French normally uses backwards ordering of collation weights at level 2:
« 4.3 Form Sort Key Step 3. The sort key is formed by successively appending all non-zero weights from the collation element array. The weights are appended from each level in turn, from 1 to 3. (Backwards weights are inserted in reverse order.) »

However, I think this creates overly long sequences, because it reverses ALL the secondary weights of arbitrarily long texts. Not only would this rule have a severe performance impact, it is not even needed for French. What is needed is JUST to reverse the collation weights within single words (or compound words, including those containing an apostrophe). So the reversal should only apply to the separate spans of text produced by word-breaking (see UAX #29).

For example, with the sentence « Pour être heureux, ne vivons pas cachés ! », it is quite enough to reverse the secondary weights within each of these spans:

<span>Pour </span> <span>être </span> <span>heureux, </span> <span>ne </span> <span>vivons </span> <span>pas </span> <span>cachés !</span>

A word-breaking step (UAX #29), based on "extended grapheme clusters" as above, or on the shorter "legacy grapheme clusters" where spaces, punctuation and spacing marks would be separated, should be applied at the end of step 4.1, before step 4.2 of the UCA. Step 4.3 then only needs to be applied within each span between those word breaks, instead of over the complete string.

This will then correctly sort an itemized list of definitions like:

* être (en anglais, “to be”) : v. aux. irrégulier du 3è groupe
* été (en anglais, “summer”) : n. m. – 2è saison de l’année.
* ...

or simpler lists of person names, toponyms, book titles, and so on, because the reversal of accent differences would actually apply only within the first word of each item (the other words would only be considered when two items share the same initial word). Note that the punctuation and spaces that may cause a word break to be detected will often be ignored at the first two collation levels (i.e.
they would have a 0000 collation weight at these levels), notably in collations tailored for specific locales (such as French), as opposed to the generic, locale-neutral collation (the "root" locale of CLDR, using the DUCET).

Could UTS #10 (currently under review), which specifies the UCA, mention where a word breaker may be used? This would also offer large optimization opportunities for computing collation weights in most languages (not just French), notably because it would greatly reduce the internal buffering needed to build the string of collation weights for each separate collation level.

It would also be useful to reserve in the DUCET a specific collation weight at the primary level (with a lower value than the collation-level separator, if one is used), or a range of such weights, for word separation (or other kinds of hierarchical logical separation). This could really speed up the computation of collation weights for long sentences; notably, it would allow the collation strings of successive words to be appended on the fly, separated by this separator weight.

In my opinion, by default, at least the most basic word breaker (breaking on breakable whitespace, including explicit line-break controls, and possibly on sentence breaks if available) should be used to limit the effect of backwards reordering of collation weights at any level, in any practical implementation of the UCA. This applies notably to implementations of the UCA for the French locale in database engines, for building their indexes and for supporting the « ORDER BY » clause, text comparison operators such as >, <, >=, <= and « BETWEEN...AND », aggregates like « MIN() » and « MAX() », and operators based on text similarity such as =, != and « LIKE ».

Philippe.
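A minimal Python sketch of the per-word idea, comparing it with backwards secondaries over the whole string. Everything here is an assumption for illustration only: the weight table is a toy (primary = code point of the lowercased base character, a few hypothetical accent secondaries), not the DUCET or any CLDR tailoring; only levels 1 and 2 are modelled; a plain whitespace split stands in for a real UAX #29 word breaker; and `WORD_SEP = 0` below `LEVEL_SEP = 1` models the proposed reserved word-separator weight with hypothetical values.

```python
import unicodedata

# Toy weights, NOT taken from the DUCET or any CLDR tailoring. All real
# weights are >= 2 so that both separators sort below any weight.
WORD_SEP = 0    # hypothetical reserved word-separator weight (lowest)
LEVEL_SEP = 1   # level separator, kept above WORD_SEP
BASE_SECONDARY = 2
# Hypothetical secondary weights for a few combining marks.
ACCENT_SECONDARY = {"\u0301": 3, "\u0300": 4, "\u0302": 5, "\u0308": 6, "\u0327": 7}


def collation_elements(text):
    """Yield toy (primary, secondary) pairs for the NFD form of text."""
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # Combining mark: ignorable at level 1, weighted at level 2.
            yield (0, ACCENT_SECONDARY.get(ch, 8))
        elif ch.isalnum():
            # Base letter or digit; case (level 3) is not modelled here.
            yield (ord(ch.lower()[0]), BASE_SECONDARY)
        else:
            # Spaces and punctuation: ignorable at levels 1 and 2.
            yield (0, 0)


def span_key(span):
    """Step 4.3 on one span: non-zero primaries, then secondaries reversed."""
    ces = list(collation_elements(span))
    primaries = [p for p, _ in ces if p]
    secondaries = [s for _, s in ces if s]
    return primaries + [LEVEL_SEP] + secondaries[::-1]


def whole_string_key(text):
    """Backwards secondaries over the complete string."""
    return span_key(text)


def per_word_key(text):
    """Backwards secondaries within each word only; word keys are appended
    one after another, separated by WORD_SEP, with no whole-string buffer."""
    key = []
    # Whitespace split stands in for a real UAX #29 word breaker.
    for i, word in enumerate(text.split()):
        if i:
            key.append(WORD_SEP)
        key.extend(span_key(word))
    return key


print(sorted(["côté", "cote", "coté", "côte"], key=per_word_key))
# ['cote', 'côte', 'coté', 'côté']
print(sorted(["cote côté", "côte coté"], key=whole_string_key))
# ['côte coté', 'cote côté']
print(sorted(["cote côté", "côte coté"], key=per_word_key))
# ['cote côté', 'côte coté']
```

For single words both keys give the usual French order. For « cote côté » versus « côte coté » (equal primaries), whole-string reversal lets the accents of the last word decide first, while the per-word key lets the first word decide, which is the behaviour argued for above; `per_word_key` also never buffers the secondary weights of more than one word at a time.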