In the proposed update of UTS #10 (UCA), the subject of PRI #203 just posted, I note the following addition in section 3.3.2 (Contractions):

"Characters of a contraction can be made to sort as separate characters with the insertion of any starter character. There are two characters, soft hyphen and U+034F COMBINING GRAPHEME JOINER that are particularly useful for this purpose. These can be used to separate contractions that would normally be weighted as units, such as Slovak ch or Danish aa. For more information, see Section 5.3 Use of Combining Grapheme Joiner."

However, in a past discussion on this Unicode mailing list, we debated at length the fact that SOFT HYPHEN is not appropriate for blocking contractions, because it also implies a break opportunity. That may be undesirable when the only intent is to prohibit things like contractions (for collation), or even ligatures.

Note also that there is another collation-ignorable character for that, ZWNJ, which does not imply a break opportunity. But it is not clear whether it also blocks the contraction for collation purposes.

What do you think of this added paragraph? Is it complete enough, or should other uses be shown?

Note that contractions are normally not part of the DUCET, only part of advanced tailorings for specific languages, or even just orthographies for specific dialects or bibliographic conventions. However, CLDR does not use the DUCET directly in its "root" locale: it tailors it slightly (using a few expansions), so that obtaining the pure DUCET-only collation (default weights only, no contractions, no expansions) itself requires some tailoring rules relative to the collation implied by the "root" locale. This may therefore affect the CLDR "root" locale as well, which could ultimately define some contractions by default. (I just wonder how such a contraction found in an inherited locale can be undone by a subsequent tailoring rule in a sublocale.)

Another interesting question: how can we encode in text the fact that a character usually treated as a ligature in a language (which collates it as separate letters, even if the ligature is orthographic and not just typographic) should still be collated as a single letter? In other words, are there controls (or variant selection, or other means) that would disable the default expansions performed by a correctly tailored collation? For example, in a French collator, is there a way to disable the expansion of occurrences of "æ" into "ae"?

Final note: when a complex specification document is modified within an existing section but the section titles are unchanged, the TOC gives no indication that the section has been modified. We still have to search the whole text to see which sections have changed. In this proposed PRI, looking only at the TOC, one could think that only one section was added. Shouldn't there be some editorial symbol (or an annotation such as "(modified)") to mark explicitly in the TOC which sections contain modifications, so that we can go immediately to the section of interest, or see more easily whether there are correlations or dependencies between those modifications?

-- Philippe.
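
P.S. To make the two questions above concrete, here is a minimal toy sketch in Python. It is not a UCA implementation: the weights are invented, there is only one level, and the names `WEIGHTS`, `CONTRACTIONS`, `EXPANSIONS` and `sort_key` are hypothetical. It assumes a Slovak-like contraction "ch" and a French-like expansion of "æ", and treats CGJ as completely ignorable. It only illustrates how a character inserted between "c" and "h" stops the pair from matching, and that the expansion side has no comparable input-side switch in this model.

```python
CGJ = "\u034F"  # U+034F COMBINING GRAPHEME JOINER

# Invented primary weights: a=10, b=20, ... z=260. This is NOT DUCET data.
WEIGHTS = {chr(ord("a") + i): (i + 1) * 10 for i in range(26)}
CONTRACTIONS = {"ch": 85}   # hypothetical: "ch" sorts between h (80) and i (90)
EXPANSIONS = {"æ": "ae"}    # hypothetical: "æ" is weighted as "a" then "e"
IGNORABLE = {CGJ}           # contributes no weight in this toy model


def sort_key(text):
    """Build a list of toy primary weights, matching contractions first."""
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in CONTRACTIONS:
            # Two adjacent characters are weighted as one unit.
            key.append(CONTRACTIONS[pair])
            i += 2
        elif text[i] in EXPANSIONS:
            # One character is weighted as several letters.
            key.extend(WEIGHTS[c] for c in EXPANSIONS[text[i]])
            i += 1
        elif text[i] in IGNORABLE:
            # No weight, but its presence already kept "c" and "h" apart.
            i += 1
        else:
            key.append(WEIGHTS[text[i]])
            i += 1
    return key


words = ["chata", "c" + CGJ + "hata", "hora", "cena"]
ordered = sorted(words, key=sort_key)
```

In this model "chata" sorts after "hora" because of the contraction, while the CGJ-separated spelling sorts among the other "c" words. By contrast, "æ" and "ae" get identical keys, and nothing in the input can ask for "æ" to be kept as one letter, which is exactly the gap my second question is about.
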

