On Mon, 21 May 2012 17:43:27 -0700 Ken Whistler <[email protected]> wrote:
> > For example, when caseFirst is set to uppercase, ICU orders
> > U+1D34 MODIFIER LETTER CAPITAL H before U+0068 LATIN SMALL LETTER H,
> > but anomalously orders U+A7F8 MODIFIER LETTER CAPITAL H WITH STROKE
> > *after* U+0127 LATIN SMALL LETTER H WITH STROKE, because U+A7F8's
> > tertiary weight identifies it as <super> with no entry for the
> > 'Case or kana subtype' class. Is this behaviour required by the
> > UCA + DUCET?

> Well, that may be a bug in allkeys.txt.

But, given allkeys.txt as it is, is it required behaviour? You sound
quite unenthusiastic about fixing the arguable bug in allkeys.txt.

> The default tertiary weights aren't completely separated into all the
> possible combinations here, because the required weighting space gets
> out of hand, and seems unnecessary for the edge cases for
> compatibility characters, at least for *default* weighting of such.

It's still a bit of a shock to find 13 characters with the 3-level
weights [.1699.0020.0005], all potentially used in the same
'language': Mathematics.

It's not just characters with compatibility decompositions. U+A669
CYRILLIC SMALL LETTER MONOCULAR O, U+A66B CYRILLIC SMALL LETTER
BINOCULAR O, U+A66D CYRILLIC SMALL LETTER DOUBLE MONOCULAR O and U+A66E
CYRILLIC LETTER MULTIOCULAR O, which do not have decompositions, all
share the <compat> tertiary weight.

> If even in *those* circumstances, somebody required uppercase-first
> tailoring to work without exception for U+A7F8, well, then the
> solution for that is simply to tailor the default tertiary weight
> from 0014 to 001D.

How would one do that through LDML? The obvious hack, if one is
committed to uppercase-first, is "&\u0126<<<\ua7f8", but that doesn't
work in the ICU demonstrator. (It maroons U+A7F8 amongst the tertiary
variants of plain 'h'.) I think this is related to a known problem
area involving contractions and expansions, but the LDML documentation
leaves me mystified rather than explaining what to do.

Richard.
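To make the caseFirst anomaly concrete, here is a toy Python model of uppercase-first ordering at the tertiary level. It is a sketch under stated assumptions, not ICU's implementation: the primary weights 1 and 2 are placeholders for 'h' and 'h with stroke', the secondary weights are taken as equal and omitted, and the small weight-to-case map (`CASE_OF_TERTIARY`) and the function names are hypothetical. It only models the effect described above, where the case class derived from the tertiary weight is compared before the rest of that weight.

```python
# Toy model of uppercase-first ordering (assumptions, not ICU's tables):
#  * primaries 1 and 2 are placeholders for 'h' and 'h with stroke';
#  * secondary weights are equal (0020) and therefore omitted;
#  * case class is looked up from the DUCET tertiary weight.

# Hypothetical tertiary-weight -> case-class map covering only the
# weights mentioned in this thread.
CASE_OF_TERTIARY = {
    0x0002: "lower",  # small
    0x0014: None,     # <super> with no 'Case or kana subtype' entry
    0x001D: "upper",  # <super>/<sub>/<square> upper
}

# character -> (placeholder primary, default tertiary weight)
CHARS = {
    "\u0068": (1, 0x0002),  # LATIN SMALL LETTER H
    "\u1D34": (1, 0x001D),  # MODIFIER LETTER CAPITAL H
    "\u0127": (2, 0x0002),  # LATIN SMALL LETTER H WITH STROKE
    "\uA7F8": (2, 0x0014),  # MODIFIER LETTER CAPITAL H WITH STROKE
}


def sort_key(ch, weights):
    primary, tertiary = weights[ch]
    # Uppercase-first: 'upper' ranks before everything else; a missing
    # case class is treated like lowercase (assumption).
    case_rank = 0 if CASE_OF_TERTIARY.get(tertiary) == "upper" else 1
    return (primary, case_rank, tertiary)


def uppercase_first(weights=CHARS):
    return sorted(weights, key=lambda ch: sort_key(ch, weights))


def code_points(chars):
    return [f"U+{ord(c):04X}" for c in chars]


print(code_points(uppercase_first()))
# ['U+1D34', 'U+0068', 'U+0127', 'U+A7F8']

# Tailoring U+A7F8's tertiary weight from 0014 to 001D, as suggested:
tailored = dict(CHARS)
tailored["\uA7F8"] = (2, 0x001D)
print(code_points(uppercase_first(tailored)))
# ['U+1D34', 'U+0068', 'U+A7F8', 'U+0127']
```

With the default weights, U+1D34 lands before U+0068 while U+A7F8 lands after U+0127; changing only U+A7F8's tertiary weight to 001D restores the uppercase-first pattern, without saying anything about how to express that change in LDML rule syntax.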

