Chris Fynn wrote:

> In Unicode's UnicodeData.txt (
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt )
> 0F7E has a Canonical Combining Class Value (CCCV) of 0;
> 0F71 a CCCV of 129;
> 0F72 0F7A 0F7B 0F7C 0F7D and 0F80 a CCCV of 130;
> 0F74 a CCCV of 132;
> and 0F82 and 0F83 have a CCCV of 230.
>
> By normal Tibetan & Dzongkha spelling, writing, and input rules
> Tibetan script stacks should be entered and written: 1 headline
> consonant (0F40-0F6A), any subjoined consonant(s) (0F90-0F9C),
> achung (0F71), shabkyu (0F74), any above headline
> vowel(s) (0F72 0F7A 0F7B 0F7C 0F7D and 0F80); any ngaro (0F7E,
> 0F82 and 0F83)
>
> So following normal Tibetan & Dzongkha input and spelling rules
> the relative ordering of these characters should be:
> A. 0F71
> B. 0F74
> C. 0F72 0F7A 0F7B 0F7C 0F7D and 0F80
> D. 0F7E, 0F82 and 0F83
>
> The fact that, in a process of "canonical decomposition" or
> "normalisation", these combining characters can get reordered
> in a bizarre order relative to each other
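The combining class values quoted above can be checked against the Unicode Character Database bundled with Python's standard `unicodedata` module. This is only an illustrative sketch, assuming Python 3.9 or later; the helper names `ccc` and `code_points` and the sample stack (KA + achung + shabkyu + vowel I + 0F82) are hypothetical choices for this example. Canonical reordering sorts each run of adjacent non-starters by combining class, so NFD moves the shabkyu (132) after the vowel above (130).

```python
import unicodedata

# Marks from the message, listed in the traditional input order (groups A-D).
TRADITIONAL_ORDER = [
    0x0F71,                                          # A: achung
    0x0F74,                                          # B: shabkyu
    0x0F72, 0x0F7A, 0x0F7B, 0x0F7C, 0x0F7D, 0x0F80,  # C: vowels above
    0x0F7E, 0x0F82, 0x0F83,                          # D: ngaro and variants
]


def ccc(cp: int) -> int:
    """Canonical Combining Class of a code point, per Python's bundled UCD."""
    return unicodedata.combining(chr(cp))


def code_points(s: str) -> list[str]:
    """Code points of s as 4-digit uppercase hex strings."""
    return [f"{ord(c):04X}" for c in s]


# Print each mark's class; the values do not increase along the A-D order.
for cp in TRADITIONAL_ORDER:
    print(f"U+{cp:04X}  ccc={ccc(cp):3}  {unicodedata.name(chr(cp))}")

# Hypothetical stack in traditional order: KA + achung + shabkyu + vowel I + 0F82.
stack = "\u0F40\u0F71\u0F74\u0F72\u0F82"
print(code_points(unicodedata.normalize("NFD", stack)))
# -> ['0F40', '0F71', '0F72', '0F74', '0F82']: shabkyu (132) now follows the I (130)
```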
Actually, looking at this data, while I can see that the combining classes are assigned less than optimally, I don't see that this causes any practical problem for Tibetan data.

You are saying, in effect, that the stack structure has the following position classes (treating the consonant stack itself as the more tightly bound unit, which I will just symbolize as CS):

   CS - achung - shabkyu - vowels-above - ngaro

And since shabkyu has cc=132 whereas the vowels above have cc=130, the two would reorder out of the expected order if normalized.

However, for most text the shabkyu (u below) would be in complementary distribution with the vowels above, so the effective positional classes are:

                 { vowels-above }
   CS - achung -                  - ngaro
                 { shabkyu      }

And in this case, the relative combining classes of the vowels don't really matter, since we wouldn't see both present to reorder around each other.

I'm guessing that you are claiming there are instances where the shabkyu does co-occur with other vowels above as well. Wouldn't those, if they do occur, represent a distinctly minority case in terms of the overall processing? The short summaries of Tibetan writing that I've seen don't even mention it as a possibility, since even the few diphthongs in -u are written with a separate stack <0F60, 0F74> to the right of the main stack.

> causes difficulties
> with culturally correct collation (where 0F7E, 0F82 and 0F83
> should have an equal value) - and especially it necessitates
> making lookups in smart fonts far more complex and inefficient
> than they should have to be.

And I'm not seeing the problem here, either. Since the combining class of 0F7E is 0, and not some other random value, it isn't going to reorder around the other vowel marks. If it is entered in the traditional spelling order you have indicated, then it is going to stay in that position; normalization won't move it.
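The complementary-distribution argument can be checked with Python's standard `unicodedata` module. A minimal sketch under stated assumptions: `nfd_hex`, `is_stable`, and the sample stacks built on KA (0F40) are hypothetical illustrations, not part of any standard API. A stack that carries either the shabkyu or a vowel above, but not both, comes through NFD unchanged, and 0F7E, being a starter (class 0), stays where it was entered; only a stack carrying both the shabkyu and a vowel above gets reordered.

```python
import unicodedata

# Hypothetical sample characters; KA (U+0F40) stands in for the consonant stack CS.
KA, ACHUNG, SHABKYU, VOWEL_I, NGARO = "\u0F40", "\u0F71", "\u0F74", "\u0F72", "\u0F7E"


def nfd_hex(s: str) -> str:
    """NFD-normalize s and return its code points as space-separated hex."""
    return " ".join(f"{ord(c):04X}" for c in unicodedata.normalize("NFD", s))


def is_stable(s: str) -> bool:
    """True if canonical decomposition and reordering leave s exactly as entered."""
    return unicodedata.normalize("NFD", s) == s


samples = {
    "shabkyu only": KA + ACHUNG + SHABKYU + NGARO,
    "vowel above only": KA + ACHUNG + VOWEL_I + NGARO,
    "shabkyu + vowel above": KA + ACHUNG + SHABKYU + VOWEL_I + NGARO,
}

for label, s in samples.items():
    print(f"{label:22} stable={is_stable(s)}  NFD: {nfd_hex(s)}")
# The first two stacks are stable; the third normalizes to
# 0F40 0F71 0F72 0F74 0F7E, with the 0F7E ngaro still in final position.
```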
And since the equivalent 0F82 and 0F83 sift to the end of the syllable, with their high combining class, they'll end up in the same position as the 0F7E ngaro if normalized.

The only problem you'd have is with Tibetan data where a 0F7E ngaro is entered in other than the optimal spelling order you indicated. Such a sequence won't compare equal unless you add a spelling equivalence rule on top of the canonical equivalence. But there are a number of such edge cases for Brahmic scripts, not just Tibetan.

Culturally correct collation is first a matter of giving the three ngaro characters equivalent weights. Beyond that, as you indicated, the weighting of the syllables (or stacks) is complicated, and isn't going to be affected by 0F7E having combining class 0 in any case.

>
> (In Tibetan script fonts 0F71 and 0F74 are often ligated with
> preceding consonant (+ subjoined consonants) combined as a
> single glyph whereas above headline vowels are almost always
> treated as non spacing combining marks.)

Yes, but the only point where this would be a problem would be for stacks with a shabkyu (u vowel) *and* another vowel. And even for such cases, wouldn't this be handled effectively by 6 triples in the ligature tables, which would identify any shabkyu moved after one of the other 6 vowels?

>
> Currently there seems to be no easy or standardized work around
> for these problems and the standard seems to say that the
> relative values of assigned Canonical Combining Class Values
> cannot be changed.

They cannot.

> Any suggestions as to how to create a standardized work around
> for these incorrect values?

I guess I'm not getting it. I don't see the need for a "standardized" workaround here.

--Ken

>
> - Chris