Tim Greenwood asked:

> > All of the spacing combining marks (general category Mc) except
> > musical symbols have a canonical combining class of 0. So, for example
> >
> > 0B95 (TAMIL LETTER KA) 0BC7 (TAMIL VOWEL SIGN EE - stands to the left
> > of the consonant) 0BBE (TAMIL VOWEL SIGN AA - on the right) is
> > canonically distinct from 0B95 0BBE 0BC7 - even though I presume that
> > they would generate an identical glyph. Why is this?

Because <0BBE, 0BC6> is not canonically equivalent to U+0BCA, which is
the preferred representation for this vowel, anyway. Making Indic
spacing vowel matras have non-zero combining classes, and forcing them
to start reordering under normalization, would have introduced even
greater complications. As it is, <0BBE, 0BC6> should simply be treated
as a misspelling of Tamil.

And Peter Constable continued:

> The question that comes to my mind isn't why some Mc marks don't have
> non-zero classes, but rather why any Mc marks *do* have non-zero
> classes.
>
> There are 352 marks with a canonical combining class > 0. Only 8 of
> these, all musical symbols, are Mc.

Because it seemed like a good idea at the time, because nobody objected,
and because we are stuck with it now, inconsistent or not.

[gc=Mc] ==> [ccc=0] is *not* an invariant we have ever tried to maintain
in UnicodeData, by the way.

Also, any musical scoring program that is actually making use of the
various note flags and stems involved in this to construct rendered
musical scores is going to be a *very* special case program, anyway.
These particular 8 items can*NOT* be rendered correctly in context by
out-of-the-box generic text rendering software.

--Ken
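
For anyone who wants to check the points above, here is a minimal
sketch using Python's standard unicodedata module. It is not part of
the original exchange: the helper names (describe, nfc_hex,
mc_with_nonzero_ccc) are made up for illustration, and the results
depend on the version of the Unicode Character Database bundled with
the interpreter, so the list of Mc characters with a non-zero class may
be longer than the 8 musical symbols mentioned here. It uses the
<0BC6, 0BBE> pair, which composes to U+0BCA; Tim's <0BC7, 0BBE> pair
relates to U+0BCB in the same way.

```python
import unicodedata

def describe(seq):
    """Return (code point, general category, canonical combining class) per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))
            for ch in seq]

def nfc_hex(seq):
    """Normalize to NFC and return the resulting code points as hex strings."""
    return [f"{ord(ch):04X}" for ch in unicodedata.normalize("NFC", seq)]

def mc_with_nonzero_ccc():
    """List code points that are gc=Mc yet have ccc != 0 in this Python's Unicode data."""
    return [cp for cp in range(0x110000)
            if unicodedata.category(chr(cp)) == "Mc"
            and unicodedata.combining(chr(cp)) != 0]

# TAMIL LETTER KA, TAMIL VOWEL SIGN E, TAMIL VOWEL SIGN AA
KA, E, AA = "\u0b95", "\u0bc6", "\u0bbe"

print(describe(E + AA))      # both matras are Mc with ccc 0, so they are never reordered
print(nfc_hex(KA + E + AA))  # the expected order composes with the precomposed vowel sign
print(nfc_hex(KA + AA + E))  # the reversed order is left exactly as typed
print([f"U+{cp:04X}" for cp in mc_with_nonzero_ccc()])
```

Because both vowel signs have combining class 0, normalization never
swaps them, which is why the reversed sequence stays distinct instead
of being folded into the preferred spelling.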

