RE: Question on Canonical equivalence

Kenneth Whistler Wed, 24 Nov 2004 12:09:52 -0800

Tim Greenwood asked:

> > All of the spacing combining marks (general category Mc) except
> > musical symbols have a canonical combining class of 0. So, for example
> > 
> > 0B95 (TAMIL LETTER KA) 0BC7 (TAMIL VOWEL SIGN EE - stands to the left
> > of the consonant) 0BBE (TAMIL VOWEL SIGN AA - on the right) is
> > canonically distinct from 0B95 0BBE 0BC7 - even though I presume that
> > they would generate an identical glyph. Why is this?


Because <0BBE, 0BC6> is not canonically equivalent to U+0BCA, which
is the preferred representation for this vowel, anyway.

Making Indic spacing vowel matras have non-zero combining classes,
and forcing them to start reordering under normalization would
have introduced even greater complications. As it is, <0BBE, 0BC6>
should simply be treated as a misspelling of Tamil.
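
For anyone who wants to check this against the character database,
here is a minimal sketch using Python's standard unicodedata module.
The variable names and the canonically_equivalent() helper are made up
for illustration, and the results reflect whatever Unicode version
ships with the local Python build. It uses the 0BC6/0BCA pair from the
reply above; the same reasoning applies to 0BC7 and U+0BCB.

```python
import unicodedata

KA = "\u0b95"       # TAMIL LETTER KA
SIGN_E = "\u0bc6"   # TAMIL VOWEL SIGN E (stands to the left of the consonant)
SIGN_AA = "\u0bbe"  # TAMIL VOWEL SIGN AA (stands to the right)
SIGN_O = "\u0bca"   # TAMIL VOWEL SIGN O, canonically decomposes to <0BC6, 0BBE>


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Both vowel signs have canonical combining class 0, so normalization
# never reorders them relative to each other.
ccc = [unicodedata.combining(c) for c in (SIGN_E, SIGN_AA)]

preferred = KA + SIGN_O             # <0B95, 0BCA>
decomposed = KA + SIGN_E + SIGN_AA  # <0B95, 0BC6, 0BBE>
swapped = KA + SIGN_AA + SIGN_E     # <0B95, 0BBE, 0BC6>

print(ccc)
print(canonically_equivalent(preferred, decomposed))
print(canonically_equivalent(preferred, swapped))
```

The swapped order survives normalization unchanged, which is why it
has to be caught as a spelling problem rather than folded away by
NFC or NFD.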

And Peter Constable continued:

> The question that comes to my mind isn't why some Mc marks don't have
> non-zero classes Right Attached class, but rather why any Mc marks *do*
> have non-zero classes.
> 
> There are 352 marks with a canonical combining class > 0. Only 8 of
> these, all musical symbols, are Mc.

Because it seemed like a good idea at the time, because nobody
objected, and because we are stuck with it now, inconsistent 
or not. [gc=Mc] ==> [ccc=0] is *not* an invariant we have ever
tried to maintain in UnicodeData, by the way.
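
A rough way to see the exceptions for yourself is to scan the whole
code space for gc=Mc characters with a non-zero combining class. This
is only a sketch with Python's unicodedata module; the function name
is invented here, and the list you get depends on the Unicode version
bundled with your Python, so it may well be longer than the 8 musical
symbols counted in 2004.

```python
import sys
import unicodedata


def mc_with_nonzero_ccc():
    """Hypothetical helper: return (code point, ccc) pairs for every
    character with General_Category Mc and a non-zero combining class."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Mc":
            ccc = unicodedata.combining(ch)
            if ccc != 0:
                found.append((cp, ccc))
    return found


exceptions = mc_with_nonzero_ccc()
for cp, ccc in exceptions:
    # Fall back to a placeholder if the character has no name.
    print(f"U+{cp:04X} ccc={ccc} {unicodedata.name(chr(cp), '<unnamed>')}")
```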

Also, any musical scoring program that is actually making use
of the various note flags and stems involved in this to construct
rendered musical scores is going to be a *very* special case
program, anyway. These particular 8 items can*NOT* be rendered
correctly in context by out-of-the-box generic text rendering
software.

--Ken
