The Tibetan Unicode block contains a number of characters (U+0F73, U+0F75,
U+0F81) that have a canonical combining class value of zero and non-empty
decomposition mappings. That alone is not out of the ordinary, but upon
inspecting the code points they map to, I found that every code point in each
decomposition has a canonical combining class greater than zero.

In the case of U+0F81, the decomposition mapping is: U+0F71 U+0F80. Both U+0F71 
and U+0F80 have canonical combining class values greater than zero, so U+0F81 
decomposes solely into combining marks, yet has a canonical combining class 
value of zero.
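For anyone who wants to reproduce this, here is a minimal sketch using Python's
standard unicodedata module. It assumes the Unicode Character Database bundled
with your Python build is recent enough to contain these characters; the helper
name `describe` is just illustrative.

```python
import unicodedata

def describe(ch):
    """Return (code point label, canonical combining class, decomposition mapping)."""
    return (f"U+{ord(ch):04X}", unicodedata.combining(ch), unicodedata.decomposition(ch))

# U+0F81 and the two code points in its decomposition mapping.
for ch in ["\u0F81", "\u0F71", "\u0F80"]:
    print(describe(ch))

# Canonical decomposition (NFD) splits U+0F81 into its two combining marks.
print(unicodedata.normalize("NFD", "\u0F81") == "\u0F71\u0F80")
```

The first line printed shows a combining class of 0 for U+0F81, while the next
two show nonzero classes for U+0F71 and U+0F80.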

What is the reasoning behind this discrepancy? It is my understanding that 
U+0F81 (TIBETAN VOWEL SIGN REVERSED II, ཱྀ) is supposed to be a combining mark. 
Also, the Tibetan block is the only block that contains code points with this 
behavior. It is likely that I'm misunderstanding the semantics of the canonical 
combining class system.
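To check the claim that only Tibetan code points behave this way, one could scan
the whole code space with a sketch like the following. It assumes Python's
bundled Unicode database, considers only canonical (untagged) decomposition
mappings, and uses a hypothetical helper name `zero_ccc_all_mark_decompositions`.

```python
import sys
import unicodedata

def zero_ccc_all_mark_decompositions():
    """Yield (code point, decomposition) for characters whose canonical
    combining class is 0 but whose canonical decomposition mapping consists
    entirely of code points with a nonzero combining class."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        # Skip characters without a mapping, and compatibility mappings,
        # which begin with a tag such as "<compat>".
        if not decomp or decomp.startswith("<"):
            continue
        parts = [int(h, 16) for h in decomp.split()]
        if unicodedata.combining(ch) == 0 and all(
            unicodedata.combining(chr(p)) > 0 for p in parts
        ):
            yield cp, parts

for cp, parts in zero_ccc_all_mark_decompositions():
    print(f"U+{cp:04X} ->", " ".join(f"U+{p:04X}" for p in parts))
```

With a current Unicode database this should report only the three Tibetan
vowel signs mentioned above.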


Diego Frias
