Peter Jacobi <peter underscore jacobi at gmx dot net> wrote:

> Now I'm wondering about Tamil LLA (U+0BB3) and
> Tamil AU Length Mark (U+0BD7). They not only have
> incidental equal shapes in the Font used for preparing
> the Unicode charts, they are also indistinguishable in
> handwritten Tamil, typewriter Tamil etc, I am told.
>
> So for all purposes:
>
> U+0B95 U+0BCC which is canonically equivalent to
> U+0B95 U+0BC7 U+0BD7
>
> looks exactly the same as
>
> U+0B95 U+0BC7 U+0BB3

These examples actually should use U+0BC6, not U+0BC7. But this
doesn't detract from Peter's point.

> Isn't that a bit odd?

Not as odd as it may seem. These two characters do look the same, and
in the days before computer processing of Tamil there may have been no
need to distinguish between them (similar to older typewriters where
lowercase L was used for digit 1). But modern processing needs tip the
balance in favor of separate encoding.

This is not unheard of in Unicode. In the Runic alphabet, U+16BD RUNIC
LETTER SHORT-TWIG-HAGALL H and U+16C2 RUNIC LETTER E have identical
glyphs, as well as identical properties. But H and E are clearly not
the same letter, and were not used in the same Runic tradition, so they
are not unified.

> Giving an analogy using Latin script,
> that would be the same as if Latin y U+0079
> in vocalic and consonantic use were
> mapped to two different Unicode
> codepoints.

Not really. First, "they" are never considered to be two separate
letters that happen to look the same, unlike the Tamil and Runic
examples. In English and Spanish at least, "y" is well understood to
have both a vocalic and consonantal role, but it is still a single
letter that happens to wear two different hats. Second, disunifying "y"
would cause untold mapping nightmares. And third, I don't know about
you, but the line between vocalic "y" and consonantal "y" isn't clear
enough for me to know when to use one character and when the other.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
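
P.S. For anyone who wants to check the code points discussed above, here is a minimal Python sketch. It assumes the standard unicodedata module is an acceptable stand-in for the Unicode Character Database; the helper names (nfd, canonically_equivalent, describe) are made up for illustration. It compares the sequences after canonical decomposition (NFD), using the corrected U+0BC6 rather than U+0BC7.

```python
import unicodedata

KA = "\u0b95"              # TAMIL LETTER KA
VOWEL_SIGN_E = "\u0bc6"    # TAMIL VOWEL SIGN E
VOWEL_SIGN_AU = "\u0bcc"   # TAMIL VOWEL SIGN AU
AU_LENGTH_MARK = "\u0bd7"  # TAMIL AU LENGTH MARK
LETTER_LLA = "\u0bb3"      # TAMIL LETTER LLA

precomposed = KA + VOWEL_SIGN_AU                 # U+0B95 U+0BCC
decomposed = KA + VOWEL_SIGN_E + AU_LENGTH_MARK  # U+0B95 U+0BC6 U+0BD7
lookalike = KA + VOWEL_SIGN_E + LETTER_LLA       # U+0B95 U+0BC6 U+0BB3


def nfd(text):
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return nfd(a) == nfd(b)


def describe(text):
    """List each code point in text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    print(canonically_equivalent(precomposed, decomposed))
    print(canonically_equivalent(decomposed, lookalike))
    for line in describe(lookalike):
        print(line)
```

The first comparison comes out True and the second False: the sequence ending in the AU length mark and the one ending in LLA may render alike, but normalization never treats them as the same text.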