Gautam Sengupta wrote:
> Is there any reason (apart from trying to be
> ISCII-conformant) why the Bangla word /ki/ "what"
> cannot be encoded as [KA][ZWJ][I]? Do we really need
> combining forms of vowels to encode Indian scripts?
Perhaps you are right that it *would* have been a cleaner design to have only one set of vowels. But notice that <KA><+><I> is one character longer than <KA><+I>. Storage space may not be a big problem these days, but it still costs 2 to 4 extra bytes for each consonant not followed by the inherent vowel /a/. Perhaps it *would* have been better to have only the combining vowels, and to form independent vowels with a "mute consonant" (actually, the independent vowel "a").

> Also, why not use [CONS][ZWJ][CONS] instead of
> [CONS][VIRAMA][CONS]? One could then use [VIRAMA] only
> where it is explicit/visible.

OK. But what happens when the font has no glyph for the ligature <cons><ZWJ><cons>, nor for the half consonant <cons><ZWJ>, nor for the subjoined consonant <ZWJ><cons>? Since <ZWJ> is itself an invisible character, your string displays as <cons><cons>, which is clearly semantically incorrect. If you want the explicit virama to be visible, you have to encode it as <cons><VIRAMA><cons>. This means that you (the author of the text) are forced to choose between <ZWJ> and <VIRAMA> based on which glyphs are available in the *particular* font you are using while typing. And this is a big no-no, because it would prevent you from changing the font without retyping part of the text.

With the current Unicode scheme, if the font has no glyph for the ligature <cons><VIRAMA><cons>, nor for the half consonant <cons><VIRAMA>, nor for the subjoined consonant <VIRAMA><cons>, the virama is *automatically* displayed visibly, so the semantics of the text are always preserved, even when it is rendered with the most stupid of fonts.

> Surely, [A/E][ZWJ][Y][ZWJ][AA] is more "natural" and
> intuitively acceptable than any encoding in which a
> vowel is followed by a [VIRAMA]?

Maybe. But I see no reason why being natural or intuitive should be seen as a key feature of an encoding system.
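The storage, fallback and yaphala points can be checked with a small Python 3 sketch. The code points are the real Bengali ones (KA U+0995, vowel sign I U+09BF, letter I U+0987, virama U+09CD, ZWJ U+200D), but the "font" is a hypothetical set of glyph names, shape_cluster is a toy stand-in for a shaping engine rather than how any real renderer works, and KEYMAP is a hypothetical input method.

```python
# Real Bengali code points; everything else below is a toy model.
KA = "\u0995"        # BENGALI LETTER KA
TA = "\u09A4"        # BENGALI LETTER TA
YA = "\u09AF"        # BENGALI LETTER YA
A_LETTER = "\u0985"  # BENGALI LETTER A (independent vowel)
I_LETTER = "\u0987"  # BENGALI LETTER I (independent vowel)
I_SIGN = "\u09BF"    # BENGALI VOWEL SIGN I (combining)
AA_SIGN = "\u09BE"   # BENGALI VOWEL SIGN AA (combining)
VIRAMA = "\u09CD"    # BENGALI SIGN VIRAMA
ZWJ = "\u200D"       # ZERO WIDTH JOINER


def encoded_sizes(s):
    """Return (code points, UTF-8 bytes, UTF-16 bytes, UTF-32 bytes)."""
    return (len(s),
            len(s.encode("utf-8")),
            len(s.encode("utf-16-le")),   # -le variants: no byte order mark
            len(s.encode("utf-32-le")))


current_ki = KA + I_SIGN             # /ki/ as Unicode encodes it
proposed_ki = KA + ZWJ + I_LETTER    # the proposed [KA][ZWJ][I]


def shape_cluster(c1, joiner, c2, font):
    """Toy shaper: pick glyphs for c1 + joiner + c2 from a hypothetical font.

    `font` is a set whose members are plain characters or tuples that
    name a conjunct, half, or subjoined form.
    """
    if (c1, joiner, c2) in font:     # full conjunct ligature
        return [(c1, joiner, c2)]
    if (c1, joiner) in font:         # half form of c1
        return [(c1, joiner), c2]
    if (joiner, c2) in font:         # subjoined form of c2
        return [c1, (joiner, c2)]
    if joiner == VIRAMA:             # fallback: draw the virama visibly
        return [c1, VIRAMA, c2]
    return [c1, c2]                  # ZWJ is invisible, so it just vanishes


# Hypothetical input-method keymap: one "yaphala" key inserts <VIRAMA><YA>.
KEYMAP = {"a": A_LETTER, "yaphala": VIRAMA + YA, "aa": AA_SIGN}


def type_keys(keys):
    """Concatenate the code points produced by a sequence of key presses."""
    return "".join(KEYMAP[k] for k in keys)


if __name__ == "__main__":
    print(encoded_sizes(current_ki))    # (2, 6, 4, 8)
    print(encoded_sizes(proposed_ki))   # (3, 9, 6, 12)

    plain_font = {KA, TA, VIRAMA}       # no conjunct, half, or subjoined glyphs
    print(shape_cluster(KA, VIRAMA, TA, plain_font) == [KA, VIRAMA, TA])  # True
    print(shape_cluster(KA, ZWJ, TA, plain_font) == [KA, TA])             # True

    # ['0x985', '0x9cd', '0x9af', '0x9be']
    print([hex(ord(c)) for c in type_keys(["a", "yaphala", "aa"])])
```

The two spellings of /ki/ differ by 3, 2 and 4 bytes in UTF-8, UTF-16 and UTF-32 respectively, which is where the "2 to 4 extra bytes" figure comes from. With the plain font, the virama cluster keeps a visible virama while the ZWJ cluster silently collapses to two bare consonants. In real Unicode text the ZWJ is used together with the virama rather than instead of it; the toy shaper only models the proposal under discussion.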
Naturalness might matter for an encoding system designed to be used directly by humans, but Unicode is designed to be used by computers, so I don't see the problem. I assume that in a well-designed Bengali input method, yaphala would have a key of its own, so from the user's point of view it is just a "character": they don't need to know that pressing that key actually inserts the sequence <VIRAMA><YA>, so they won't notice the apparent nonsense of the sequence <vowel><VIRAMA> and, as we say in Italy, "If eye doesn't see, heart doesn't hurt".

_ Marco