Andy, the ya-phalaa is a presentation form of conjoined YA, which is produced in Unicode by the sequence VIRAMA + YA. Encoding it as anything else makes very little sense. However it is pronounced today in Bengali, and however odd you find its application to an initial vowel, it is still a presentation form of conjoined YA, and it should be encoded as such.

Consider the fact that the Bhagavadgita is available in Sanskrit in Bengali script. This will certainly contain many, many examples of consonant clusters in -YA. These will all be encoded as VIRAMA + YA, not as some independent form of ya-phalaa.

It is easy to point fingers at a mistake that someone like me makes, but the Unicode encoding model for Indic scripts is very robust, and we do our best to apply it correctly.

Your proposed combining ya-phalaa will do Bengali no service, as it will introduce multiple spellings for consonant clusters in -YA. I have already stated on this forum:

"For example, in Sanskrit and Bengali, we have the word pratyeka 'each, every'. This is derived from the Sanskrit root prati (expressing likeness or comparison) plus eka 'one'. In Sanskrit orthography i + e becomes ye and is so written. Now in Bengali this word also exists and in both languages what is written is PA + VIRAMA + RA + TA + VIRAMA + YA + E + KA."

It would be absurd -- and wrong -- to spell the Sanskrit word one way and the Bengali word another, especially as it is the same word.
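The spelling quoted above can be written out as code points. The following is only a minimal Python sketch: the name PRATYEKA is illustrative, and it assumes that the "E" in the sequence is the dependent vowel sign U+09C7, as in the usual spelling of the word. The same code points serve for the Sanskrit and the Bengali word.

```python
import unicodedata

# Code points for PA + VIRAMA + RA + TA + VIRAMA + YA + E + KA (Bengali block).
# Assumption: "E" here is the dependent vowel sign U+09C7.
PRATYEKA = [
    0x09AA,  # PA
    0x09CD,  # VIRAMA
    0x09B0,  # RA
    0x09A4,  # TA
    0x09CD,  # VIRAMA
    0x09AF,  # YA
    0x09C7,  # vowel sign E
    0x0995,  # KA
]

word = "".join(chr(cp) for cp in PRATYEKA)

# List each code point with its Unicode character name.
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The -YA cluster sits at positions 4 and 5 as plain VIRAMA + YA; no separate ya-phalaa character is involved.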

"IMHO, TUS needs solid rules; exceptions, hacks, patches, or workarounds should definitely be avoided wherever possible. (If you care to look back in the mailing list archives a few years, you will see that the 'a+Virama+Ya+aa' kludge was originally proposed as a workaround due to the lack of a separate encoded letter.)"

It isn't a kludge. It is a consistent application of the rules. Ya-phalaa is a presentation form of YA in conjunction with a preceding consonant or -- a Bengali innovation -- an independent vowel.
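As a rough sketch of that rule in Python: the helper with_ya_phalaa is a hypothetical name introduced here for illustration, not part of any library. It appends the same VIRAMA + YA sequence whether the base is a consonant or an independent vowel.

```python
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
YA = "\u09AF"      # BENGALI LETTER YA


def with_ya_phalaa(base: str) -> str:
    """Append VIRAMA + YA, the sequence a font presents as ya-phalaa.

    Hypothetical helper for illustration only.
    """
    return base + VIRAMA + YA


# After a consonant: TA + VIRAMA + YA
after_consonant = with_ya_phalaa("\u09A4")

# After an independent vowel: A + VIRAMA + YA + vowel sign AA
after_vowel = with_ya_phalaa("\u0985") + "\u09BE"

for label, text in [("consonant", after_consonant), ("vowel", after_vowel)]:
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in text))
```

In both cases the two code points following the base are identical, which is the consistency being argued for.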


In keeping this stance, Andy, I am defending the Unicode Standard's encoding principles. The Indic encoding model is constantly under attack from people who want explicit rephas, explicit half-forms, explicit ya-phalaas, and all sorts of other explicit things, which, were we to encode them, would make the standard very much worse than it is.

To illustrate how consistently we use this model, I will give you a Malayalam example.

NA + VIRAMA + MA --> NMA (a single conjunct)
NA + VIRAMA + ZWNJ + MA --> NMA (with a visible virama breve above and between)
NA + VIRAMA + ZWJ + MA --> NMA (with the cillaks.aram virama curl)
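The three sequences above can be sketched in Python as follows. The dictionary labels and the helper strip_joiners are illustrative names only, and what actually appears on screen depends on the font and shaping engine, which this sketch does not attempt to show.

```python
import unicodedata

NA, VIRAMA, MA = "\u0D28", "\u0D4D", "\u0D2E"  # Malayalam NA, VIRAMA, MA
ZWNJ, ZWJ = "\u200C", "\u200D"                 # zero width non-joiner / joiner

# The three spellings listed above; only the joiner differs.
SEQUENCES = {
    "conjunct": NA + VIRAMA + MA,
    "visible virama": NA + VIRAMA + ZWNJ + MA,
    "cillaksharam": NA + VIRAMA + ZWJ + MA,
}


def strip_joiners(text: str) -> str:
    """Remove ZWNJ and ZWJ, leaving only the underlying letters and virama."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


for label, seq in SEQUENCES.items():
    names = " + ".join(unicodedata.name(ch) for ch in seq)
    print(f"{label}: {names}")
```

Stripping the joiners from any of the three yields the same NA + VIRAMA + MA, so the joiners select a presentation and leave the underlying spelling unchanged.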

We prefer to apply this consistency to Bengali as well. Thank you for correcting my error earlier. That kind of feedback is helpful. Beating us up because you don't like our encoding model isn't.
--
Michael Everson * * Everson Typography * * http://www.evertype.com



