Well, I actually don’t see. I took a look at the Sinhala you inserted in this email. I cannot tell what you did at your input end (about “inserted all joiners”), but there are no actual joiners in the text itself. It displayed just fine in my email (including the correct conditional formatting of the –u vowel applied to the ra in purukee), without me doing anything special (or installing any hacked font). Why? Because it was transmitted in plain Unicode.
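The claim that the text contains no joiners can be checked mechanically from the bytes. The sketch below uses Python, which is an assumed choice since the thread contains no code. It decodes the Sinhala span from the dump in the P.S. (the bytes between the parentheses 28 and 29) as strict UTF-8. It then counts any ZERO WIDTH NON-JOINER (U+200C) or ZERO WIDTH JOINER (U+200D) characters and flags anything outside the Sinhala block (U+0D80..U+0DFF) other than spaces. The names `SINHALA_UTF8`, `JOINERS`, `SINHALA_BLOCK` and `audit` are illustrative only.

```python
import unicodedata

# The Sinhala span from the dump in the P.S.: every byte between the
# opening parenthesis (0x28) and the closing parenthesis (0x29).
SINHALA_UTF8 = bytes.fromhex(
    "E0B6B6 E0B6BD E0B794 20 "
    "E0B780 E0B6BD E0B792 E0B69C E0B79A 20 "
    "E0B68B E0B6AB 20 "
    "E0B6B4 E0B794 E0B6BB E0B794 E0B69A E0B79A 20 "
    "E0B6AF E0B78F E0B6BD E0B78F 20 "
    "E0B784 E0B790 E0B6AF E0B794 E0B780 E0B6AD E0B78A 20 "
    "E0B6B1 E0B791 20 "
    "E0B687 E0B6AF E0B79A 20 "
    "E0B687 E0B6BB E0B799 E0B6B1 E0B78A E0B6B1 E0B79A"
)

# ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER; in a UTF-8 dump they
# would show up as E2 80 8C and E2 80 8D.
JOINERS = {"\u200c", "\u200d"}

SINHALA_BLOCK = range(0x0D80, 0x0E00)  # Unicode Sinhala block, U+0D80..U+0DFF


def audit(data: bytes) -> dict:
    """Decode strict UTF-8; report joiners and non-Sinhala characters (spaces excepted)."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    return {
        "code_points": len(text),
        "joiners": sum(1 for ch in text if ch in JOINERS),
        "non_sinhala": [ch for ch in text if ch != " " and ord(ch) not in SINHALA_BLOCK],
    }


if __name__ == "__main__":
    print(audit(SINHALA_UTF8))
    # One line per code point, e.g. U+0DB6 SINHALA LETTER ALPAPRAANA BAYANNA.
    for ch in SINHALA_UTF8.decode("utf-8"):
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

For this span the report shows 47 code points (39 Sinhala characters and 8 spaces) and no joiners. Shaping such as the conditional form of the -u vowel sign on ra is applied at display time by the font and rendering engine. It is not represented by extra characters in the stored text.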
I cut and pasted that Unicode Sinhala string into a Word document, and it worked just fine. The boundaries for all the syllables were correctly detected. I saved it as a plain text UTF-8 file, and it worked just fine. I then even read the plain text UTF-8 file into a UTF-8 aware programming editor, and it worked just fine. (In a programming editor, which doesn’t attempt complex script rendering, the vowels don’t apply to the consonants and no reordering is done, so the display isn’t correct. But each character is correctly preserved, and if I write it back out to a document and read it in Word or some other tool that has access to proper rendering, it is still fine.)

And why does all that interoperability work? Because this is plain Unicode. So while I don’t doubt that people may be having serious issues with input methods for Sinhala, I tend to agree with Marc Durdin that you are confusing encoding with input methods. Yes, I know you know the difference, but it appears to me that the inescapable conclusion from your argumentation is that the highest priority for the design of an encoding system should be to make the design of input methods as simple as possible. And in my estimation, that is confusing encoding with input methods.

The art of input methods is to hide encoding details from users, and instead to provide them with an abstraction that they find easy to use and that accords with their general understanding of the writing system they are using. If done correctly, then the details of the input method *also* recede into the background, and users simply do what they want: write and edit text easily on their devices.

--Ken

P.S. Here is an octal dump of that text (after I inserted a closing parenthesis in the editor). The Sinhala sequence is the run of three-byte E0 B6 xx and E0 B7 xx sequences between the parentheses (28 and 29). Plain Unicode in UTF-8, no fancy stuff, and works just fine.
0000000000 EF BB BF 62 61 6C 75 20 76 61 6C 69 67 65 65 C2
0000000020 A0 75 C2 B5 61 20 70 75 72 75 6B 65 65 C2 A0 C3
0000000040 B0 61 61 6C 61 61 20 68 C3 A6 C3 B0 75 76 61 C3
0000000060 BE 20 6E C3 A6 C3 A6 20 C3 A6 C3 B0 65 65 20 C3
0000000100 A6 72 65 6E 6E 65 65 0D 0A 28 E0 B6 B6 E0 B6 BD
0000000120 E0 B7 94 20 E0 B7 80 E0 B6 BD E0 B7 92 E0 B6 9C
0000000140 E0 B7 9A 20 E0 B6 8B E0 B6 AB 20 E0 B6 B4 E0 B7
0000000160 94 E0 B6 BB E0 B7 94 E0 B6 9A E0 B7 9A 20 E0 B6
0000000200 AF E0 B7 8F E0 B6 BD E0 B7 8F 20 E0 B7 84 E0 B7
0000000220 90 E0 B6 AF E0 B7 94 E0 B7 80 E0 B6 AD E0 B7 8A
0000000240 20 E0 B6 B1 E0 B7 91 20 E0 B6 87 E0 B6 AF E0 B7
0000000260 9A 20 E0 B6 87 E0 B6 BB E0 B7 99 E0 B6 B1 E0 B7
0000000300 8A E0 B6 B1 E0 B7 9A 29 0D 0A 0D 0A

> As you see, this is a terrible mess and cannot be straightened. Granted, few people use it, and there will be more. What other choice do they have except Anglicizing? In Singhala, they say, "balu valigee uµa purukee ðaalaa hæðuvaþ nææ æðee ærennee" (බලු වලිගේ උණ පුරුකේ දාලා හැදුවත් නෑ ඇදේ ඇරෙන්නේ <- I inserted all joiners, but can't guarantee whether the vowel signs will pop out). It means you cannot straighten a dog's tail even if you put it in a piece of bamboo. You cannot fix Unicode Singhala, and sadly, it is bringing the language down with it.
_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode