On Thu, 19 Sep 2013 10:42:43 +0200 Philippe Verdy <verd...@wanadoo.fr> wrote:
> So **within the UCS**, the Thai script is not an Indic script. There
> were so many existing documents encoded as in the TIS standards that
> preserving the roundtrip compatibility was judged more essential than
> adopting the logical Indic order for this script. This has
> consequences for some algorithms, notably for collation.

As I understand it, 'logical Indic order' is the order that makes collation straightforward. Thai is one of the few major Indic script languages for which the Unicode Collation Algorithm (I don't just mean with the DUCET or CLDR default) readily delivers correct results. Thai collation is actually very computer friendly, collating <SARA E, SO SUA, LO LING, SARA AA> the same whether it is graphically one syllable /sa lǎw/ 'beautiful' or two /sěː laː/ 'hill'. By contrast, CLDR currently despairs of sorting Hindi correctly, and resorts to brute force for Burmese.

As I had it explained to me in this forum, 'logical order' for Thai would have been achieved by swapping the 'logical order exception' vowels with the following consonant. *<KO KAI, SARA O, RO RUA, THO THONG> for โกรธ /kròːt/ 'angry' isn't what most people would think of as 'logical order'.

Where Thai differs from most Indic scripts is that it has no conjoining mechanism, and the writing is ambiguous as a result. This is a language property rather than a script property. However, even Pali in the 'traditional' orthography, which uses U+0E3A THAI CHARACTER PHINTHU as a visible virama, would not be simple to convert to a 'logical order', for syllable division, as evidenced by the placement of the preposed vowels, is not simple and is often erratic.

Richard.
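
A minimal sketch of the swap described above, assuming the preposed vowels are exactly U+0E40..U+0E44 and treating U+0E01..U+0E2E as the consonant range (a simplification, since that range also contains RU and LU). The function name `swap_preposed_vowels` is hypothetical, and this is only an illustration of the reordering, not how any particular collation implementation handles Thai.

```python
# Preposed ("logical order exception") Thai vowels, assumed here to be
# SARA E, SARA AE, SARA O, SARA AI MAIMUAN, SARA AI MAIMALAI (U+0E40..U+0E44).
PREPOSED_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def is_thai_consonant(ch):
    # Simplification: KO KAI (U+0E01) through HO NOKHUK (U+0E2E).
    # This range also includes RU (U+0E24) and LU (U+0E26).
    return "\u0E01" <= ch <= "\u0E2E"


def swap_preposed_vowels(text):
    """Swap each preposed vowel with the character right after it,
    provided that character is in the assumed consonant range."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if (ch in PREPOSED_VOWELS
                and i + 1 < len(text)
                and is_thai_consonant(text[i + 1])):
            out.append(text[i + 1])  # consonant first
            out.append(ch)           # then the vowel
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# <SARA O, KO KAI, RO RUA, THO THONG> as stored for the word 'angry'
angry = "\u0E42\u0E01\u0E23\u0E18"
# <SARA E, SO SUA, LO LING, SARA AA>, 'beautiful' or 'hill'
beautiful_or_hill = "\u0E40\u0E2A\u0E25\u0E32"

swapped_angry = swap_preposed_vowels(angry)
swapped_ambiguous = swap_preposed_vowels(beautiful_or_hill)
```

The swap looks only at adjacent code points and never at syllable boundaries, so the two readings of <SARA E, SO SUA, LO LING, SARA AA> necessarily produce the same reordered string.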