Peter Kirk writes:
I agree that heuristics should be adjusted for Thai. But problems may arise if they have to be adjusted individually, and without introducing regressions, for each of the world's 6000+ languages.
Thai is hard because of the writing system. But most writing systems weren't
encoded pre-Unicode, so if they were typed into a computer, it was with
a Latin (or Cyrillic?) transliteration that probably used spaces and newlines,
and in fact was probably ASCII.
More cynically, those who use obscure character sets or font encodings have trouble viewing text in them; that is one of the reasons for Unicode. That this tool may to some extent be an example of that problem is a simple fact of life, and is no reason to throw the tool out.
Either you are confused or I am. I was not referring to pre-Unicode legacy encodings. I was referring to Unicode plain-text data, which may (once Unicode includes all the necessary characters) be in any one of 6000+ languages, some of which have a variety of scripts and spelling conventions. The problem is not that people are using obscure legacy encodings, but that they are not defining their UTF adequately.
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/

