I'm not sure which "one suggested heuristic method" you are referring to, ... Basically, the one which assumes that in UTF-16 text there are likely to be many zero bytes in either the odd or the even positions.
Not necessarily. In certain texts neither might occur at all, so the heuristic fails. ... but you are jumping to conclusions. For example, one of the heuristics is to judge which characters are more common when the bytes are interpreted in each of the candidate encoding schemes. When picking between UTF-16BE and UTF-16LE, U+0020 is *still* much more common than U+2000, even in Thai.
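To make the two heuristics concrete, here is a minimal Python sketch. The function names (zero_byte_counts, count_code_unit, guess_utf16_byte_order) are made up for illustration, and the "whichever byte order yields more U+0020 wins" rule is a simplifying assumption, not a description of any real detector.

```python
from typing import Optional, Tuple


def zero_byte_counts(data: bytes) -> Tuple[int, int]:
    """Return the number of zero bytes at even and at odd byte offsets."""
    return data[0::2].count(0), data[1::2].count(0)


def count_code_unit(data: bytes, unit: int, byteorder: str) -> int:
    """Count 16-bit code units equal to `unit` when `data` is read
    with the given byte order ("big" or "little")."""
    usable = len(data) - (len(data) % 2)  # ignore a trailing odd byte
    return sum(
        1
        for i in range(0, usable, 2)
        if int.from_bytes(data[i:i + 2], byteorder) == unit
    )


def guess_utf16_byte_order(data: bytes) -> Optional[str]:
    """Pick the byte order under which U+0020 (SPACE) is more frequent.

    Assumption for this sketch: a plain majority decides, and a tie
    (including no spaces at all) yields None.
    """
    spaces_be = count_code_unit(data, 0x0020, "big")
    spaces_le = count_code_unit(data, 0x0020, "little")
    if spaces_be > spaces_le:
        return "utf-16-be"
    if spaces_le > spaces_be:
        return "utf-16-le"
    return None


# Two Thai words separated by one space, encoded without a BOM.
thai = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35 \u0e04\u0e23\u0e31\u0e1a"
sample = thai.encode("utf-16-le")
print(zero_byte_counts(sample))        # only the space contributes a zero byte
print(guess_utf16_byte_order(sample))
```

With Thai text the zero-byte count is tiny, because every Thai letter has 0x0E as its high byte, yet the single space is still enough for the second heuristic to decide. If the text contains no spaces at all, this sketch returns None rather than guessing.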
I agree with Mark S and others that more sophisticated methods are likely to be safer.
--
Peter Kirk
http://www.qaya.org/