Re: Detecting encoding in Plain text

jon Thu, 08 Jan 2004 07:00:21 -0800

> I am writing a small tool to get text from a txt file into an edit box.
> Now this txt file could be in any encoding, e.g. (UTF-8, UTF-16, Mac
> Roman, Windows ANSI, Western (ISO-8859-1), JIS, Shift-JIS etc.)
> My problem is that I can distinguish between UTF-8 or UTF-16 using the BOM.
> But how do I auto detect the others?
> Any kind of help will be appreciated.


There is no foolproof way of differentiating between some of these encodings.
UTF-16 or UTF-8 text that starts with a BOM "stands out" as being unlikely to
be in any other encoding (though such files don't necessarily start with a BOM,
by the way). The others are more troublesome.
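
To make the point about BOMs concrete, here is a minimal sketch in Python. The
function name sniff_encoding is only illustrative, and the rule of accepting
BOM-less UTF-8 only when the bytes contain at least one non-ASCII byte and
validate strictly is an assumption of this sketch, not a guarantee. It
deliberately returns None for everything else (pure 7-bit data, Mac Roman,
Windows ANSI, ISO-8859-1, JIS, Shift-JIS), since those cannot be told apart
reliably from the bytes alone.

```python
import codecs

# Byte order marks this sketch recognises, with the Python codec to use.
# The "utf-16" codec reads the BOM itself to pick the byte order.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data: bytes):
    """Return a codec name when the bytes "stand out", otherwise None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # No BOM. Pure 7-bit data could be ASCII, ISO-2022-JP (JIS) or others,
    # so it tells us nothing.
    if all(byte < 0x80 for byte in data):
        return None

    # Non-ASCII bytes that happen to form valid UTF-8 are unlikely in a
    # legacy 8-bit encoding, so treat a strict decode as a strong hint.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
    return "utf-8"
```

A None result means "ask someone or something else", not "this is Latin-1".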

Even if there is a source of encoding information (such as you get with XML
declarations, HTTP headers and such), and certainly if there is none, it may be
best to offer your users the ability to select the encoding (perhaps with the
default choice based on locale settings).
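
One possible way to wire that up is sketched below in Python. The function
decode_with_choice and its parameters are hypothetical names for illustration:
user_choice stands for whatever the user picked in the tool, declared for any
encoding named by an XML declaration or HTTP header. The order of preference
(user, then declared, then locale) and the final ISO-8859-1 fallback are
assumptions of this sketch.

```python
import locale


def decode_with_choice(data: bytes, user_choice=None, declared=None):
    """Decode data, returning (text, name of the encoding actually used)."""
    # Default taken from the locale settings, as a starting suggestion only.
    default = locale.getpreferredencoding(False)

    for name in (user_choice, declared, default):
        if not name:
            continue
        try:
            return data.decode(name), name
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue

    # ISO-8859-1 maps every byte to a character, so this never fails,
    # although the result may be wrong for the file at hand.
    return data.decode("latin-1"), "latin-1"
```

Showing the user the returned encoding name next to the text lets them spot a
wrong guess and pick another one.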

--
Jon Hanna
<http://www.hackcraft.net/>
*Thought provoking quote goes here*
