Re: Tomcat 5 and UTF-8

André Warnier Fri, 03 Apr 2009 02:15:35 -0700

Gregor Schneider wrote:


> And it's getting really nuts when it comes to UTF-8: are we talking about
> UTF-8 with or without a BOM? Even the specs are not clear about that.

Actually, a UTF-8 stream should /never/ need a BOM, because there is no byte order to signal: UTF-8 is by definition byte-oriented, and U+FEFF always encodes to the same three bytes, EF BB BF. The only problem is that, for instance, MS-Windows Notepad adds a BOM to any text file it saves as UTF-8. Is anyone surprised?


Another, related issue is this:

If you edit an HTML page with, for example, Notepad and save it as UTF-8, Notepad will always prefix the file with such a totally superfluous BOM. If you later serve this page with Apache or Tomcat to an Internet Explorer browser, then no matter which HTTP Content-Type header and charset you send, Internet Explorer will see the BOM and decide that the page is encoded in UTF-8, regardless of what any meta tag in the page says.
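For illustration only (this is not from the original post and not a Tomcat API), here is a minimal Java sketch of detecting and dropping a leading UTF-8 BOM before the bytes are served. The class and method names `BomStripper`, `hasUtf8Bom` and `stripUtf8Bom` are hypothetical. It also shows why the BOM carries no byte-order information in UTF-8: U+FEFF always encodes to the same fixed sequence EF BB BF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tomcat or the JDK.
public class BomStripper {
    // UTF-8 encoding of U+FEFF, which Notepad writes at the start of "UTF-8" files.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the data begins with the three-byte UTF-8 BOM. */
    public static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    /** Returns a copy of the data without a leading UTF-8 BOM (a plain copy if there is none). */
    public static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data.clone();
    }

    public static void main(String[] args) {
        // Encoding U+FEFF as UTF-8 always gives EF BB BF: one fixed byte sequence.
        byte[] bom = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(bom, UTF8_BOM)); // true

        // A page as Notepad would save it: BOM followed by the UTF-8 content.
        byte[] page = "\uFEFF<html>caf\u00e9</html>".getBytes(StandardCharsets.UTF_8);
        byte[] cleaned = stripUtf8Bom(page);
        System.out.println(hasUtf8Bom(page));    // true
        System.out.println(hasUtf8Bom(cleaned)); // false
        System.out.println(new String(cleaned, StandardCharsets.UTF_8)
                .equals("<html>caf\u00e9</html>")); // true
    }
}
```

Stripping the prefix does not change the encoding of the remaining bytes, so a page that really is UTF-8 should still be served with a header declaring `charset=UTF-8`; the sketch only removes the marker that, as described above, makes Internet Explorer ignore the declared charset.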

> In my opinion, the whole character-set business is a pain in the ass:

I agree with that.


> I personally wish the IETF came up with some specs saying something like
> "the first n bytes of any stream have to be encoded in ASCII, containing
> the length and encoding type of the rest of the stream".

I agree with that too, in general terms.

I believe that any file, any stream, should start with such a prefix, indicating at least the file's MIME type, charset and encoding (the size may be unknown at that point), with a default of "text/plain", Unicode and UTF-8.

I also believe there should be an HTTP 2.0 specification, specifying in clear terms a default Unicode/UTF-8 encoding for URLs, HTML pages, form data submission and so on, and an unambiguous way of deviating from that.


The problem is in bringing this about.


> I put that on my wishlist for Xmas.

That's nice, but you would have to start by convincing Santa Claus.

