Gregor Schneider wrote:
And it's getting really nuts, when it comes to UTF-8: Talking about
UTF-8 with or without BOM? Even the specs are not clear about that.
Actually, a UTF-8 stream should /never/ need a BOM, because there is no
byte-order, UTF-8 being by definition byte-oriented.
The only problem is that some tools, MS-Windows Notepad for instance, add
a BOM to any text file they save as UTF-8. Is anyone surprised?
Another, related issue is this:
If you edit an HTML page and save it as UTF-8 using, for example, Notepad,
it will always prefix the file with such a totally superfluous BOM.
If you later serve this page with Apache or Tomcat to an Internet
Explorer browser, then no matter which HTTP Content-Type + charset
header you send, Internet Explorer will see the BOM and decide that the
page is encoded in UTF-8, no matter what any meta tag in the page says.
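For files that have already picked up the BOM, one workaround is to strip
it before the page is served. Below is a minimal Java sketch, assuming the
file content is already in memory as a byte array. The class and method
names (BomStripper, hasUtf8Bom, stripUtf8Bom) are made up for illustration
and are not part of Tomcat or any library.

```java
import java.util.Arrays;

public class BomStripper {
    // The three bytes that U+FEFF encodes to in UTF-8 (the "BOM" Notepad writes).
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the data starts with the three UTF-8 BOM bytes. */
    public static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    /** Returns a copy without the leading BOM, or the same array if there is none. */
    public static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data)
                ? Arrays.copyOfRange(data, UTF8_BOM.length, data.length)
                : data;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // Only the two payload bytes remain after stripping.
        System.out.println(stripUtf8Bom(withBom).length);
    }
}
```

This only removes the three bytes; it does not check that the rest of the
data is valid UTF-8.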
In my opinion, the whole character-set issue is a pain in the ass:
I agree with that.
I personally wish IETF came up with some specs saying something like
"the first n bytes of any stream have to be encoded in ASCII, containing the
length and encoding type of the rest of the stream".
I agree with that too, in general terms.
I believe that any file, any stream, should start with such a prefix,
indicating at least the file's MIME type, charset and encoding (size may
be unknown at that point), with a default of "text/plain", Unicode and
UTF-8.
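To make the idea concrete, here is a rough Java sketch of how a reader
could consume such a prefix. Everything about the format is an assumption
made for this example, since no such specification exists: the prefix is a
single ASCII line of the form "type/subtype; charset=NAME", terminated by a
newline, and an empty line means the defaults. The class name StreamPrefix
is likewise hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamPrefix {
    public final String mimeType;
    public final Charset charset;

    private StreamPrefix(String mimeType, Charset charset) {
        this.mimeType = mimeType;
        this.charset = charset;
    }

    /**
     * Reads one ASCII header line (hypothetical format, see above) and leaves
     * the stream positioned at the first byte of the actual content.
     */
    public static StreamPrefix read(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.append((char) b); // header is assumed to be pure ASCII
        }

        // Defaults proposed in the text: text/plain, encoded as UTF-8.
        String mimeType = "text/plain";
        Charset charset = StandardCharsets.UTF_8;

        String[] parts = line.toString().trim().split(";");
        if (parts.length > 0 && !parts[0].trim().isEmpty()) {
            mimeType = parts[0].trim();
        }
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                // Throws an unchecked exception if the JVM does not know the name.
                charset = Charset.forName(param.substring(8).trim());
            }
        }
        return new StreamPrefix(mimeType, charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "text/html; charset=ISO-8859-1\n<html>...</html>"
                .getBytes(StandardCharsets.US_ASCII);
        StreamPrefix prefix = read(new ByteArrayInputStream(data));
        System.out.println(prefix.mimeType + " " + prefix.charset);
    }
}
```

A reader would then decode the remaining bytes with the charset from the
prefix instead of guessing from a BOM or a meta tag.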
I also believe there should be an HTTP 2.0 specification, specifying in
clear terms a default Unicode/UTF-8 encoding for URLs, HTML pages, form
data submission and so on, and an unambiguous way of deviating from that.
The problem is in bringing this about.
I put that on my wishlist for xmas.
That's nice, but you would have to start by convincing Santa Claus.