- Original Message -
From: Peter Kirk [EMAIL PROTECTED]
To: John Cowan [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, September 26, 2003 11:52 PM
Subject: Re: Fun with proof by analogy, was Re: Mojibake on my Web pages
On 26/09/2003 06:16, John Cowan wrote:
Peter Kirk
John Cowan wrote:
It's worse than that. If the HTTP header says text/xml or text/html,
and no charset information is provided, a fully conforming browser
MUST treat this as if the charset us-ascii is specified.
Nit: this is not the case for text/html, which fortunately took exception from
James Kass wrote:
In the event of a conflict between the HTTP header and the HTML meta
tag, of course the browser should believe the HTML meta tag. After
all, who knows better than the author the encoding used to construct
the file?
Who knows better the encoding used to *send* the file? The
With respect to the issues we raised in
http://www.unicode.org/consortium/utc-positions.html, the IAB has taken the
following positions:
http://www.iab.org/documents/correspondance/2003-09-25-iso-cs-code.html
http://www.iab.org/documents/correspondance/2003-09-23-isocodes.html
FYI from NSAI to the 3166 Maintenance Agency and to TC46 Secretariat.
Ireland does not support the recent decision of the ISO 3166
Maintenance Agency, reassigning CS (formerly Czechoslovakia) to
Serbia and Montenegro. It is
I don't see anything wrong with the spec. So far as I can see it is
doing the right thing, although the behaviour of the described server
could be better.
First point - if no information is present, assume "us-ascii". Sounds extremely
sensible to me. ASCII is the intersection of Latin-1,
On 29/09/2003 07:27, Francois Yergeau wrote:
...
It takes large amounts of tricky code to reliably parse real-life HTML. It
is unreasonable to expect servers, which have no business parsing HTML, to
contain this code. ...
Agreed. But if they don't parse the HTML they don't know what the
On 29/09/2003 08:01, Jill Ramonsky wrote:
...
As far as the browser is concerned, meta tags in the document /must not/
override the headers, as this could result in security holes
exploitable by attackers.
The issue is slightly more complicated. The browser /must/ believe the
HTTP headers.
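
The precedence argued for here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what any browser actually does: the function name effective_charset, the meta_charset argument, and the choice to consult the meta tag only when the header carries no charset are all hypothetical. The "us-ascii" fallback follows the claim made earlier in the thread for text/xml; whether it also applies to text/html is disputed above.

```python
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None
    # when the header has no charset parameter.
    return msg.get_content_charset()


def effective_charset(content_type, meta_charset=None, default="us-ascii"):
    """Pick a charset, letting the HTTP header win over the meta tag.

    Hypothetical helper: the meta-tag fallback and the default are
    assumptions made for illustration.
    """
    from_header = header_charset(content_type)
    if from_header:
        return from_header           # the header is believed first
    if meta_charset:
        return meta_charset.lower()  # assumed fallback when the header is silent
    return default                   # nothing declared anywhere


print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))
print(effective_charset("text/html", "UTF-8"))
print(effective_charset("text/xml"))
```

With a conflicting header and meta tag, the sketch returns the header's value; the meta tag only matters when the header says nothing about the charset.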
Agreed. But if they don't parse the HTML they don't know what the
content of the document is and so they have no business to mess around
with that content by re-encoding it.
There is no re-encoding! There just might be is all.
There might also be a lot of other things going on, and hence a
Jill Ramonsky wrote:
First point - if no information is present, assume us-ascii.
Sounds extremely sensible to me.
Sounds very misguided to me.
ASCII is the intersection of Latin-1, UTF-8, and various other
commonly used encodings.
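
The "intersection" claim can be checked with a small Python sketch. It only covers the three encodings named below and is offered as an illustration, not as an argument for either side; the helper names are hypothetical.

```python
ENCODINGS = ["us-ascii", "latin-1", "utf-8"]


def ascii_range_agrees():
    """True if bytes 0x00-0x7F decode to the same text in every encoding."""
    ascii_bytes = bytes(range(128))
    decoded = {ascii_bytes.decode(enc) for enc in ENCODINGS}
    return len(decoded) == 1


def decode_or_none(data, encoding):
    """Decode data, returning None when the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


print(ascii_range_agrees())

# A single byte above 0x7F is where the encodings part ways:
# it is valid Latin-1, but invalid as us-ascii and as a lone UTF-8 byte.
for enc in ENCODINGS:
    print(enc, repr(decode_or_none(b"\xe9", enc)))
```

The agreement holds only for bytes below 0x80, which is why treating an unlabelled document as us-ascii says nothing useful about any byte outside that range.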
How does that make it more likely that guessing ASCII would
François --
You might be interested to know that all of your recent mail has the
following header attached to it! Sounds to me like your outgoing server is
tagging mail, and it's getting things wrong.
Rick
X-Spam-Report: This mail is probably spam. The original message has been