RE: Fun with proof by analogy, was Re: Mojibake on my Web pages

Jill Ramonsky Mon, 29 Sep 2003 08:50:08 -0700

I don't see anything wrong with the spec. So far as I can see it is doing the right thing, although the behaviour of the server described could be better.

First point - if no information is present, assume "us-ascii". Sounds extremely sensible to me. ASCII is the common subset of Latin-1, UTF-8, and various other commonly used encodings. Moreover, in order to even read an encoding declaration, the declaration must itself have been encoded in something. It makes sense to me to assume the absolute minimum. If you want more than the minimum, declare your encoding. This should not be a problem.
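To illustrate the "common subset" point, here is a minimal Python sketch (the variable names are only illustrative). It checks that every byte in the ASCII range decodes to the same text under ASCII, Latin-1 and UTF-8, and shows one character outside that range where the encodings disagree.

```python
# Every byte value in the ASCII range (0x00-0x7F).
ascii_bytes = bytes(range(0x80))

# The same bytes decode to identical text under all three codecs.
as_ascii = ascii_bytes.decode("ascii")
as_latin1 = ascii_bytes.decode("latin-1")
as_utf8 = ascii_bytes.decode("utf-8")
assert as_ascii == as_latin1 == as_utf8

# Outside the ASCII range the encodings diverge: U+00E9 (e with acute)
# is one byte in Latin-1 but two bytes in UTF-8.
e_acute = "\u00e9"
latin1_bytes = e_acute.encode("latin-1")
utf8_bytes = e_acute.encode("utf-8")
assert latin1_bytes != utf8_bytes
```

This is why a declaration written in plain ASCII characters can be read before the real encoding is known, at least for ASCII-compatible encodings.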

Second point - the "search order" - (1) HTTP header sent by the server; (2) XML declaration; (3) HTML meta tag. This also makes sense to me. Yes, the document author should know best, but it is the server, not the client, which should take notice of the meta tag.

As far as the browser is concerned, meta tags in the document must not override the headers, as this could result in security holes exploitable by attackers.
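As a rough sketch of the search order discussed in this thread (not taken from any spec text or real browser), the lookup could be written like this in Python. The function names `parse_content_type` and `pick_charset` are hypothetical, and the rule that a "text/*" header without a charset means "us-ascii" follows John's description quoted below.

```python
def parse_content_type(header):
    """Split a Content-Type header value into (media type, charset or None)."""
    media_type, _, params = header.partition(";")
    charset = None
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            charset = value.strip().strip('"').lower()
    return media_type.strip().lower(), charset


def pick_charset(content_type=None, xml_encoding=None, meta_charset=None):
    """Return the charset a client would use, consulting sources in order."""
    if content_type is not None:
        media_type, charset = parse_content_type(content_type)
        if charset:
            # (1) The HTTP header wins; nothing in the document overrides it.
            return charset
        if media_type.startswith("text/"):
            # A text/* header with no charset is treated as us-ascii.
            return "us-ascii"
    # (2) XML declaration, then (3) HTML meta tag.
    for candidate in (xml_encoding, meta_charset):
        if candidate:
            return candidate.lower()
    # No information at all: assume the minimum.
    return "us-ascii"
```

For example, `pick_charset("text/html; charset=ISO-8859-1", meta_charset="utf-8")` returns the header's value and ignores the meta tag, which is the behaviour argued for above.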

The issue is slightly more complicated, though. The browser must believe the HTTP headers. However, if the meta tags and HTTP headers are in conflict, then I believe the server is at fault for not making the correct declaration. If the document author says (in a meta tag) "this is in UTF-8", then the server should (in my opinion) send the document to the browser with a charset of UTF-8. In other words, the server should (again, in my opinion) ensure that the HTTP header is not in conflict with the meta tag, by changing the HTTP header to match it. If a server does not do this, however, the browser must still believe the HTTP header.
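A server following this suggestion might do something like the sketch below. This is only an illustration in Python, not how any particular web server works. The helper names `sniff_meta_charset` and `content_type_for` are hypothetical, the regular expression is a crude stand-in for real HTML parsing, and looking only at the first 1024 bytes is an arbitrary assumption.

```python
import re

# Crude pattern for charset=... inside a <meta> tag (an assumption, not a parser).
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def sniff_meta_charset(document, limit=1024):
    """Return the charset named in a meta tag near the start, or None."""
    match = META_CHARSET.search(document[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


def content_type_for(document, media_type="text/html"):
    """Build a Content-Type header value that agrees with the meta tag."""
    charset = sniff_meta_charset(document)
    if charset is None:
        # Nothing declared by the author: send the media type unchanged.
        return media_type
    return f"{media_type}; charset={charset}"
```

With a page whose head contains `<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">`, `content_type_for` would produce `text/html; charset=utf-8`, so the header and the meta tag can no longer disagree.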

Jill

> -----Original Message-----
> From: John Cowan [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, September 27, 2003 3:48 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Fun with proof by analogy, was Re: Mojibake on my Web pages
>
>
> [EMAIL PROTECTED] scripsit:
>
> > First, the browser checks the HTTP header, then the XML declaration
> > (which is not relevant to HTML), then the HTML meta tag.
> >
> > Apparently, upon finding character set information, the operation
> > stops, so if information is present in the HTTP header, the meta
> > tag won't be consulted.
>
> It's worse than that. If the HTTP header says "text/xml" or "text/html",
> and no charset information is provided, a fully conforming browser
> MUST treat this as if the charset "us-ascii" is specified. That's
> just insane, but such are the rules.
>
> Only if there is no header, or if the header says "application/xml",
> do we get to proceed to other sources of knowledge.
>
> > All of the data should be consulted and there should be some kind
> > of protocol in place to handle conflicting character set info.
>
> It *is* in place and fully specified. It's just that most of us
> don't care for the results, and most programs don't fully conform
> for that reason.
>
> --
> Some people open all the Windows;       John Cowan
> wise wives welcome the spring           [EMAIL PROTECTED]
> by moving the Unix.                     http://www.reutershealth.com
>   --ad for Unix Book Units (U.K.)       http://www.ccil.org/~cowan
>         (see http://cm.bell-labs.com/cm/cs/who/dmr/unix3image.gif)
>
