Hello Leif,

On 2012/07/17 23:11, Leif Halvard Silli wrote:
"Martin J. Dürst", Tue, 17 Jul 2012 18:49:47 +0900:
On 2012/07/17 17:22, Leif Halvard Silli wrote:

And an argument was put forward on the WHATWG mailing list
earlier this year/end of the previous year, that a page with strict ASCII
characters inside could still contain character entities/references for
characters outside ASCII.

Of course they can. That's the whole point of using numeric character
references. I'm rather surprised that this was even discussed in the
context of HTML5.

And the question was whether such a page should default to be seen as
UTF-8 encoded.

If I understand correctly, whether it's "seen as UTF-8 encoded" would be irrelevant when displaying the page, but might be relevant e.g. for form submission and the like?
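
As a minimal sketch of why display is unaffected (Python, standard library only; the sample markup is invented for illustration and is not taken from the thread): if every byte of a page is ASCII, any ASCII-compatible decoding gives the same text, and the character references resolve to the same characters either way.

```python
import html

# Hypothetical ASCII-only fragment: the non-ASCII letter appears once as a
# named entity and once as a numeric character reference.
page_bytes = b"<p>bl&aring;b&#229;r</p>"

# All bytes are below 0x80, so both ASCII-compatible decodings agree.
as_utf8 = page_bytes.decode("utf-8")
as_cp1252 = page_bytes.decode("windows-1252")

# html.unescape resolves both references to U+00E5, whichever guess was made.
rendered_utf8 = html.unescape(as_utf8)
rendered_cp1252 = html.unescape(as_cp1252)

print(as_utf8 == as_cp1252)
print(rendered_utf8 == rendered_cp1252)
```

The sketch only covers decoding for display. It says nothing about form submission, where the assumed page encoding can still matter.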


I have discovered one browser where it does hurt more directly: In W3M,
the text browser, which is also included in Emacs. W3M doesn't handle
(all) entities. E.g. it renders &aring; and &#229; as an 'aa' instead
of as an 'å', for instance.

In a followup mail, you write:

> To quote one W3m slogan: 'Its 8-bit support is second to none'. W3m is
> a quite modern text browser. It is regularly updated, it can be used
> with emacs, and is the text browser I would recommend.

If W3M is updated so regularly, why isn't the &aring;/&#229; -> 'aa' bug simply fixed?


So it seems to me that it is always advantageous to type characters
directly as doing so allows for better character encoding detection in
case the encoding labels disappear (read: easier to pick up that the
page is UTF-8 encoded) and also works better in at least one browser.
It does, as well, make authors more aware of the entire encoding issue
since it means that the page has to be properly labeled in order to
work cross parsers.

I agree that in general, characters should be encoded directly. There may be exceptions such as &nbsp;, where in some editing environments, it's very helpful to see them explicitly.
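
To illustrate the detection argument, here is a rough sketch in Python (standard library only). The helper names `looks_like_utf8` and `has_non_ascii` and the sample word are made up for illustration, and a strict UTF-8 decode is only a stand-in for whatever heuristics a real browser applies. Directly typed characters give a detector something to work with, while an ASCII-only page with references gives it no evidence at all.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def has_non_ascii(data: bytes) -> bool:
    """Return True if any byte is outside the ASCII range."""
    return any(byte >= 0x80 for byte in data)


# The same word, typed directly and saved in two different encodings.
direct_utf8 = "blåbær".encode("utf-8")
direct_latin1 = "blåbær".encode("iso-8859-1")
# The same word written with character references only (pure ASCII).
with_references = b"bl&aring;b&aelig;r"

for label, data in [
    ("direct, UTF-8", direct_utf8),
    ("direct, ISO-8859-1", direct_latin1),
    ("references only", with_references),
]:
    print(label, has_non_ascii(data), looks_like_utf8(data))
```

Non-ASCII bytes that also validate as UTF-8 are a strong hint that the page is UTF-8, and non-ASCII bytes that fail validation rule it out. The references-only page is valid under both readings, so an unlabeled copy of it cannot be told apart.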

But a bug in a minor (or even a major) browser shouldn't be the reason for avoiding character entities and numeric character references. The best reason is simply that nobody should be using crutches as long as they can walk with their own legs.

Regards,    Martin.
