On Wed, 05 Aug 2009 02:01:59 +0200, Ian Hickson <i...@hixie.ch> wrote:
> I'm pretty sure that character encoding support in browsers is more of a
> "collect them all" kind of thing than really based on content that
> requires it, to be honest.

Really? I think a lot of them are actually used. If you know anything that would help, I'd love to trim the set of encodings the Web needs down to something smaller than what we currently ship with. Ideally that becomes one fixed list shared across all Web languages.


> If someone can provide a firm list of encodings that they are confident
> are required for a certain substantial percentage of the Web, I'm happy
> to add the list to the spec.

Could you run a survey over your large dataset to find this out? I also read somewhere that Adam Barth was able to instrument Google Chrome to derive a better Content-Type sniffing algorithm. Maybe something similar could be done here? A rough sketch of the kind of tally I have in mind follows below.
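
To make that concrete, here is a minimal sketch in Python of such a survey, assuming a local sample of crawled responses as (headers, body) pairs; the corpus shape and names are my assumptions, not anything Google actually runs:

import collections
import re

# Matches charset labels in Content-Type headers and (crudely) in <meta> tags.
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9._-]+)', re.IGNORECASE)

def declared_encodings(responses):
    """Tally declared encoding labels across a sample of crawled pages."""
    counts = collections.Counter()
    for headers, body in responses:
        m = CHARSET_RE.search(headers.get("content-type", "").encode("ascii", "replace"))
        if not m:
            m = CHARSET_RE.search(body[:1024])  # cheap scan for an early <meta>
        label = m.group(1).decode("ascii").lower() if m else "(undeclared)"
        counts[label] += 1
    return counts

sample = [({"content-type": "text/html; charset=EUC-JP"}, b""),
          ({"content-type": "text/html"}, b'<meta charset="windows-1252">')]
print(declared_encodings(sample).most_common())

Sorting the result by frequency would give exactly the kind of firm, percentage-backed list you asked for.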


By the way, we've run into problems with the Unicode encoding label matching algorithm, particularly on some Asian sites. I think HTML5 needs to switch back to something closer to what WebKit/Gecko/Trident do. I realize this means more magic lists, but the current algorithm does not seem to cut it. E.g. some sites rely on the fact that "EUC_JP" is not a recognized label while "EUC-JP" is.
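
To illustrate why the two approaches diverge, here is a small sketch; the label tables are illustrative, not the actual tables any browser ships. Unicode-style (UTS #22) loose matching drops case and punctuation before comparing, so "EUC_JP" matches "EUC-JP", whereas exact label matching leaves "EUC_JP" unrecognized and lets the page fall through to the default or sniffed encoding:

import re

def uts22_key(label):
    """UTS #22-style loose matching: ignore case and non-alphanumerics.
    (The full algorithm also ignores leading zeros in digit runs.)"""
    return re.sub(r'[^a-z0-9]', '', label.lower())

KNOWN = {"euc-jp": "EUC-JP", "shift_jis": "Shift_JIS"}   # exact label table
LOOSE = {uts22_key(k): v for k, v in KNOWN.items()}      # loose-match table

label = "EUC_JP"
print(LOOSE.get(uts22_key(label)))   # "EUC-JP": loose matching recognizes it
print(KNOWN.get(label.lower()))      # None: exact matching falls through to
                                     # the default/sniffed encoding instead

Sites that label themselves "EUC_JP" but depend on that fallback behavior break once the loose algorithm starts treating the label as EUC-JP.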


--
Anne van Kesteren
http://annevankesteren.nl/
