Re: [whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization
On Sat, Feb 8, 2014 at 12:37 AM, Ian Hickson i...@hixie.ch wrote:
> What have you learnt so far?

I've learned that I misattributed the cause of the high frequency of character encoding menu usage in the Traditional Chinese localization. We've been shipping with the wrong fallback encoding (UTF-8) even after the fallback encoding was supposedly fixed (to Big5). That shows what kind of a mess our previous mechanism for setting the fallback encoding in a locale-dependent way was. The fallback encoding for Traditional Chinese will change to Big5 for real in Firefox 28.

I might have improved (hopefully; to be seen still) Firefox for the wrong reason. Oops. :-)

Also, more baseline telemetry data (i.e. data without TLD-based guessing) is now available.

The last 3 weeks of Firefox 25 on the release channel: https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381393 .

The last 3 weeks of Firefox 26 on the release channel: https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381394 .

Rows are grayed for locales with so little overall usage that even a couple of sessions with encoding menu use put them high on the list percentage-wise.

In both cases, the top entries in black are Traditional Chinese and Thai, both of which have the wrong fallback. Up next are CJK followed by the Cyrillic locales that have a detector on by default (Russian and Ukrainian), which makes one wonder if the detectors are doing more harm than good. Up next is Arabic, which has the wrong fallback. (These wrong fallbacks are fixed in Firefox 28. In Firefox 28, no locale falls back to UTF-8.)

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
Re: [whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization
On Sat, Feb 8, 2014 at 12:37 AM, Ian Hickson i...@hixie.ch wrote:
> The correlation should be at least as high, as far as I can tell.

Logically, yes, for most parts of the world.

> Or maybe a 50%/50% experiment with that as the first 50% and the default coming from the TLD instead of the UI locale in the second 50%, with the corresponding instrumentation, to see how the results compare.

Mozilla doesn't have a proper A/B testing infrastructure yet. I expect the A to be Firefox 29 on the release channel and B to be Firefox 30 on the release channel. So unless this gets backed out, I expect to have data around the time of Firefox 31 going to release.

> Have you tried deploying this?

It is on Firefox trunk now. However, not all country TLDs are participating. I figured it is better to leave unsure cases the way they were. It doesn't make sense to put a lot of effort into researching those before seeing if the general approach works for the case that it was designed for, specifically Traditional Chinese.

The success metric I expect to be looking at is whether the usage of the character encoding menu in the Traditional Chinese localization of Firefox falls to the same level as in other Firefox localizations in general. If this change turns out to be successful for Traditional Chinese, then I think it will be worthwhile to research the unobvious cases.

The TLDs listed in https://mxr.mozilla.org/mozilla-central/source/dom/encoding/nonparticipatingdomains.properties do not participate at present (i.e. they get a browser UI localization-based guess like before). The TLDs listed in https://mxr.mozilla.org/mozilla-central/source/dom/encoding/domainsfallbacks.properties get the fallbacks listed in that file. All other TLDs map to windows-1252.

> What have you learnt so far?

It hasn't been an obvious and immediate disaster.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
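
The three-way lookup described in the message above (non-participating TLDs keep the locale-based guess, listed TLDs get their listed fallback, everything else maps to windows-1252) can be sketched roughly as follows. This is only an illustration, not the Firefox code: the function name, the table names, and the sample entries are hypothetical, and the real lists live in the two .properties files linked above.

```python
from urllib.parse import urlsplit

# Hypothetical sample entries for illustration only; the actual contents are
# in nonparticipatingdomains.properties and domainsfallbacks.properties.
NONPARTICIPATING_TLDS = {"com", "org", "net"}
TLD_FALLBACKS = {"tw": "Big5", "ru": "windows-1251", "jp": "Shift_JIS"}
DEFAULT_FALLBACK = "windows-1252"


def fallback_encoding(url, locale_fallback):
    """Pick a fallback encoding for a document from the TLD of its URL.

    locale_fallback is the guess derived from the browser UI localization,
    used when the TLD does not participate or no host name is available.
    """
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    if not tld or tld in NONPARTICIPATING_TLDS:
        # Unsure cases behave the way they did before the change.
        return locale_fallback
    # Listed TLDs get their listed fallback; all others map to windows-1252.
    return TLD_FALLBACKS.get(tld, DEFAULT_FALLBACK)
```

With these sample tables, a page on a .tw host would get Big5 regardless of the UI locale, while a page on a .com host would still get whatever the localization dictates. How a real implementation treats IP addresses or single-label hosts is not covered by this sketch.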
Re: [whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization
On Thu, 19 Dec 2013, Henri Sivonen wrote:
> Considering that the encoding of the content browsed is not really a function of the UI localization of the browser, though the two are often correlated, I have developed a patch for Firefox to make the guess based on the top-level domain name of the URL of the document when possible.
>
> Before deciding whether to land that patch, I'd like to get feedback from the broader Web standards community. Does this seem like a good idea? Good idea if the mapping details are tweaked? Bad idea? (Why?)

Seems like a reasonable idea to me. The correlation should be at least as high, as far as I can tell. But that's just a guess.

Data would be good, for example instrumenting an existing locale-based browser to see how often the guess from the locale disagrees with the guess from the TLD, and checking how often the guess from the locale is wrong (via looking at people overriding the encoding manually).

Or maybe a 50%/50% experiment with that as the first 50% and the default coming from the TLD instead of the UI locale in the second 50%, with the corresponding instrumentation, to see how the results compare.

Have you tried deploying this? What have you learnt so far?

--
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
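
The instrumentation suggested in the message above could be summarized along these lines. This is a minimal sketch under assumptions: the record shape (locale guess, TLD guess, whether the user overrode the encoding manually) and the function name are hypothetical and do not correspond to any actual Firefox telemetry probe.

```python
def summarize_guesses(sessions):
    """Summarize how the locale-based and TLD-based guesses compare.

    sessions is an iterable of (locale_guess, tld_guess, overridden) tuples,
    where overridden is True if the user changed the encoding manually.
    """
    total = disagree = overridden_count = 0
    for locale_guess, tld_guess, overridden in sessions:
        total += 1
        if locale_guess != tld_guess:
            disagree += 1
        if overridden:
            # A manual override is treated as a sign the guess was wrong.
            overridden_count += 1
    if total == 0:
        return {"disagree_rate": 0.0, "override_rate": 0.0}
    return {
        "disagree_rate": disagree / total,
        "override_rate": overridden_count / total,
    }
```

Comparing the override rate between a locale-based cohort and a TLD-based cohort would correspond to the 50%/50% experiment mentioned above.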