Re: [whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization

2014-02-26 Thread Henri Sivonen
On Sat, Feb 8, 2014 at 12:37 AM, Ian Hickson i...@hixie.ch wrote:
> What have you learnt so far?

I've learned that I've misattributed the cause of high frequency of
character encoding menu usage in the case of the Traditional Chinese
localization. We've been shipping with the wrong fallback encoding
(UTF-8) even after the fallback encoding was supposedly fixed (to
Big5). Shows what kind of a mess our previous mechanism for setting
the fallback encoding in a locale-dependent way was. The fallback
encoding for Traditional Chinese will change to Big5 for real in
Firefox 28.

I might have improved (hopefully; to be seen still) Firefox for the
wrong reason. Oops. :-)

Also, more baseline telemetry data (i.e. data without TLD-based
guessing) is now available. The last 3 weeks of Firefox 25 on the
release channel:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381393 . The
last 3 weeks of Firefox 26 on the release channel:
https://bug965707.bugzilla.mozilla.org/attachment.cgi?id=8381394 . The
rows for locales with so little usage overall that even a couple of
sessions with encoding menu use put them at the top of the list
percentage-wise are grayed. In both cases, the top entries in black
are Traditional Chinese and Thai, both of which have the wrong
fallback. Up next are CJK followed by the Cyrillic locales that have a
detector on by default (Russian and Ukrainian), which makes one wonder
if the detectors are doing more harm than good. Up next is Arabic,
which has the wrong fallback. (These wrong fallbacks are fixed in
Firefox 28. In Firefox 28, no locale falls back to UTF-8.)

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


Re: [whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization

2014-02-08 Thread Henri Sivonen
On Sat, Feb 8, 2014 at 12:37 AM, Ian Hickson i...@hixie.ch wrote:
> The correlation should be at least as high, as far as I can tell.

Logically, yes, for most parts of the world.

> Or maybe a 50%/50% experiment
> with that as the first 50% and the default coming from the TLD instead of
> the UI locale in the second 50%, with the corresponding instrumentation,
> to see how the results compare.

Mozilla doesn't have a proper A/B testing infrastructure yet. I expect
the A to be Firefox 29 on the release channel and B to be Firefox 30
on the release channel. So unless this gets backed out, I expect to
have data around the time of Firefox 31 going to release.

> Have you tried deploying this?

It is on Firefox trunk now. However, not all country TLDs are
participating. I figured it is better to leave unsure cases the way
they were. It doesn't make sense to put a lot of effort into
researching those before seeing if the general approach works for the
case that it was designed for, specifically Traditional Chinese. The
success metric I expect to be looking at is if the usage of the
character encoding menu in the Traditional Chinese localization of
Firefox falls to the same level as in other Firefox localizations in
general.

If this change turns out to be successful for Traditional Chinese,
then I think it will be worthwhile to research the unobvious cases.

The TLDs listed in
https://mxr.mozilla.org/mozilla-central/source/dom/encoding/nonparticipatingdomains.properties
do not participate at present (i.e. get a browser UI
localization-based guess like before). The TLDs listed in
https://mxr.mozilla.org/mozilla-central/source/dom/encoding/domainsfallbacks.properties
get the fallbacks listed in that file. All other TLDs map to
windows-1252.
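
As a rough illustration of the three-way lookup described above, here is
a minimal Python sketch. It is not the Firefox code. The table contents
are hypothetical stand-ins for the two .properties files (the real
entries are not reproduced here), and the function name and the
treatment of URLs without a dotted host name are assumptions made for
the example. IP-address hosts are not handled.

```python
from urllib.parse import urlsplit

# Hypothetical stand-ins for nonparticipatingdomains.properties and
# domainsfallbacks.properties. These entries are assumptions chosen for
# illustration, not the contents of the real files.
NONPARTICIPATING_TLDS = {"com", "org", "net"}
TLD_FALLBACKS = {"tw": "Big5", "ru": "windows-1251"}

# Per the message above, every other TLD maps to windows-1252.
DEFAULT_FALLBACK = "windows-1252"


def fallback_encoding(url, locale_fallback):
    """Return a fallback encoding guess for an unlabeled document.

    locale_fallback is the guess derived from the browser UI
    localization, which is what was used before the TLD-based guess.
    """
    host = urlsplit(url).hostname  # lowercased by urlsplit, or None
    if not host or "." not in host:
        # Assumption: with no usable TLD, keep the locale-based guess.
        return locale_fallback
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    if tld in NONPARTICIPATING_TLDS:
        # Non-participating TLDs behave as before.
        return locale_fallback
    # Listed TLDs get their listed fallback; all others get the default.
    return TLD_FALLBACKS.get(tld, DEFAULT_FALLBACK)
```

With these made-up tables, a Traditional Chinese build whose locale
guess is Big5 would still get Big5 on a non-participating TLD, while any
build would get Big5 on a .tw page.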

> What have you learnt so far?

It hasn't been an obvious and immediate disaster.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


Re: [whatwg] Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization

2014-02-07 Thread Ian Hickson
On Thu, 19 Dec 2013, Henri Sivonen wrote:
 
> Considering that the encoding of the content browsed is not really a
> function of the UI localization of the browser, though the two are often
> correlated, I have developed a patch for Firefox to make the guess based
> on the top-level domain name of the URL of the document when possible.
>
> Before deciding whether to land that patch, I'd like to get feedback
> from the broader Web standards community.
>
> Does this seem like a good idea? Good idea if the mapping details are
> tweaked? Bad idea? (Why?)

Seems like a reasonable idea to me. The correlation should be at least as 
high, as far as I can tell. But that's just a guess. Data would be good, 
for example instrumenting an existing locale-based browser to see how 
often the guess from the locale disagrees with the guess from the TLD, and 
checking how often the guess from the locale is wrong (via looking at 
people overriding the encoding manually). Or maybe a 50%/50% experiment 
with that as the first 50% and the default coming from the TLD instead of 
the UI locale in the second 50%, with the corresponding instrumentation, 
to see how the results compare.
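
The instrumentation suggested above could be summarized along these
lines. This is only a sketch in Python under stated assumptions: the
PageLoad record and its field names are hypothetical, and no real
telemetry format is implied. It counts how often the locale-based guess
and the TLD-based guess differ, and how often the user overrode the
encoding manually.

```python
from dataclasses import dataclass


@dataclass
class PageLoad:
    # Hypothetical record for one load of an unlabeled page.
    locale_guess: str    # guess from the browser UI locale
    tld_guess: str       # guess from the top-level domain
    user_overrode: bool  # user picked an encoding from the menu


def compare_guesses(loads):
    """Return the disagreement rate and the manual override rate."""
    total = len(loads)
    if total == 0:
        return {"disagreement_rate": 0.0, "override_rate": 0.0}
    # Encoding labels are compared case-insensitively.
    disagreements = sum(
        1 for load in loads
        if load.locale_guess.lower() != load.tld_guess.lower()
    )
    overrides = sum(1 for load in loads if load.user_overrode)
    return {
        "disagreement_rate": disagreements / total,
        "override_rate": overrides / total,
    }
```

For the 50%/50% variant, the same function could be run separately over
the loads from each half and the two override rates compared.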

Have you tried deploying this? What have you learnt so far?

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'