Adam Roach wrote:
when you look at that document, tell me what you think the parenthetical
phrase after the author's name is supposed to look like -- because I can
guarantee that Firefox isn't doing the right thing here.
In my case it does, and displays: Хизер Фланаган
I have the universal c
And then you get sites that send ISO-8859-1 but the server is configured
to send UTF-8 in the headers, e.g.
http://darwinawards.com/darwin/darwin1999-38.html
--
Warning: May contain traces of nuts.
___
dev-platform mailing list
dev-platform@lists.mozi
On 9/9/13 02:31, Henri Sivonen wrote:
We don't have telemetry for the question "How often are pages that are not
labeled as UTF-8, UTF-16 or anything that maps to their replacement
encoding according to the Encoding Standard and that contain non-ASCII
bytes in fact valid UTF-8?" How rare would th
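The predicate behind that telemetry question can be sketched in a few lines: require at least one non-ASCII byte, then check whether the whole stream validates as strict UTF-8. (An illustrative Python sketch only; this is not Gecko's actual detector.)

```python
def unlabeled_page_is_valid_utf8(data: bytes) -> bool:
    """True iff `data` contains non-ASCII bytes and still decodes
    as strict UTF-8 -- the case the telemetry question asks about."""
    if data.isascii():
        return False  # all-ASCII pages are uninteresting: every
                      # ASCII-compatible legacy encoding agrees there
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False
```

UTF-8's strict byte grammar is what makes this a meaningful test: legacy 8-bit text rarely validates by accident.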
On Fri, Sep 6, 2013 at 6:17 PM, Adam Roach wrote:
> Sure. It's a much trickier problem (and, in any case, the UI is
> necessarily more intrusive than what I'm suggesting). There's no good way
> to explain the nuanced implications of security decisions in a way that is
> both accessible to a lay user
On 06/09/13 18:28, Boris Zbarsky wrote:
On 9/6/13 1:11 PM, Neil Harris wrote:
Presumably most of that XHTML is being generated by automated tools
Presumably most of that "XHTML" is tag-soup pages that claim to have
an XHTML doctype. The chance of them actually being valid XHTML is
slim to none.
On Friday, September 6, 2013 at 5:36 PM, Neil Harris wrote:
> On 06/09/13 16:34, Gervase Markham wrote:
> >
> > Data! Sounds like a plan.
> >
> > Or we could ask our friends at Google or some other search engine to run
> > a version of our detector over their index and see how often it says
>
On 06/09/13 17:48, Marcos Caceres wrote:
On Friday, September 6, 2013 at 5:36 PM, Neil Harris wrote:
On 06/09/13 16:34, Gervase Markham wrote:
Data! Sounds like a plan.
Or we could ask our friends at Google or some other search engine to run
a version of our detector over their index and see how often it says
"UTF-8" when our normal algorithm would say something else.
Henri Sivonen wrote:
Considering what Aryeh said earlier in this thread, do you have a
suggestion how to do that so that
> [...]
Hmm, do we have to treat the whole document as a consistent charset?
Could we instead, if we don't know the charset, look at every
rendered-as-text node/attribute
On 9/6/13 1:11 PM, Neil Harris wrote:
Presumably most of that XHTML is being generated by automated tools
Presumably most of that "XHTML" is tag-soup pages that claim to have
an XHTML doctype. The chance of them actually being valid XHTML is slim
to none (though maybe higher than the chanc
On 06/09/13 16:34, Gervase Markham wrote:
Data! Sounds like a plan.
Or we could ask our friends at Google or some other search engine to run
a version of our detector over their index and see how often it says
"UTF-8" when our normal algorithm would say something else.
Gerv
This website has an
On 06/09/13 16:45, Robert Kaiser wrote:
Henri Sivonen wrote:
Considering what Aryeh said earlier in this thread, do you have a
suggestion how to do that so that
> [...]
Hmm, do we have to treat the whole document as a consistent charset?
Could we instead, if we don't know the charset, look at every
rendered-as-text node/attribute
On 9/6/13 04:25, Henri Sivonen wrote:
We do surface such UI for https deployment errors
inspiring academic papers about how bad it is that users are exposed
to such UI.
Sure. It's a much trickier problem (and, in any case, the UI is
necessarily more intrusive than what I'm suggesting). There's no good way
to explain the nuanced implications of security decisions in a way that is
both accessible to a lay user
On 06/09/13 16:17, Adam Roach wrote:
> To the first point: the increase in complexity is fairly minimal for a
> substantial gain in usability. Absent hard statistics, I suspect we will
> disagree about how "fringe" this particular exception is. Suffice it to
> say that I have personally encountered
On Thu, Sep 5, 2013 at 7:32 PM, Mike Hoye wrote:
> On 2013-09-05 10:10 AM, Henri Sivonen wrote:
>>
>> It's worth noting that for other classes of authoring errors (except for
>> errors in https deployment) we don't give the user the tools to remedy
>> authoring errors.
>
> Firefox silently remedies all kinds of authoring errors.
On 9/5/13 11:15 AM, Adam Roach wrote:
I would argue that we do, to some degree, already do this for things
like Content-Encoding. For example, if a website attempts to send
gzip-encoded bodies without a Content-Encoding header, we don't simply
display the compressed body as if it were encoded acc
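The "trust the bytes over the label" idea Adam describes can be illustrated by sniffing the gzip magic number. (A minimal sketch for illustration; `recover_body` is a hypothetical helper, not Firefox's code.)

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # RFC 1952 magic number

def recover_body(body: bytes) -> bytes:
    """If the bytes start with the gzip magic number, decompress
    them even though no Content-Encoding header said to."""
    if body[:2] == GZIP_MAGIC:
        try:
            return gzip.decompress(body)
        except OSError:
            pass  # corrupt gzip stream: fall back to the raw bytes
    return body
```

The same tension applies to charsets: the content can contradict its own label, and the browser must pick a side.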
Zack Weinberg wrote:
It is possible to distinguish UTF-8 from most legacy
encodings heuristically with high reliability, and I'd like to suggest
that we ought to do so, independent of locale.
I would very much agree with doing that. UTF-8 is what is being
suggested everywhere as the encoding
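The high reliability Zack describes comes from UTF-8's strict byte grammar: in a legacy single-byte encoding, non-ASCII text rarely forms valid multibyte sequences by accident. A small demonstration (illustrative only; the sample headline is just example Cyrillic text):

```python
def validates_as_utf8(data: bytes) -> bool:
    """Strict UTF-8 validation: the core of the heuristic."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# In windows-1251, Cyrillic letters occupy 0xC0-0xFF, so they look
# like UTF-8 lead bytes -- but the bytes that follow are almost never
# valid continuation bytes (0x80-0xBF), so validation fails.
headline = "В Госдуме предложили отобрать Нобеля у Обамы"
```

Here `validates_as_utf8(headline.encode("windows-1251"))` is False while the UTF-8 encoding of the same text validates, which is why the check discriminates so well in practice.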
On 2013-09-05 10:10 AM, Henri Sivonen wrote:
It's worth noting that for other classes of authoring errors (except
for errors in https deployment) we don't give the user the tools to
remedy authoring errors.
Firefox silently remedies all kinds of authoring errors.
- mhoye
On 9/5/13 09:10, Henri Sivonen wrote:
Why should we surface this class of authoring error to the UI in a way
that asks the user to make a decision considering how rare this class
of authoring error is?
It's not a matter of the user judging the rarity of the condition; it's
the user being abl
On Fri, Aug 30, 2013 at 6:17 PM, Adam Roach wrote:
>
> It seems to me that there's an important balance here between (a) letting
> developers discover their configuration error and (b) allowing users to
> render misconfigured content without specialized knowledge.
It's worth noting that for other classes of authoring errors (except
for errors in https deployment) we don't give the user the tools to
remedy authoring errors.
On 9/2/13 13:36, Joshua Cranmer 🐧 wrote:
I don't think there *is* a sane approach that satisfies everybody.
Either you break "UTF8-just-works-everywhere", you break legacy
content, you make parsing take inordinate times...
I want to push on this last point a bit. Using a straightforward UTF-8
On 8/30/2013 1:41 PM, Anne van Kesteren wrote:
On Fri, Aug 30, 2013 at 7:33 PM, Joshua Cranmer 🐧 wrote:
The problem I have with this approach is that it assumes that the page is
authored by someone who definitively knows the charset, which is not a
scenario which universally holds. Suppose you
On Fri, Aug 30, 2013 at 8:36 PM, Adam Roach wrote:
> On 8/30/13 13:41, Anne van Kesteren wrote:
>> Where did the text file come from? There's a source somewhere... And
>> these days that's hardly how people create content anyway.
>
> Maybe not for the content _you_ consume, but the Internet is a bit
> larger than our ivory tower.
Mike Hoye wrote:
On 2013-08-30 3:17 PM, Adam Roach wrote:
On 8/30/13 14:11, Adam Roach wrote:
...helping the user understand why the headline they're trying to
read renders as "Ð' Ð"оÑ?дÑfме пÑEURедложили
оÑ,обÑEURаÑ,ÑOE "Ð?обелÑ?" Ñf Ðz(бамÑ< " rather than
"В Госдуме предложили отобрать «Нобеля» у Обамы".
On 8/30/13 13:41, Anne van Kesteren wrote:
Where did the text file come from? There's a source somewhere... And
these days that's hardly how people create content anyway.
Maybe not for the content _you_ consume, but the Internet is a bit
larger than our ivory tower.
Check out, for example:
On 2013-08-30 3:17 PM, Adam Roach wrote:
On 8/30/13 14:11, Adam Roach wrote:
...helping the user understand why the headline they're trying to
read renders as "Ð' Ð"оÑ?дÑfме пÑEURедложили
оÑ,обÑEURаÑ,ÑOE "Ð?обелÑ?" Ñf Ðz(бамÑ< " rather than
"В Госдуме предложили отобрать «Нобеля» у Обамы".
On 8/30/13 12:24, Mike Hoye wrote:
On 2013-08-30 11:17 AM, Adam Roach wrote:
It seems to me that there's an important balance here between (a)
letting developers discover their configuration error and (b)
allowing users to render misconfigured content without specialized
knowledge.
For what
On 8/30/13 14:11, Adam Roach wrote:
...helping the user understand why the headline they're trying to read
renders as "Ð' Ð"оÑ?дÑfме пÑEURедложили
оÑ,обÑEURаÑ,ÑOE "Ð?обелÑ?" Ñf Ðz(бамÑ< " rather than
"В Госдуме предложили отобрать «Нобеля» у Обамы".
Well, *there's* a h
On Fri, Aug 30, 2013 at 7:33 PM, Joshua Cranmer 🐧 wrote:
> The problem I have with this approach is that it assumes that the page is
> authored by someone who definitively knows the charset, which is not a
> scenario which universally holds. Suppose you have a page that serves up the
> contents of
On 8/30/2013 4:01 AM, Anne van Kesteren wrote:
On Fri, Aug 30, 2013 at 9:40 AM, Gervase Markham wrote:
We don't want people to try and move to UTF-8, but move back because
they haven't figured out how (or are technically unable) to label it
correctly and "it comes out all wrong".
You also don't want it to be wrong half of the time.
On Fri, Aug 30, 2013 at 6:31 PM, Chris Peterson wrote:
> Is there a less error-prone default we can recommend to Linux distribution
> packagers? Maybe we can squelch the problem upstream instead of adding
> browser hacks. The number of web server and distro packagers we would need
> to reach out t
On 8/30/13 3:03 AM, Henri Sivonen wrote:
Telemetry data suggests that these days the more common reason for
seeing mojibake is that there is an encoding declaration but it is
wrong. My guess is that this arises from Linux distributions silently
changing their Apache defaults to send a charset parameter.
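The Apache directive involved is `AddDefaultCharset`: when a distro's shipped config sets it, every text/html response gets a `charset` parameter on its Content-Type header, and the HTTP header takes precedence over the page's own `<meta>` declaration. (An illustrative httpd.conf fragment; actual distro defaults vary.)

```apacheconf
# A server-wide default like this mislabels every page whose real
# encoding differs, because the HTTP header wins over <meta charset>:
AddDefaultCharset ISO-8859-1

# Letting documents declare their own encoding avoids the mislabeling:
AddDefaultCharset Off
```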
On 2013-08-30 11:17 AM, Adam Roach wrote:
It seems to me that there's an important balance here between (a)
letting developers discover their configuration error and (b) allowing
users to render misconfigured content without specialized knowledge.
For what it's worth, Internet Explorer handled
On 8/30/13 05:08, Nicholas Nethercote wrote:
On Fri, Aug 30, 2013 at 8:03 PM, Henri Sivonen wrote:
I think we should encourage Web authors to use UTF-8 *and* to *declare* it.
I'm no expert on this stuff, but Henri's point sure sounds sensible to me.
It seems to me that there's an important balance here between (a)
letting developers discover their configuration error and (b) allowing
users to render misconfigured content without specialized knowledge.
On Fri, Aug 30, 2013 at 4:31 PM, Aryeh Gregor wrote:
> In particular, you need to decide on the encoding before you start
> running any user script, because you don't want document.characterSet
> etc. to change once it might have already been accessed. For
> performance reasons, we want to be abl
On Fri, Aug 30, 2013 at 1:03 PM, Henri Sivonen wrote:
> This is true if you run the heuristic over the entire byte stream.
> Unfortunately, since we support incremental loading of HTML (and will
> have to continue to do so), we don't have the entire byte stream
> available at the time when we nee
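Because of the incremental-loading constraint Henri describes, a detector only ever sees a stream prefix, and a prefix may legitimately end in the middle of a multibyte sequence. One way to sketch "is this prefix still consistent with UTF-8" (a Python illustration, not what Gecko does):

```python
import codecs

def prefix_consistent_with_utf8(prefix: bytes) -> bool:
    """True if `prefix` could be the start of a valid UTF-8 stream.
    A multibyte sequence cut off at the end is NOT an error: with
    final=False the incremental decoder just buffers the pending
    bytes until more of the stream arrives."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        decoder.decode(prefix, final=False)
        return True
    except UnicodeDecodeError:
        return False
```

The asymmetry matters: a prefix can prove the stream is *not* UTF-8, but can only ever say "still consistent so far", which is why deciding early from a bounded prefix is a heuristic rather than a guarantee.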
On Fri, Aug 30, 2013 at 8:03 PM, Henri Sivonen wrote:
>
> I think we should encourage Web authors to use UTF-8 *and* to *declare* it.
I'm no expert on this stuff, but Henri's point sure sounds sensible to me.
Nick
On Thu, Aug 29, 2013 at 9:41 PM, Zack Weinberg wrote:
> All the discussion of fallback character encodings has reminded me of an
> issue I've been meaning to bring up for some time: As a user of the en-US
> localization, nowadays the overwhelmingly most common situation where I see
> mojibake is w
On Fri, Aug 30, 2013 at 9:40 AM, Gervase Markham wrote:
> We don't want people to try and move to UTF-8, but move back because
> they haven't figured out how (or are technically unable) to label it
> correctly and "it comes out all wrong".
You also don't want it to be wrong half of the time. Give
On 29/08/13 19:41, Zack Weinberg wrote:
> All the discussion of fallback character encodings has reminded me of an
> issue I've been meaning to bring up for some time: As a user of the
> en-US localization, nowadays the overwhelmingly most common situation
> where I see mojibake is when a site puts
On Thu, Aug 29, 2013 at 7:41 PM, Zack Weinberg wrote:
> If people are concerned about "infecting" the modern platform with
> heuristics, perhaps we could limit application of the heuristic to quirks
> mode, for HTML delivered over HTTP. I expect this would cover the majority
> of the sites describ
All the discussion of fallback character encodings has reminded me of an
issue I've been meaning to bring up for some time: As a user of the
en-US localization, nowadays the overwhelmingly most common situation
where I see mojibake is when a site puts UTF-8 in its pages without
declaring any encoding.