RE: charset parameter in Google Groups (was Re: Indian Rupee Sign to be chosen today)

Doug Ewell Mon, 28 Jun 2010 13:37:47 -0700

Mark Crispin <mrc plus unicode at panda dot com> wrote:

> On Mon, 28 Jun 2010, Mark Davis ☕ wrote:
>> The problem with slavishly following the charset parameter is that it
>> is often incorrect. However, the charset parameter is a signal into
>> the character detection module, so if the charset is correctly
>> supplied from the message then the results of the detection will be
>> weighted in that direction.
> 
> I interpret these two sentences as:
> 
> "The problem with following the standards is that some people don't
> follow the standards.  So instead of following the standards
> ourselves, we will guess if the other guy follows the standards or
> not, no matter how much he claims to follow standards.  Too bad if our
> fix transforms his valid data into garbage."


At the very least, it would be nice if the charset parameter constituted
a much stronger signal into the detection module than it apparently did
in Andreas' case, so that if he says the text is 8859-15, and we already
know that 8859-15 is nearly impossible to distinguish heuristically from
8859-1, the module might as well take his word for it.
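
To make the "nearly impossible to distinguish" point concrete, here is a
rough Python sketch. It lists the byte values that the two charsets map
to different characters, then shows one way a detector could defer to
the declared charset when the only disagreement is within that pair. The
names differing_bytes, AMBIGUOUS and choose_charset are my own
illustration, not anything Google Groups is known to use.

```python
# Every byte value is valid in both ISO 8859-1 and ISO 8859-15, so a
# decoder never fails; only a handful of bytes map to different characters.
def differing_bytes():
    """Return {byte: (char in 8859-1, char in 8859-15)} where they differ."""
    diffs = {}
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("iso8859-1")
        latin9 = raw.decode("iso8859-15")
        if latin1 != latin9:
            diffs[b] = (latin1, latin9)
    return diffs

# Hypothetical rule: charset pairs a heuristic cannot reliably tell apart.
AMBIGUOUS = {frozenset({"iso-8859-1", "iso-8859-15"})}

def choose_charset(declared, guessed):
    """Prefer the declared charset when the guess is within an ambiguous pair."""
    if declared and frozenset({declared.lower(), guessed.lower()}) in AMBIGUOUS:
        return declared.lower()
    return guessed.lower()

if __name__ == "__main__":
    for b, (latin1, latin9) in sorted(differing_bytes().items()):
        print(f"0x{b:02X}: 8859-1 {latin1!r}  8859-15 {latin9!r}")
    print(choose_charset("ISO-8859-15", "iso-8859-1"))
```

Only eight byte values differ, so text that happens to avoid them (or
uses them once, as with a lone euro sign at 0xA4) gives a heuristic
almost nothing to work with.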

I do tend to agree with Mark that the complaint against Google Groups
(with which I am not affiliated) might have been posted with more
civility and less invective.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
