Mark Crispin <mrc plus unicode at panda dot com> wrote:

> On Mon, 28 Jun 2010, Mark Davis ☕ wrote:
>> The problem with slavishly following the charset parameter is that it
>> is often incorrect. However, the charset parameter is a signal into
>> the character detection module, so the charset is correctly supplied
>> from the message then the results of the detection will be weighted
>> that direction.
>
> I interpret these two sentences as:
>
> "The problem with following the standards is that some people don't
> follow the standards. So instead of following the standards
> ourselves, we will guess if the other guy follows the standards or
> not, no matter how much he claims to follow standards. Too bad if our
> fix transforms his valid data into garbage."

At the very least, it would be nice if the charset parameter constituted
a much stronger signal into the detection module than it apparently did
in Andreas' case, so that if he says the text is 8859-15, and we already
know that 8859-15 is nearly impossible to distinguish heuristically from
8859-1, the module might as well take his word for it.

I do tend to agree with Mark that the complaint against Google Groups
(with which I am not affiliated) might have been posted with more
civility and less invective.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s