--As of July 7, 2014 5:20:01 PM -0400, Kevin A. McGrail is alleged to have said:

On 7/7/2014 5:09 PM, Philip Prindeville wrote:
On Jul 7, 2014, at 7:15 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:

On 7/7/2014 2:28 AM, John Wilcock wrote:
On 05/07/2014 19:08, Philip Prindeville wrote:
As for encoding a cyrillic small a: there are many ways to do this.
iso-8859-4, utf-8, jp2212, gb2312, win1252, etc. I don’t think this
would be very efficient—there are just too many charsets possible.
Normalising the input message to UTF-8 before body checks would help
somewhat with that. I seem to remember there's been talk of doing this.
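To make the "many ways to encode it" point concrete, here is a small Python sketch. The charset list is chosen for illustration only. ISO-8859-4 and Windows-1252 from the list above don't contain Cyrillic letters, so the sketch uses KOI8-R, Windows-1251 and ISO-8859-5 instead. It prints the bytes for U+0430 in each charset. It then shows that decoding with the right charset and re-encoding as UTF-8 collapses every variant to the same two bytes, which is what normalizing before body checks buys a rule writer. The function names are hypothetical, not SpamAssassin code.

```python
from typing import Dict, List

# U+0430 CYRILLIC SMALL LETTER A
CYRILLIC_SMALL_A = "\u0430"

# Charsets chosen for illustration; each one can represent U+0430.
CHARSETS = ["utf-8", "utf-16-le", "koi8_r", "cp1251", "iso8859_5", "gb2312"]


def encodings_of(ch: str, charsets: List[str]) -> Dict[str, bytes]:
    """Map each charset name to the raw bytes that represent ch in it."""
    return {cs: ch.encode(cs) for cs in charsets}


def normalize_to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode raw bytes with their (declared) charset, re-encode as UTF-8."""
    return raw.decode(charset).encode("utf-8")


if __name__ == "__main__":
    variants = encodings_of(CYRILLIC_SMALL_A, CHARSETS)
    # One character, six different byte sequences a body rule would face.
    for cs, raw in variants.items():
        print(f"{cs:<10} {raw.hex(' ')}")
    # After normalization every variant is the same UTF-8 byte sequence.
    print({normalize_to_utf8(raw, cs) for cs, raw in variants.items()})
```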

Yes, or utf-16...  I think that will be necessary to keep SA effective
in the modern world sooner rather than later.

Okay, but… if the message body is non-ASCII and the CTE is 8bit or
base64 and no explicit charset has been given, how do you know which
translation to perform?

I get a lot of Han SPAM in GB2312 where the charset is never specified
(apparently it’s a national default in China, despite the requirements
stated in RFC-2045 and -2046).

Sorry, I haven't even started delving into the devilish details but I
know it's looming as a needed feature.
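One possible approach to the "which translation?" question for undeclared charsets is trial decoding. The idea is to try candidate charsets in a fixed order and keep the first one that decodes without errors. The sketch below assumes a hypothetical candidate list: UTF-8 first, then GB2312 for the Han spam case, then Windows-1251. It is not an existing SpamAssassin feature. UTF-8 goes first because its strict validation rejects most byte strings that weren't written as UTF-8.

```python
from typing import Optional, Sequence

# Hypothetical candidate order, for illustration only:
# strict UTF-8 first, GB2312 for undeclared Han spam, and a
# single-byte Cyrillic charset last because it accepts nearly any byte.
DEFAULT_CANDIDATES = ("utf-8", "gb2312", "cp1251")


def guess_charset(raw: bytes,
                  candidates: Sequence[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the first candidate charset that decodes raw without error."""
    for cs in candidates:
        try:
            raw.decode(cs)  # strict mode: raises on any invalid byte sequence
        except UnicodeDecodeError:
            continue  # rule this charset out and try the next one
        return cs
    return None  # nothing fit; the caller must fall back to some default
```

A successful decode only rules charsets out. It doesn't prove which one is right. GB2312 and single-byte charsets will accept many byte strings written in something else, so the candidate order is itself a guess.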

--As for the rest, it is mine.

Just to start the discussion: I'd say default to UTF-8 if the charset is not otherwise specified and can't be worked out. (How much effort to put into 'working it out' is an open question, of course.) UTF-8 is the growing standard, as far as I can tell.

Even if it's wrong in a particular case, it would probably be useful: It would give rule writers something to work with.
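The proposal above could look roughly like the following Python sketch. It trusts a declared charset when Python recognizes it and otherwise falls back to UTF-8. It decodes with errors="replace", so a wrong guess degrades to U+FFFD replacement characters instead of failing. The function name is hypothetical, and the 'working it out' step is left out. A detector could be tried just before the UTF-8 default.

```python
import codecs
from typing import Optional


def decode_body(raw: bytes, declared_charset: Optional[str]) -> str:
    """Decode a MIME body to text for body rules (illustrative policy).

    1. Use the declared charset if present and known to Python.
    2. Otherwise default to UTF-8.
    """
    charset = "utf-8"
    if declared_charset:
        try:
            # Normalizes aliases such as "KOI8-R" to a canonical codec name.
            charset = codecs.lookup(declared_charset).name
        except LookupError:
            pass  # unknown label: keep the UTF-8 default
    # errors="replace" turns undecodable bytes into U+FFFD instead of
    # raising, so rule writers always get some text to match against.
    return raw.decode(charset, errors="replace")
```

With this policy, undeclared GB2312 spam would come out as UTF-8 text full of U+FFFD characters rather than as readable Han text. Arguably, that pattern could itself be something a rule looks for.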

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------
