I'm trying to use this set of rules to spot Chinese or Russian characters
in the subject line:
<http://www.timk.de/it-blog/howto-find-chinese-or-russian-spam-encoded-in-utf-8-with-spamassassin/>
To debug the rules, I've replaced the leading __ in sub-rules with T_.
However, the rules don't seem to match the base64-encoded UTF-8 sequences
I'm seeing in subject lines.
For example:
X-Spam-Status: No, score=1.7 required=5.0 tests=BAYES_50,
    CHARSET_UTF8_B_SUBJ_LATIN,HTML_FONT_FACE_BAD,HTML_MESSAGE,
    T_CHARSET_SUBJECT_UTF8_B_ENCODED,T_CHARSET_SUBJECT_UTF8_ENCODED
    autolearn=no
    version=3.3.1
Subject: =?utf-8?B?54mp5paZ6K6h5YiS5Y2P6LCDL+iJvueUnw==?=
The first character is U+7269 (7269 hex, UTF-8 bytes E7 89 A9), which, if
the rules are correct, should be matched by __CHARSET__UTF8_SUBJ_CJK1.
I'm using this tool to decode the base64 text between the question marks
and inspect the result:
<http://www.opinionatedgeek.com/dotnet/tools/base64decode/>