On Tue, 13 Mar 2012 12:40:16 +0000 Jenny Lee <bodycar...@live.com> wrote:
> Will give this a go. What I don't understand is that... Why is this
> not catching this 'utf' which is on the subject?

You need the :raw tag to see the raw, unencoded header.

The meta-rule:

    header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/

attempts to limit matches on UTF-8 subjects to Chinese characters
because the leading bytes e4-e9 in UTF-8 (mostly) cover CJK ideographs.
It's not a perfect filter, but blocking all UTF-8-encoded subjects would
yield way too many FPs for us.

Regards,

David.

PS: I haven't looked at SA's Bayes implementation. Can it handle words in
non-western character sets properly?
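
For illustration only, here is a rough Python sketch of the two points above: the string 'utf' is visible only in the raw, MIME-encoded Subject, while the e4-e9 byte class matches against the decoded UTF-8 bytes. It is not SpamAssassin code. The helper names (looks_cjk, decoded_bytes) and the sample subject are made up for this example, and it assumes the decoded header reaches the regex as UTF-8 bytes.

```python
import re
from email.header import decode_header

# Same byte class as the rule's regex, applied to bytes rather than text.
CJK_LEAD = re.compile(rb"[\xe4-\xe9]")


def looks_cjk(data: bytes) -> bool:
    """True if any byte falls in 0xE4-0xE9 (hypothetical helper)."""
    return CJK_LEAD.search(data) is not None


def decoded_bytes(raw_header: str) -> bytes:
    """Decode MIME encoded-words and return the text as UTF-8 bytes.

    Hypothetical helper; assumes the declared charset is known to Python.
    """
    out = b""
    for part, charset in decode_header(raw_header):
        if isinstance(part, bytes):
            part = part.decode(charset or "ascii", errors="replace")
        out += part.encode("utf-8")
    return out


# Made-up sample: the two characters U+4E2D U+6587, base64-encoded as UTF-8.
RAW_SUBJECT = "=?UTF-8?B?5Lit5paH?="

if __name__ == "__main__":
    # The charset label only exists in the raw form of the header.
    print("'utf' in raw subject:", "utf" in RAW_SUBJECT.lower())
    # The raw form is plain ASCII, so the byte class cannot match it.
    print("byte class on raw:", looks_cjk(RAW_SUBJECT.encode("ascii")))
    # After decoding, the UTF-8 lead bytes are exposed to the regex.
    print("byte class on decoded:", looks_cjk(decoded_bytes(RAW_SUBJECT)))
    # Accented Latin (lead byte 0xC3) and hiragana (lead byte 0xE3) fall
    # outside the range, which is why the filter is only approximate.
    for text in ["Café", "\u3042"]:
        print(repr(text), looks_cjk(text.encode("utf-8")))
```

In valid UTF-8, continuation bytes are always in the range 80-bf, so a byte in e4-e9 can only be a lead byte. Those lead bytes correspond to code points U+4000 through U+9FFF.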