On Wed, 28 Jul 2010 03:04:32 -0700 (PDT) andrij <andriy.stet...@gmail.com> wrote:
> Hi all,
>
> I am playing with RelayCountry plugin. I have a small database of
> e-mails. I processed these emails with RelayCountry plugin, so every
> email contains X-Spam-Relay-country header (and corresponding
> countries).
>
> Now I want to train Bayes with these emails.
>
> Does Bayes learn the tokens from the X-Spam-Relay-Country header?

Contrary to popular belief, the country codes are not used by Bayes.

> I think that it does not, because all headers "X-Spam-" are removed
> before learning, right?

That's not the reason. The plugin does make the data available to
Bayes, as the metadata from which X-Spam-Relay-Country is created, but
that data consists entirely of two-letter country codes, and Bayes
doesn't tokenize anything under 3 characters.

I filed a bug report about this a few months ago:

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6433

I wrote a patch last week (which I've attached) to add country pairs as
separate token metadata, e.g.

X-Spam-Relay-Countries: US US CA NG
X-Spam-Relay-Country-Tokens: Trusted_US USCA CANG

It's not a straight fix, but I'll submit it if no-one has a better idea.
patch-RelayCountry.pm
Description: Perl program
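For illustration, here is a minimal Python sketch (not SpamAssassin's Perl code and not the attached patch) of why the two-letter codes disappear and how pair tokens get past the same length filter. The 3-character cutoff comes from the explanation in this message; the helper names `keep_bayes_tokens` and `country_pair_tokens` are hypothetical, and so is the rule that the first listed country is the trusted relay while the remaining hops are joined pairwise. That rule is only an assumption chosen because it reproduces the `Trusted_US USCA CANG` example.

```python
# Minimal sketch, NOT SpamAssassin's implementation. It mimics only the
# "skip tokens shorter than 3 characters" rule described for Bayes, plus
# one ASSUMED way of building country-pair tokens.

MIN_TOKEN_LEN = 3  # tokens shorter than this are ignored by Bayes


def keep_bayes_tokens(tokens):
    """Return only the tokens long enough for a Bayes-style tokenizer."""
    return [t for t in tokens if len(t) >= MIN_TOKEN_LEN]


def country_pair_tokens(countries):
    """Hypothetical helper: build tokens of length >= 3 from relay countries.

    ASSUMPTION: the first entry is the trusted relay's country and becomes
    "Trusted_XX"; the remaining hops are joined pairwise, in order.
    """
    if not countries:
        return []
    first, rest = countries[0], countries[1:]
    tokens = ["Trusted_" + first]
    # Join each hop with the next one: [US, CA, NG] -> USCA, CANG
    tokens += [a + b for a, b in zip(rest, rest[1:])]
    return tokens


if __name__ == "__main__":
    countries = "US US CA NG".split()
    print(keep_bayes_tokens(countries))  # → [] (every code is too short)
    pairs = country_pair_tokens(countries)
    print(" ".join(pairs))  # → Trusted_US USCA CANG
    print(keep_bayes_tokens(pairs))  # → ['Trusted_US', 'USCA', 'CANG']
```

Joining codes into pairs keeps each token at four or more characters, so the filter no longer discards them, and as a side effect each token also records which country handed the message to which.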