On Wed, 28 Jul 2010 03:04:32 -0700 (PDT)
andrij <andriy.stet...@gmail.com> wrote:

> 
> Hi all,
> 
> I am playing with RelayCountry plugin. I have a small database of
> e-mails. I processed these emails with RelayCountry plugin, so every
> email contains an X-Spam-Relay-Country header (and corresponding
> countries). 
> 
> Now I want to train Bayes with these emails.
> 
> Does Bayes learn the tokens from the X-Spam-Relay-Country header? 

Contrary to popular belief, the country codes are not used by Bayes.

> I think that it does not, because all headers "X-Spam-" are removed
> before learning, right?

That's not the reason. The plugin does make the data available to Bayes:
it is the metadata from which X-Spam-Relay-Country is created. But that
metadata consists entirely of two-letter country codes, and Bayes doesn't
tokenize anything under 3 characters, so none of the codes become tokens.
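
To illustrate the effect of that length limit, here is a minimal sketch in Python. It is not SpamAssassin's tokenizer; the `tokenize` function and the `MIN_TOKEN_LEN` name are hypothetical, and the threshold of 3 is taken only from the remark above.

```python
# Hypothetical stand-in for a Bayes tokenizer with a minimum token length.
# Assumption: tokens shorter than 3 characters are discarded, as described above.
MIN_TOKEN_LEN = 3


def tokenize(text, min_len=MIN_TOKEN_LEN):
    """Split on whitespace and keep only tokens of at least min_len characters."""
    return [tok for tok in text.split() if len(tok) >= min_len]


# Two-letter country codes never pass the length filter.
print(tokenize("US US CA NG"))
# Longer tokens built from the same codes do pass it.
print(tokenize("Trusted_US USCA CANG"))
```

Under this assumption the raw country list yields no tokens at all, while anything of three or more characters would be kept.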

I filed a bug report about this a few months ago:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6433

I wrote a patch last week (which I've attached) to add country pairs as
separate token metadata, e.g.

X-Spam-Relay-Countries: US US CA NG
X-Spam-Relay-Country-Tokens: Trusted_US USCA CANG

It's not a straight fix, but I'll submit it if no-one has a better idea.
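
One possible reading of the example above, sketched in Python rather than the Perl of the attached patch: the first relay is treated as trusted and gets a `Trusted_` prefix, and each adjacent pair of the remaining relays is joined into one token. This is a guess that happens to reproduce the example, not the patch itself; the function name `country_tokens` and the `trusted_count` parameter are hypothetical.

```python
def country_tokens(countries, trusted_count=1):
    """Build Bayes-friendly tokens from an ordered list of relay country codes.

    Assumption (not confirmed by the patch): the first `trusted_count`
    relays are trusted and get a "Trusted_" prefix; the remaining relays
    are combined into adjacent pairs.
    """
    trusted = countries[:trusted_count]
    untrusted = countries[trusted_count:]

    # One prefixed token per trusted relay, e.g. "Trusted_US".
    tokens = ["Trusted_" + code for code in trusted]

    # One token per adjacent pair of untrusted relays, e.g. "USCA", "CANG".
    tokens += [a + b for a, b in zip(untrusted, untrusted[1:])]
    return tokens


relays = "US US CA NG".split()
print(" ".join(country_tokens(relays)))
```

Every token produced this way is at least four characters long, so it would clear a 3-character minimum.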

Attachment: patch-RelayCountry.pm
Description: Perl program