RW-15 wrote:
> 
>> Does Bayes learn the tokens from the X-Spam-Relay-Country header? 
> 
> Contrary to popular belief, the country codes are not used by Bayes.
> 
>> I think that it does not, because all headers "X-Spam-" are removed
>> before learning, right?
> 
> That's not the reason. The plugin does make the data available to Bayes
> as the metadata from which X-Spam-Relay-Country is created,
> 

That will work fine for the scoring phase: SpamAssassin processes an e-mail
with the RelayCountry plugin, the plugin stores the country information
(internally) in metadata, and the Bayes classifier then uses these metadata
to score the e-mail. Right?

Are these metadata stored (permanently) within a processed e-mail? I ask
because of the following scenario.

Let's say I have databases of spam and ham e-mails that were not processed
with the RelayCountry plugin. If I run sa-learn, will these e-mails be
processed with the RelayCountry plugin before being tokenized? I assume not
(am I right?). If so, I would first need to process the databases with the
RelayCountry plugin and then use sa-learn to train the Bayes classifier.
However, if the metadata from the first step are not stored permanently
within the e-mails, the Bayes classifier will not learn them, right?


RW-15 wrote:
> 
> but it consists entirely of two letter country-codes, and Bayes doesn't
> tokenize anything under 3 characters.
> 

Thank you for this very useful information and the patch! Wouldn't it be
enough just to prepend something to the country code in RelayCountry.pm to
make it longer, like "$cc = "Code" . $cc;"?
-- 
View this message in context: 
http://old.nabble.com/RelayCountry-plugin-tp29284940p29295643.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
