RW-15 wrote:
>
>> Does Bayes learn the tokens from the X-Spam-Relay-Country header?
>
> Contrary to popular belief, the country codes are not used by Bayes.
>
>> I think that it does not, because all headers "X-Spam-" are removed
>> before learning, right?
>
> That's not the reason. The plugin does make the data available to Bayes
> as the metadata from which X-Spam-Relay-Country is created,
>
That will work fine for the scoring phase: SpamAssassin processes an e-mail with the RelayCountry plugin, the plugin stores the country information internally as metadata, and the Bayes classifier then uses that metadata to score the e-mail. Right?

Is this metadata stored permanently in the processed e-mail? I am asking because of the following scenario. Let's say I have databases of spam and ham e-mails that were not processed with the RelayCountry plugin. If I run sa-learn, will these e-mails be processed with the RelayCountry plugin before being tokenized? I assume not (am I right?). In that case I would first need to process the databases with the RelayCountry plugin, and then use sa-learn to train the Bayes classifier. However, if the metadata from the first step is not stored permanently in the e-mails, the Bayes classifier will not learn it, right?

RW-15 wrote:
>
> but it consists entirely of two letter country-codes, and Bayes doesn't
> tokenize anything under 3 characters.
>

Thank you for this very useful information and the patch! Would it not be enough just to prepend something to the country code in RelayCountry.pm to make it longer, e.g. $cc = "Code" . $cc; ?

-- 
View this message in context: http://old.nabble.com/RelayCountry-plugin-tp29284940p29295643.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.