Re: Different bayes results from command line and through MTA

Sebastian Arcus Fri, 23 Dec 2016 05:35:31 -0800

On 23/12/16 10:12, Sebastian Arcus wrote:

> I know this hot potato has been discussed before, but I'm afraid it's
> back to haunt me and I can't fathom it out. I'm again getting different
> Bayes results if I test a message on the command line compared to when
> it goes through Exim -> SpamAssassin.
>
> </snip>

OK, after staring at debug logs for a good while, I think I've finally found the culprit. The saved .eml file which I pass through spamc contains the report embedded by SpamAssassin in the headers (that's how my Exim is configured). This report includes the first few lines of the actual email body. SpamAssassin tokenizes these sample lines on top of the actual email body, which effectively doubles the Bayes score. As the body of these particular spam emails is small, the sample in the header is almost the same size as the body text itself.

As soon as I manually delete the SA headers and report from the .eml file and pass the message through spamc again, I get Bayes scores identical to the ones the message gets when it first passes through Exim -> SA.
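For anyone who wants to repeat this test without hand-editing every saved .eml file, the header removal can be scripted. Below is a minimal Python sketch, assuming Exim adds its SpamAssassin results under headers whose names begin with `X-Spam-` (the real names are whatever your Exim ACL `add_header` lines use, so adjust `prefix` to match). `strip_spam_headers` is a hypothetical helper, not part of SpamAssassin or Exim.

```python
import email
from email import policy


def strip_spam_headers(raw: bytes, prefix: str = "X-Spam-") -> bytes:
    """Return the message with every header whose name starts with `prefix` removed.

    Assumption: the MTA stored the SpamAssassin results (including the report
    with its body preview) in headers named X-Spam-*; change `prefix` if your
    Exim configuration uses different header names.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    # Collect matching names first (case-insensitively), so we don't modify
    # the header list while iterating over it.
    names = {name for name in msg.keys() if name.lower().startswith(prefix.lower())}
    for name in names:
        # Deleting by name removes every occurrence of that header,
        # including any folded continuation lines belonging to it.
        del msg[name]
    return msg.as_bytes()
```

Write the result to a new file and pass that to spamc in place of the original. Note that re-serialising the message may re-fold some long header lines, so if you need a byte-for-byte copy of the original apart from the removed headers, editing the file by hand is still the safer option.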

However, this raises some interesting questions. It would appear that SA is incapable of recognising its own reports in the headers of emails, and tokenizes them as well, adding them to the Bayes report. Is that right?

Also, since SA tokenizes all the info in the headers, does this mean that my own email address, as the recipient of the email, will also be added to the database of spam tokens when I ask SA to learn a message as spam?


I seem to have ended up with more questions than I started with :-)
