On 23/12/16 10:12, Sebastian Arcus wrote:
> I know this hot potato has been discussed before - but I'm afraid it's
> back to haunt me and I can't fathom it out. I'm getting again different
> bayes results if I test a message on the command line, compared to it
> going through exim -> spamassassin.
>
> </snip>

OK - after staring at debug logs for a good while, I think I have finally found the culprit. The saved .eml file which I pass through spamc contains the report that spamassassin embeds in the headers (that's how my Exim is configured). This report includes the first few lines of the actual email body. This effectively doubles the Bayes score, because spamassassin tokenizes these sample lines in addition to the actual email body. As the body of these particular spam emails is small, the sample in the header is almost equal in size to the text in the email body itself.

As soon as I manually delete the SA headers and report from the .eml file and pass the message through spamc again, I get the same Bayes scores as when the message first passes through Exim -> SA.
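
For anyone who wants to avoid the manual editing, the clean-up can be scripted. What follows is only a minimal sketch: it assumes the headers added on the Exim side all start with "X-Spam-" (the actual names depend on the Exim configuration, so adjust the prefix), and the function name strip_spam_headers is made up for illustration.

```python
import sys
from email import message_from_bytes, policy


def strip_spam_headers(raw: bytes, prefix: str = "x-spam-") -> bytes:
    """Return the message with every header whose name starts with `prefix` removed.

    The "x-spam-" default is an assumption; use whatever header names
    your own Exim configuration adds.
    """
    # compat32 keeps headers as plain strings and does not decode them.
    msg = message_from_bytes(raw, policy=policy.compat32)
    for name in list(msg.keys()):
        if name.lower().startswith(prefix):
            # Deleting by name removes all occurrences, including
            # folded (multi-line) values such as an embedded report.
            del msg[name]
    return msg.as_bytes()


if __name__ == "__main__":
    # Read the saved message on stdin, write the cleaned copy to stdout.
    sys.stdout.buffer.write(strip_spam_headers(sys.stdin.buffer.read()))
```

Saved under a hypothetical name such as strip_sa.py, it could be used as "python3 strip_sa.py < saved.eml | spamc", so that spamc sees the message without the earlier report.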

However, this raises some interesting questions. It would appear that SA is incapable of recognising its own reports in the headers of the emails, so it tokenizes them as well and adds them to the Bayes report. Is that right?

Also, does it mean that, as SA tokenizes all the info in the headers, my own email address, as the recipient of the email, will also be added to the database of spam tokens - when I ask SA to learn a message as spam?

I seem to have ended up with more questions than I started with :-)
