That's a great call, thanks. I grepped my mail files and didn't find
SPAM_99 headers in any of them.
You should be looking for BAYES_99 and BAYES_999 in your corpus.
Thanks, Dave. I train SA on my various mailboxes (sa-learn --ham --mbox
/home/thomas.cameron/mail/INBOX/[mailbox file], then sa-learn --spam
--mbox /home/thomas.cameron/mail/INBOX/spam). Doesn't that mean that
I've already checked my corpora?
No, that's how you train SA on your corpora; it doesn't check them. If
you manually look through the headers of mail that has already been
processed by your mail system, the ham should be as close to BAYES_00 as
possible, and the spam should be at BAYES_99. If that's not the case,
then the database has been trained incorrectly.
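To run that check over a whole mailbox instead of by eye, a minimal sketch like the one below could tally which BAYES_* rules fired. It assumes the mail is stored as mbox files, that SpamAssassin stamped each message with an X-Spam-Status header whose tests= list names the rules (possibly folded over several lines), and a POSIX awk. bayes_tally is a hypothetical helper name, not part of SpamAssassin. Every BAYES_* rule in the header is counted, since a message can list more than one (BAYES_99 together with BAYES_999, for instance), and messages with no BAYES_* hit are counted under "none".

```shell
# Hypothetical helper (not part of SpamAssassin): bayes_tally MBOX [MBOX...]
# Assumes mbox files whose messages carry SpamAssassin's X-Spam-Status
# header; prints "RULE COUNT" for each BAYES_* rule seen, plus "none".
bayes_tally() {
    awk '
    # Count every BAYES_* rule in the finished message header; a message
    # with no BAYES_* hit (or no X-Spam-Status at all) counts as "none".
    function flush(   rest, found) {
        if (!in_msg) return
        rest = status
        found = 0
        while (match(rest, /BAYES_[0-9]+/)) {
            count[substr(rest, RSTART, RLENGTH)]++
            rest = substr(rest, RSTART + RLENGTH)
            found = 1
        }
        if (!found) count["none"]++
        in_msg = 0
        status = ""
    }
    # An mbox "From " separator line starts a new message.
    /^From / { flush(); in_msg = 1; in_hdr = 1; in_status = 0; next }
    # A blank line ends the header block.
    in_hdr && /^$/ { in_hdr = 0; next }
    in_hdr && /^X-Spam-Status:/ { status = $0; in_status = 1; next }
    # Indented lines continue a folded X-Spam-Status header.
    in_hdr && in_status && /^[ \t]/ { status = status $0; next }
    in_hdr { in_status = 0 }
    END { flush(); for (r in count) print r, count[r] }
    ' "$@" | sort
}
```

For example, bayes_tally /home/thomas.cameron/mail/INBOX/spam should be dominated by BAYES_99 (and BAYES_999), while a ham mailbox should come out mostly BAYES_00. Only header lines are examined, so a message body that happens to mention a rule name is not counted.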
I'd also recommend disabling auto-learn, if you have that enabled. In
/etc/mail/spamassassin/local.cf:
bayes_auto_learn 0
bayes_auto_expire 0
If you've gone through your corpus manually and are certain the ham is
all good mail and the spam is all bad mail, then it might be worth
dumping the existing Bayes database and retraining it from the
corresponding mboxes.
I also typically add --progress to sa-learn.
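If you do decide to start over, a minimal sketch of the retrain might look like the following. retrain_bayes is a hypothetical helper: the first argument is the spam mbox and the rest are ham mboxes (pass your own, since the actual ham mailbox names aren't listed here), and it assumes it runs as the user whose Bayes database SpamAssassin actually consults. It uses sa-learn's --backup, --clear, --ham/--spam with --mbox and --progress, and --dump magic to show the resulting counts. SA_LEARN is a hypothetical override added only for this sketch; setting SA_LEARN=echo prints the commands instead of running them.

```shell
# Hypothetical helper: retrain_bayes SPAM_MBOX HAM_MBOX [HAM_MBOX...]
# Run it as the user whose Bayes database SpamAssassin actually uses.
# SA_LEARN is a hypothetical override (e.g. SA_LEARN=echo for a dry run).
retrain_bayes() {
    sa_learn=${SA_LEARN:-sa-learn}
    spam_box=$1
    shift
    # Keep a copy of the current Bayes data in the working directory.
    "$sa_learn" --backup > "bayes-backup-$(date +%Y%m%d%H%M%S).txt" || return 1
    # Throw away the existing Bayes database.
    "$sa_learn" --clear || return 1
    # Learn each ham mailbox, then the spam mailbox, showing progress.
    for box in "$@"; do
        "$sa_learn" --ham --progress --mbox "$box" || return 1
    done
    "$sa_learn" --spam --progress --mbox "$spam_box" || return 1
    # Print the database summary (including the nspam/nham counts).
    "$sa_learn" --dump magic
}
```

For example: retrain_bayes /home/thomas.cameron/mail/INBOX/spam /home/thomas.cameron/mail/INBOX/[mailbox file]. If the new database turns out worse, sa-learn --restore with the backup file should bring the old data back.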
Best,
Dave
Thomas