On 5/5/22 11:59, Dave Wreski wrote:
You should probably check that none of your ham (i.e. non-spam)
messages contains SPAM_99 or SPAM_999. It can happen when spammers
poison your Bayes database, and an increased score in that case might
lead to legitimate mail being misclassified as spam.
That's a great call, thanks. I grepped my mail files and didn't find
any SPAM_99 headers in any of them.
You should be looking for BAYES_99 and BAYES_999 in your corpus.
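One rough way to run that check is sketched below. It assumes the ham is
stored in mbox files and that SpamAssassin added an X-Spam-Status header
to each message when it was delivered. The function name bayes_hits is
made up for illustration. Because BAYES_999 contains the string BAYES_99,
a single pattern covers both rules.

```sh
#!/bin/sh
# bayes_hits (hypothetical helper): print how many X-Spam-Status headers
# in an mbox file list BAYES_99 or BAYES_999.
# Assumption: messages were tagged by SpamAssassin at delivery time.
bayes_hits() {
    awk '
        # Count the header collected so far if it mentions BAYES_99
        # (the same pattern also matches BAYES_999).
        function check() {
            if (in_hdr && hdr ~ /BAYES_99/) hits++
            in_hdr = 0
        }
        # Start of a new X-Spam-Status header.
        /^X-Spam-Status:/ { check(); in_hdr = 1; hdr = $0; next }
        # Folded continuation lines start with a space or a tab.
        in_hdr && /^[ \t]/ { hdr = hdr $0; next }
        # Any other line ends the header being collected.
        { check() }
        END { check(); print hits + 0 }
    ' "$1"
}

# Usage (substitute a real ham mailbox for the placeholder):
#   bayes_hits /home/thomas.cameron/mail/INBOX/[mailbox file]
```

A result of 0 suggests none of the ham in that file hit either rule. Any
other number means those messages are worth a closer look. Only the
header is inspected, so mail that merely mentions BAYES_99 in its body
(like this thread) is not counted.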
Thanks, Dave. I use my various mailboxes (sa-learn --ham --mbox
/home/thomas.cameron/mail/INBOX/[mailbox file] and then sa-learn --spam
--mbox /home/thomas.cameron/mail/INBOX/spam) to train SA. Doesn't that
mean that I've already checked my corpora?
Thomas