On 5/5/22 11:59, Dave Wreski wrote:

You should probably check that none of your ham (i.e. non-spam)
messages contains SPAM_99 or SPAM_999. That can happen when spammers
poison your Bayes database, and the increased score in that case might
lead to legitimate mail being misclassified as spam.

That's a great call, thanks. I grepped my mail files and didn't find any SPAM_99 headers in any of them.

You should be looking for BAYES_99 and BAYES_999 in your corpus.
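
One way to run that check is sketched below. It assumes the ham mailboxes are in mbox format and that each message still carries the X-Spam-Status header SpamAssassin added at delivery time, with the rule hits listed after tests=. The function name ham_with_high_bayes is made up for this illustration; the mailbox paths are whatever you pass on the command line.

```python
import mailbox
import re
import sys

# Matches BAYES_99 or BAYES_999 as whole rule names (not, say, BAYES_9990).
BAYES_HIGH = re.compile(r"\bBAYES_999?\b")


def ham_with_high_bayes(mbox_path):
    """Return (subject, matched rules) for every message in the mbox whose
    X-Spam-Status header lists BAYES_99 or BAYES_999."""
    hits = []
    # create=False: do not create an empty mailbox if the path is wrong.
    for msg in mailbox.mbox(mbox_path, create=False):
        # Join all X-Spam-Status headers; folded header lines are fine
        # because the regex only needs the rule names.
        status = " ".join(str(h) for h in (msg.get_all("X-Spam-Status") or []))
        rules = sorted(set(BAYES_HIGH.findall(status)))
        if rules:
            hits.append((str(msg.get("Subject", "(no subject)")), rules))
    return hits


if __name__ == "__main__":
    # Usage: python check_ham.py <ham mbox file> [<ham mbox file> ...]
    for path in sys.argv[1:]:
        for subject, rules in ham_with_high_bayes(path):
            print(f"{path}: {subject}: {', '.join(rules)}")
```

Any message it reports from a ham mailbox is worth a look: either it is spam that was misfiled, or the Bayes database is scoring legitimate mail too high. Messages that were never scanned by SpamAssassin have no such header and are silently skipped.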


Thanks, Dave. I train SA from my various mailboxes (sa-learn --ham --mbox /home/thomas.cameron/mail/INBOX/[mailbox file], then sa-learn --spam --mbox /home/thomas.cameron/mail/INBOX/spam). Doesn't that mean that I've already checked my corpora?

Thomas
