Re: Why shouldn't I set the score for SPAM_99 and SPAM_999 higher?

Thomas Cameron Thu, 05 May 2022 10:40:55 -0700

On 5/5/22 11:59, Dave Wreski wrote:

You should probably check that none of your ham (i.e. non-spam)
messages contain SPAM_99 or SPAM_999. It can happen when spammers
poison your Bayes database, and an increased score in that case might
lead to legitimate mail being misclassified as spam.

That's a great call, thanks. I grepped my mail files and didn't find any SPAM_99 headers in any of them.


You should be looking for BAYES_99 and BAYES_999 in your corpus.
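
A check along those lines could be sketched as follows. This is only an illustration, not anything posted in the thread: it assumes each stored message still carries the X-Spam-Status header that SpamAssassin adds by default, with the rule names in its tests= field, and the function names rules_hit and scan_mbox are made up for this example. It uses Python's standard mailbox module rather than a plain grep so that folded (multi-line) headers are handled.

```python
import mailbox
import re
import sys

# Rule names to look for; these are the two named in this thread.
TARGET_RULES = {"BAYES_99", "BAYES_999"}


def rules_hit(message):
    """Return the target rules listed in a message's X-Spam-Status header.

    Assumes the header looks like SpamAssassin's default, e.g.
    "No, score=1.2 required=5.0 tests=BAYES_99,HTML_MESSAGE autolearn=no".
    """
    status = str(message.get("X-Spam-Status", ""))
    # Long tests= lists are folded after commas; rejoin them first.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S*)", status)
    if not match:
        return set()
    return TARGET_RULES & set(match.group(1).split(","))


def scan_mbox(path):
    """Return (message_count, [(subject, [rules]), ...]) for one mbox file."""
    hits = []
    total = 0
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            total += 1
            found = rules_hit(message)
            if found:
                subject = message.get("Subject", "(no subject)")
                hits.append((subject, sorted(found)))
    finally:
        box.close()
    return total, hits


if __name__ == "__main__":
    # Pass one or more ham mbox files on the command line.
    for path in sys.argv[1:]:
        total, hits = scan_mbox(path)
        print(f"{path}: {len(hits)} of {total} messages hit a target rule")
        for subject, found in hits:
            print(f"  {', '.join(found)}: {subject}")
```

Run against the ham mailboxes only; any message it lists is legitimate mail that the Bayes rules scored as near-certain spam at delivery time. Messages that were never scanned, or whose headers were stripped, will not show up.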

Thanks, Dave. I use my various mailboxes (sa-learn --ham --mbox /home/thomas.cameron/mail/INBOX/[mailbox file] and then sa-learn --spam --mbox /home/thomas.cameron/mail/INBOX/spam) to train SA. Doesn't that mean that I've already checked my corpora?


Thomas
