On 2021-03-09 08:28 AM, Greg Troxel wrote:
Steve Dondley <s...@dondley.com> writes:

I've read through
https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html which
states that "anything over about 5000 messages does not improve
accuracy significantly in our tests."

I would take that with a grain of salt. Based on my experience running
SA for many years, I'd say that if you have new spam that isn't like
the spam you already have, learning on it will help.

Also, I take it as a comment about "there's no need to try hard to get
more than 5K messages".  It doesn't say, "if you train on more than 5000
bad things will happen".

So once I hit 5,000, what do I do? Do I run --forget on, say, the 500 oldest
emails, delete those from my ham/spam folders and then add in a batch
of 500 newer ham/spam emails and then run sa-learn on all the emails
in my spam/ham folders?

I've been running sa-learn daily over my ham folders and my spam folders
for years.  I refile spam and ham so that it will be learned.  I find
the bayes scoring is quite good except for novel spam. My bayes_* files
are about 83M in total.

So I don't think you necessarily have a problem to solve.
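The daily routine Greg describes (run sa-learn over the refiled ham folder and the refiled spam folder every day, with no manual forgetting or rotation) might be scripted roughly as follows. This is a minimal sketch, not Greg's actual setup: the Maildir paths are hypothetical, and it assumes the folders are Maildirs whose `cur` subdirectories hold one message per file. It relies only on sa-learn's documented `--ham` and `--spam` options. sa-learn records which messages it has already learned, so re-running it over the same folders each day should mostly skip mail it has seen before.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical locations of the folders where ham and spam get refiled.
HAM_DIR = Path.home() / "Maildir" / ".ham" / "cur"
SPAM_DIR = Path.home() / "Maildir" / ".spam" / "cur"


def build_learn_commands(ham_dir, spam_dir):
    """Return the sa-learn invocations for one daily training pass."""
    return [
        ["sa-learn", "--ham", str(ham_dir)],
        ["sa-learn", "--spam", str(spam_dir)],
    ]


def run_daily_training(ham_dir=HAM_DIR, spam_dir=SPAM_DIR):
    """Train Bayes on both folders; intended to be called once a day."""
    if shutil.which("sa-learn") is None:
        raise FileNotFoundError("sa-learn not found on PATH")
    for cmd in build_learn_commands(ham_dir, spam_dir):
        # check=True raises CalledProcessError if sa-learn exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling `run_daily_training()` from a daily cron job reproduces the "learn everything that has been refiled" approach; token expiry is left to SpamAssassin's own Bayes database maintenance rather than to hand-run `--forget` batches.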

OK, thanks for the advice. Appreciated.
