Kai Schaetzl wrote:
Arthur Kerpician wrote on Thu, 09 Apr 2009 09:41:22 +0300:
The docs mention that after 5000 spam and ham messages have been
learned, SpamAssassin doesn't improve spam detection much.
Do they? What is meant is that once you reach some threshold, the
detection rate no longer improves as quickly as before. You can't get
any better than "nearly everything". But it will drop if no new tokens
get added.
What is the best
practice for optimizing Bayes detection? Should I stop auto-learning
after reaching the 5000 mark and then re-train from scratch from time
to time?
No, keep the automatic training (unless there are too many FPs in the
autotrained messages). Do a regular manual expire, so old tokens are
purged out.
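One way to read "regular manual expire", as a sketch only: turn off the opportunistic expiry and run the expire yourself on a schedule. That auto-expiry should be disabled at all is an assumption here, not something stated above, and the schedule is up to you.

```
# local.cf (sketch): stop SpamAssassin from expiring tokens on its own
bayes_auto_expire 0

# Then run the expire by hand or from cron (how often is an assumption):
#   sa-learn --force-expire
```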
I don't get many FPs or FNs after upgrading to 3.2.5 and retraining
Bayes. But if I keep auto-learning enabled, I should monitor the
learned spam and ham counts and manually train ham whenever spam
exceeds it (as it always will). So from time to time I should feed ham
manually to sa-learn until it reaches the spam count again. Is this
correct? If it is, I think it's rather time-consuming to keep checking
the learned ham/spam counts and leveling them.
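The checking itself could probably be scripted. A minimal sketch, assuming the usual `sa-learn --dump magic` layout where the third column holds the value and the line ends in `non-token data: <name>`. The sample text and its counts are made up, and `parse_magic` and `spam_to_ham_ratio` are hypothetical helper names:

```python
import re

# Hypothetical sample of `sa-learn --dump magic` output; the counts are invented.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       6200          0  non-token data: nspam
0.000          0       4100          0  non-token data: nham
0.000          0     152000          0  non-token data: ntokens
"""

# Columns: probability, spam count, value, atime, then the counter name.
MAGIC_LINE = re.compile(r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+)$")


def parse_magic(dump_text):
    """Return a dict of the 'non-token data' counters, e.g. nspam and nham."""
    counters = {}
    for line in dump_text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counters[match.group(2).strip()] = int(match.group(1))
    return counters


def spam_to_ham_ratio(counters):
    """Ratio of learned spam to learned ham; None if no ham has been learned."""
    nham = counters.get("nham", 0)
    return counters["nspam"] / nham if nham else None


counters = parse_magic(SAMPLE_DUMP)
ratio = spam_to_ham_ratio(counters)
print(counters["nspam"], counters["nham"], ratio)
```

On a real system the text would come from running `sa-learn --dump magic` (for example via `subprocess.run(..., capture_output=True, text=True).stdout`) instead of the sample string, and a cron job could send a warning once the ratio passes whatever limit you pick.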
I was thinking of increasing bayes_auto_learn_threshold_spam to a
higher number, so that less spam is auto-learned. Is this OK?
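For reference, the change I have in mind would look roughly like this in local.cf. The stock default is 12.0; the 15.0 below is only an arbitrary example value, not a recommendation.

```
# local.cf (sketch): raise the score a message needs before it is auto-learned as spam
# 15.0 is an example value; the default is 12.0
bayes_auto_learn_threshold_spam 15.0
```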