On Wed, 28 Jan 2015 15:58:56 +0100 Reindl Harald wrote:
> * first: it is a bug to write/lock when auto_expire / auto_learn is
> off

As I said, it's not a bug. The updates are done in case you want to
expire later with sa-learn --force-expire.

Auto-expiry means performing the expiry automatically when the
database goes over its configured token limit. Most people don't do
this because the expiry is then done during a classification, which
can cause a timeout.

Setting "auto_expire 0" is not a way of telling SA that you aren't
going to expire the database.

On Wed, 28 Jan 2015 01:03:37 +0100 Reindl Harald wrote:
> ... even if we decide to kill spam-spamles older than x
> months it needs to be done properly to the 50% spam / 50% ham
> ratio which is the reason the bayes works that good

The ratio doesn't matter; it's a myth that it should be 50:50 or match
the ratio in your mail. What's important is that you learn enough ham
and enough spam, and that the training is correct and sufficiently
representative.

It is preferable that there isn't a big mismatch between the ham/spam
ratio in the corpus as a whole and in recently added mail, as that can
skew the probabilities of new tokens.

> compared with
> autolearning setups where everyone i have seen in the past 8 years
> became worser each month until classify most ham as spam and let
> thorugh the real crap

It works for some, but when it fails it's not because the ratio of
spam to ham is wrong; it's because of a combination of mistraining,
inadequate ham and poor choices in what's learned.
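
To make the expiry point above concrete, here is a rough sketch of a
setup with auto-expiry turned off and expiry run by hand. The file
location and the idea of a nightly cron job are assumptions for
illustration, not something from this thread; bayes_auto_expire is the
full name of the option and sa-learn --force-expire is the command
mentioned above.

```
# local.cf (location varies by install; /etc/mail/spamassassin/local.cf
# is an assumption)

# Don't run expiry opportunistically while a message is being scanned.
# This does not stop SA from updating the database.
bayes_auto_expire 0

# Run expiry out of band instead, e.g. from a nightly cron job
# (schedule is an assumption), as the user that owns the Bayes database:
#   sa-learn --force-expire
```

Running the expiry outside of scanning avoids the timeout risk while
still keeping the database within its token limit.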