On Wed, 28 Jan 2015 15:58:56 +0100
Reindl Harald wrote:

> * first:  it is a bug to write/lock when auto_expire / auto_learn is
> off

As I said, it's not a bug. The updates are done in case you want to
expire later with sa-learn --force-expire.

Auto-expiry means performing the expiry automatically when the database
goes over its configured token limit. Most people don't do this because
the expiry is then done during a classification, which can cause
a timeout.

Setting "auto_expire 0" is not a way of telling SA that you aren't going
to expire the database.
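A rough way to picture this is the toy Python model below. It assumes, as a simplification, that the database is just a map from token to last-access time (atime). `ToyBayesDB`, `learn`, `classify` and `expire` are hypothetical names, not SpamAssassin internals, and the 75% retention target is an illustrative assumption. The point it shows is that atimes get refreshed on every scan regardless of the auto-expiry setting; auto-expiry only decides whether the expiry runs inside a classification.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBayesDB:
    """Hypothetical model of a Bayes token store that records each token's
    last-access time (atime). Not SpamAssassin's actual code."""

    max_tokens: int                             # stand-in for the configured token limit
    auto_expire: bool                           # stand-in for the auto-expiry setting
    atime: dict = field(default_factory=dict)   # token -> last access time

    def learn(self, tokens, now):
        # Learning adds new tokens or refreshes existing ones.
        for t in tokens:
            self.atime[t] = now

    def classify(self, tokens, now):
        # Scanning refreshes the atime of tokens already in the database,
        # whether or not auto-expiry is enabled, so that a later manual
        # expiry can tell which tokens are still in use.
        for t in tokens:
            if t in self.atime:
                self.atime[t] = now
        # Auto-expiry only changes *when* expiry runs: here it happens in
        # the middle of a classification, which is what risks a timeout.
        if self.auto_expire and len(self.atime) > self.max_tokens:
            self.expire()

    def expire(self):
        # Keep only the most recently used tokens. The 75% target is an
        # illustrative assumption for this toy model.
        keep = int(self.max_tokens * 0.75)
        newest = sorted(self.atime, key=self.atime.get, reverse=True)[:keep]
        self.atime = {t: self.atime[t] for t in newest}


# Auto-expiry off: atimes are still updated during classification, and a
# scheduled manual expiry (cf. sa-learn --force-expire) uses them later.
db = ToyBayesDB(max_tokens=4, auto_expire=False)
db.learn(["a", "b", "c", "d", "e"], now=1.0)
db.classify(["a", "b"], now=2.0)
print(len(db.atime))     # → 5 (nothing expired during the scan)
db.expire()              # the "later" manual expiry
print(sorted(db.atime))  # → ['a', 'b', 'c']

# Auto-expiry on: the same expiry runs during the classification itself.
auto = ToyBayesDB(max_tokens=4, auto_expire=True)
auto.learn(["a", "b", "c", "d", "e"], now=1.0)
auto.classify(["a"], now=2.0)
print(len(auto.atime))   # → 3
```

In both cases the same atime information drives the expiry; turning auto-expiry off just moves the expensive step out of the scan path and onto a schedule you control.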



On Wed, 28 Jan 2015 01:03:37 +0100
Reindl Harald wrote:

> ... even if we decide to kill spam samples older than x
> months it needs to be done properly to the 50% spam / 50% ham
> ratio which is the reason the bayes works that well

The ratio doesn't matter; it's a myth that it should be 50:50 or match
the ratio in your mail. 


What's important is that you learn enough ham and enough spam, and that
the training is correct and sufficiently representative. It is
preferable that there isn't a big mismatch between the ham/spam ratio
in the corpus as a whole and in recently added mail, as that can skew
the probabilities of new tokens.
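To make that concrete, here is a simplified per-token calculation in Python. It assumes the Graham/Robinson-style per-token probability, without the extra smoothing and combining steps a real filter such as SpamAssassin applies; `token_spam_prob` is a hypothetical helper and the message counts are made-up illustrative numbers, not measurements.

```python
def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Per-token spam probability in the Graham/Robinson style: the token's
    frequency in spam vs. in ham, each normalised by how many spam and ham
    messages have been learned. Simplified: no smoothing is applied."""
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    return spam_rate / (spam_rate + ham_rate)


# A new, genuinely neutral token that appears in 10% of recently learned mail.

# Case 1: 50:50 corpus, recent mail also 50:50 (500 spam, 500 ham).
print(token_spam_prob(50, 50, n_spam=10_000, n_ham=10_000))    # → 0.5

# Case 2: 2:1 corpus, recent mail also 2:1 (1000 spam, 500 ham).
print(token_spam_prob(100, 50, n_spam=20_000, n_ham=10_000))   # → 0.5

# Case 3: 50:50 corpus, but recent mail is 9:1 (900 spam, 100 ham).
print(round(token_spam_prob(90, 10, n_spam=10_000, n_ham=10_000), 3))  # → 0.9
```

Because the counts are normalised by the number of spam and ham learned, the overall ratio on its own (cases 1 and 2) leaves a neutral token at 0.5. Only when recent training is lopsided relative to the corpus as a whole (case 3) does the same token start to look spammy.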


> compared with
> autolearning setups where every one I have seen in the past 8 years
> became worse each month until it classified most ham as spam and let
> through the real crap

It works for some, but when it fails, it's not because the ratio of
spam to ham is wrong; it's because of a combination of mistraining,
inadequate ham and poor choices in what's learned.
