On Thu, Aug 12, 2004 at 09:22:25AM -0700, JR wrote:
>
> Yesterday I trained with 6096 spam and 1061 ham. I got no errors and
> presumed everything was fine. Until I got into work this morning and the
> boss complained NOTHING was marked as spam.
>
> Lo and behold sa-learn --dump magic tells me that I have 746 spam and 3 ham
> in Bayes. So Bayes isn't running.
>
> How did this happen!? Did training with so many messages force an expire
> on all my old data? And if sa-learn told me it learned from everything I
> gave it, why doesn't it have them all in there anymore?
>
> Does anyone know what could have caused it? Does sa-learn not like
> processing a large number of messages? (I'm using --mbox for sa-learn)
>
> Any advice would be most appreciated.
>
What most likely happened is that your training run took longer than 5 minutes or so. This caused the scanning code to decide that your lock on the database was stale and to break it. You then had more than one process accessing and updating the database at the same time, which corrupted it. SpamAssassin 3.0 fixes this by refreshing the lock periodically during long runs.

The best advice I can give is to blow away your corrupt database and start over. Train on your archived messages in small batches that each finish in under 5 minutes. Training on your whole archive might be excessive, but I don't know what the optimal number of messages would be.

Michael
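The small-batch advice can be scripted. What follows is a minimal sketch, assuming Python 3 with only the standard library (`mailbox`, `subprocess`, `tempfile`). It splits an archive mbox into chunks, writes each chunk to a temporary mbox, and runs `sa-learn --spam --mbox` (or `--ham`) on it, waiting for each run to exit before starting the next. The helper names `batches` and `train_in_batches` and the batch size of 500 are hypothetical choices of mine, not anything from SpamAssassin. As noted above, nobody knows the optimal size, so time a few runs on your own hardware. The script only keeps each individual sa-learn run short. It does not stop your scanner from touching the database.

```python
import mailbox
import os
import subprocess
import tempfile
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Hypothetical default: tune it so each sa-learn run stays well under the
# ~5 minute window after which another process may treat the lock as stale.
BATCH_SIZE = 500


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def train_in_batches(mbox_path: str, kind: str, size: int = BATCH_SIZE) -> None:
    """Feed one mbox to sa-learn in small chunks, strictly one run at a time.

    `kind` is "spam" or "ham" and selects sa-learn's --spam or --ham flag.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    source = mailbox.mbox(mbox_path, create=False)
    try:
        for number, chunk in enumerate(batches(source, size), start=1):
            # Copy this chunk into its own temporary mbox file.
            fd, tmp_path = tempfile.mkstemp(suffix=".mbox")
            os.close(fd)
            try:
                out = mailbox.mbox(tmp_path)
                for message in chunk:
                    out.add(message)
                out.close()  # flushes the added messages to disk
                # subprocess.run blocks until sa-learn exits, so runs never
                # overlap and each one only holds the lock briefly.
                subprocess.run(
                    ["sa-learn", "--" + kind, "--mbox", tmp_path], check=True
                )
                print(f"batch {number}: fed {len(chunk)} {kind} messages")
            finally:
                os.remove(tmp_path)
    finally:
        source.close()
```

Usage might look like `train_in_batches("spam-archive.mbox", "spam")` followed by `train_in_batches("ham-archive.mbox", "ham")` (the file names are placeholders), then `sa-learn --dump magic` to confirm the spam and ham counts actually grew. Running the batches sequentially from one script, rather than in parallel shells, is the point of the design: it avoids exactly the situation where two processes update the database at once.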
