Could someone explain how token expiry works in the Bayes DB?

Yesterday I trained with 6096 spam and 1061 ham. I got no errors and presumed everything was fine, until I got into work this morning and the boss complained that NOTHING was being marked as spam.

Lo and behold, sa-learn --dump magic tells me that Bayes holds only 746 spam and 3 ham. So Bayes isn't running.
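Bayes only contributes to scoring once the database holds a minimum number of learned messages of each kind. SpamAssassin's bayes_min_spam_num and bayes_min_ham_num both default to 200, so 3 ham is far below the threshold. The Python sketch below parses the nspam/nham lines of sa-learn --dump magic output and applies that check. The function names are hypothetical, and the regex assumes the usual four-column "... non-token data: NAME" layout, which may vary between SpamAssassin versions.

```python
import re
from typing import Dict

# Assumed layout of a magic line, e.g.:
#   "0.000          0        746          0  non-token data: nspam"
# The third column holds the value; the text after "non-token data:" is its name.
_MAGIC_LINE = re.compile(r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data:\s*(.+?)\s*$")


def parse_dump_magic(text: str) -> Dict[str, int]:
    """Return a {name: value} dict for each non-token line in the dump output."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = _MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_active(counts: Dict[str, int], min_spam: int = 200, min_ham: int = 200) -> bool:
    """True if both counts meet the (default 200/200) minimums Bayes requires."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham
```

Feeding it the captured output of sa-learn --dump magic for the database described above would yield nspam 746 and nham 3, so bayes_active returns False because of the ham count alone.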

How did this happen!? Did training with so many messages force an expire on all my old data? And if sa-learn told me it learned from everything I gave it, why doesn't it have them all in there anymore?

Does anyone know what could have caused it? Does sa-learn not like processing a large number of messages? (I'm using --mbox for sa-learn)

Any advice would be most appreciated.

Fortunately I have archived everything I've trained with over the last 8 months, so I do have a collection of 57,000+ spam and 8,900+ ham. I know I was pushing the limits of the server yesterday when training with the 6000 spam (RH Enterprise; load climbed to 8.92 and CPU usage to about 93%), so I have no intention of training with all 57,000+ messages at once! Is there an optimal number of messages to push through at a time that will keep Bayes happy and stop it from eating itself again?
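One way to avoid handing sa-learn thousands of messages in a single run is to split the archive into smaller mbox files first. Below is a minimal sketch using Python's standard mailbox module. The function name split_mbox and the batch size of 500 are hypothetical choices for illustration, not a figure recommended by SpamAssassin.

```python
import mailbox
import os
from typing import List


def split_mbox(src_path: str, out_dir: str, batch_size: int = 500) -> List[str]:
    """Split src_path into mbox files holding at most batch_size messages each.

    batch_size=500 is an arbitrary, hypothetical default.
    Returns the chunk file paths in the order they were written.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    src = mailbox.mbox(src_path, create=False)  # raises if src_path is missing
    chunk_paths: List[str] = []
    chunk = None
    count = 0
    try:
        for msg in src:
            # Start a new chunk file when none is open or the current one is full.
            if chunk is None or count == batch_size:
                if chunk is not None:
                    chunk.close()  # close() also flushes pending writes
                path = os.path.join(out_dir, f"chunk-{len(chunk_paths):04d}.mbox")
                chunk = mailbox.mbox(path, create=True)
                chunk_paths.append(path)
                count = 0
            chunk.add(msg)
            count += 1
    finally:
        if chunk is not None:
            chunk.close()
        src.close()
    return chunk_paths
```

Each chunk can then be learned in its own run (for example sa-learn --spam --mbox chunk-0000.mbox), which lets the load be spread out and lets you check sa-learn --dump magic between batches.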

--JR