I've started experimenting with SpamBayes a bit more on my home Linux machine.
I have two directories, TrainSpam and TrainHam. I put false positives/negatives and unsures in the appropriate directory. Every 5 minutes a cron job trains on those directories.

Once a day, another cron job:
- purges anything in these two directories that is older than 7 days
- moves my existing 'hammiedb' file
- creates a new 'hammiedb' file
- forces a re-training on the TrainSpam and TrainHam directories

While I don't have anything quantitative, the number of false negatives and false positives seems to be drastically reduced. This script has the effect of only keeping words in the database that have been seen in the past 7 days, which accounts, somewhat, for the change in the character of spam (and ham).

Maybe once a day is overkill... but right now my system has cycles to spare.

-r

On Tue, 8 Nov 2005, Jesse Pelton wrote:

> See
> http://spambayes.sourceforge.net/faq.html#can-i-share-move-my-training-data-from-one-computer-to-another
> for how to do this.
>
> But there's a price for that answer: I'm going to give you my opinion as
> well. I wouldn't bother copying the training data. SpamBayes learns very
> quickly, and the character of the spam I receive changes over time, so
> rather than hanging on to training data, I delete it and retrain from
> scratch periodically. Within a day or two I find I'm getting better
> results.
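
For anyone who wants to try the same daily purge-and-retrain routine, here is a rough Python sketch of the steps listed in the post. It is not the actual cron script, which was not posted. The function names (purge_old, rotate_db, daily_job), the ".old" backup suffix, and the retrain_command argument are all hypothetical. The sketch also assumes that whatever training command the 5-minute cron job already runs will create a fresh 'hammiedb' when the file is missing; that command is passed in as-is rather than guessed at here.

```python
import os
import subprocess
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def purge_old(directory, max_age_days=7, now=None):
    """Delete regular files in `directory` last modified more than
    `max_age_days` ago. Returns the sorted names of the removed files."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for path in Path(directory).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


def rotate_db(db_path):
    """Move the existing database aside (hypothetical '.old' suffix) so the
    next training run starts from scratch. Returns the backup path, or None
    if there was no database to move."""
    db = Path(db_path)
    if not db.exists():
        return None
    backup = db.with_name(db.name + ".old")
    os.replace(db, backup)  # overwrites any previous backup
    return backup


def daily_job(train_dirs, db_path, retrain_command):
    """Run the daily steps in order: purge, rotate, retrain.

    `retrain_command` is an argument list for whatever training command is
    already in use (an assumption: it recreates the database if missing)."""
    for directory in train_dirs:
        purge_old(directory)
    rotate_db(db_path)
    subprocess.run(retrain_command, check=True)
```

A daily cron entry would then call something like daily_job(["TrainSpam", "TrainHam"], "hammiedb", [...]) with the existing training command filled in. Keeping one backup of the old database means a bad retrain can be undone by moving the file back.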
