Re: Any drawbacks of cron-scheduled bayesian learning?

Faisal N Jawdat Wed, 25 Apr 2007 13:54:39 -0700

On Apr 25, 2007, at 4:30 PM, Arik Raffael Funke wrote:

> I am now probably venturing off-topic on my own thread, but the point you make is interesting: You train only misfiled messages. What about new but correctly filed messages? You _never_ train on them? Given that bayes is a statistical method, is it really sufficient to only train on the mis-files?

the nightly cron job trained against the spam folder and a subset of the read folders likely to have spam in them (archive, recent working folders, etc.). i'd periodically retrain across the entire mail tree. the retraining only for specific misfiled messages handles both spam and ham.

retraining only on misfiles is not as accurate as training on all mail, but is a lot lighter weight, so i can run it every 5 minutes instead of every night.
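
To make the trade-off concrete, here is a minimal Python sketch of the two strategies. It is not the filter's real code: the ToyBayes class, the add-one smoothing, the log-odds scoring and the sample mailbox are all assumptions made up for illustration. It only shows that training on misfiles touches far fewer messages per run than training on everything.

```python
import math
from collections import Counter


class ToyBayes:
    """Toy token-count model (hypothetical; not any real filter's algorithm)."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def token_spam_prob(self, token):
        # Add-one smoothing is an assumption; it keeps the value strictly in (0, 1).
        p_spam = (self.spam_tokens[token] + 1) / (self.n_spam + 2)
        p_ham = (self.ham_tokens[token] + 1) / (self.n_ham + 2)
        return p_spam / (p_spam + p_ham)

    def is_spam(self, text):
        # Sum of per-token log-odds; a positive total means "spam".
        score = 0.0
        for token in set(text.lower().split()):
            p = self.token_spam_prob(token)
            score += math.log(p / (1.0 - p))
        return score > 0


def train_on_misfiles(model, mailbox):
    """Learn only messages whose folder disagrees with the model's verdict."""
    learned = 0
    for text, filed_as_spam in mailbox:
        if model.is_spam(text) != filed_as_spam:
            model.learn(text, filed_as_spam)
            learned += 1
    return learned


def train_on_all(model, mailbox):
    """Learn every message, as a full nightly pass would."""
    for text, filed_as_spam in mailbox:
        model.learn(text, filed_as_spam)
    return len(mailbox)


# Made-up sample data: (message text, True if the user filed it as spam).
mailbox = [
    ("cheap pills now", True),
    ("meeting agenda attached", False),
    ("cheap pills today", True),
    ("agenda for meeting", False),
]

misfile_count = train_on_misfiles(ToyBayes(), mailbox)
full_count = train_on_all(ToyBayes(), mailbox)
print(misfile_count, full_count)
```

In this toy mailbox the misfile-only pass learns one message out of four. The price is that token counts for mail that was already filed correctly are never refreshed, which is the accuracy loss described above.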

> The proportional spam/ham weight of keywords would in this case not be adjusted in the database if/when they change in your mail traffic, or? Are you not encountering a higher number of mis-files compared to your previous learning practise?

the number of misfiles i get is so low that it's hard to tell if there's a difference. i periodically get floods of new false-negatives, but those typically correct after the first few are retrained. when retraining across the entire mail spool the problems usually corrected after the first night.
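
One plausible reading of why a flood corrects after only a few retrained messages can be sketched in Python. The add-one-smoothed estimate below is an assumption, not the formula any particular filter uses, and the corpus sizes are made up. The sketch tracks the spam probability of a token the database has never seen as k spam messages containing it are learned.

```python
def token_spam_prob(spam_hits, n_spam, ham_hits, n_ham):
    """Smoothed spam probability for one token (assumed formula, for illustration)."""
    p_spam = (spam_hits + 1) / (n_spam + 2)
    p_ham = (ham_hits + 1) / (n_ham + 2)
    return p_spam / (p_spam + p_ham)


def probs_after_retraining(n_spam, n_ham, max_retrained):
    """Probability of a brand-new token after k = 0..max_retrained spam
    messages containing it have been learned; it never appears in ham."""
    return [
        token_spam_prob(k, n_spam + k, 0, n_ham)
        for k in range(max_retrained + 1)
    ]


# Hypothetical database that already holds 1000 spam and 1000 ham messages.
probs = probs_after_retraining(n_spam=1000, n_ham=1000, max_retrained=3)
for k, p in enumerate(probs):
    print(f"retrained={k}  token spam probability={p:.3f}")
```

Under these assumptions the token starts out neutral and its spam probability rises with each retrained message, so later messages in the same flood score as more spam-like even though only a handful were retrained by hand.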


-faisal
