On Fri, 9 Mar 2012 16:38:49 +0100 Matus UHLAR - fantomas wrote:
> You can of course configure mailer to train automatically on anything
> received/delivered. However this would apparently cause much more
> FP's and FN's rate than letting user train only those that misfire.

The use of the word "apparently" never inspires much confidence. I'm
guessing that you don't have any real evidence.

> >If you're going to train on error then train on the right error, not
> >a rarer, correlated error.
>
> The only error that really matters is the one that causes misfiring.

No, it isn't. Bayes is a statistical filter; it needs to learn a lot of
diverse spam and ham to reach its optimum accuracy. It's been
demonstrated on Bogofilter that "train-on-everything" outperforms
"train-on-error" on the same corpora. They both end up with similar
accuracy, but "train-on-everything" gets there very much faster.

Bogofilter is almost identical to BAYES; they just differ in the details
of the tokenizer and the Robinson parameters. Training only on SA
misclassifications is going to be glacially slow.
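
To make the distinction concrete, here is a rough Python sketch of the two training regimes. It is a toy, not the SpamAssassin BAYES or Bogofilter algorithm: the `TinyBayes` class, its add-one smoothing, and the sample `STREAM` messages are all made up for illustration. It only shows how much data each regime gets to learn from, and says nothing about real-world accuracy.

```python
import math
from collections import Counter


class TinyBayes:
    """Hypothetical toy classifier; NOT the SA BAYES or Bogofilter algorithm."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.trained = 0  # number of messages actually learned

    def learn(self, text, label):
        self.tokens[label].update(text.lower().split())
        self.trained += 1

    def classify(self, text):
        # Shared vocabulary size, used for add-one smoothing.
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"])) + 1
        scores = {}
        for label, counts in self.tokens.items():
            total = sum(counts.values())
            scores[label] = sum(
                math.log((counts[tok] + 1) / (total + vocab))
                for tok in text.lower().split()
            )
        # Ties resolve to "spam" because it is the first key.
        return max(scores, key=scores.get)


def train_on_everything(clf, stream):
    # Learn every message, whether or not it would have been classified correctly.
    for text, label in stream:
        clf.learn(text, label)


def train_on_error(clf, stream):
    # Learn a message only when the current classifier gets it wrong.
    for text, label in stream:
        if clf.classify(text) != label:
            clf.learn(text, label)


# Made-up sample messages, purely illustrative.
STREAM = [
    ("cheap pills online", "spam"),
    ("meeting agenda attached", "ham"),
    ("cheap watches online", "spam"),
    ("lunch meeting tomorrow", "ham"),
    ("win cheap pills", "spam"),
    ("agenda for tomorrow", "ham"),
]

toe, tee = TinyBayes(), TinyBayes()
train_on_everything(toe, STREAM)
train_on_error(tee, STREAM)
print("train-on-everything learned:", toe.trained)
print("train-on-error learned:", tee.trained)
```

Train-on-error can never learn more messages than train-on-everything from the same stream, and it skips every message it happens to get right, which is why its token database grows more slowly.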