On Sun, 11 Mar 2012 13:56:52 -0600 LuKreme wrote:

> On 09 Mar 2012, at 17:07 , RW wrote:
>
> > It's been demonstrated on Bogofilter that "train-on-everything"
> > outperforms "train-on-error" on the same corpora. They both end-up
> > with similar accuracy, but "train-on-everything" gets there very
> > much faster.
>
> But training is exceedingly slow. Under normal load, sa-learn putters
> along at 2.5-4 mesg/sec, and under load it can drop to under 1.
>
> Now, sure, perhaps I should throw a quad core i7 at it, but REALLY?
You're missing the point. What I'm saying is that train-on-error is not more accurate than train-on-everything, and that training only on SpamAssassin errors is going to be worse, not the optimal method it was claimed to be. If you want to trade accuracy for cost, that's fine as long as you're clear about it, but it shouldn't be dressed up as a better way to learn.

I'm not saying everything needs to be learned. In general, training on spam that doesn't hit BAYES_99 and ham that doesn't hit BAYES_00 is a reasonable compromise.

The big problem with only training on full SpamAssassin errors is that failure to properly classify ham will rarely be corrected.
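A minimal Python sketch of the two training policies, to make the ham case concrete. The `Scanned` record, its field names, and both function names are hypothetical; the only SpamAssassin specifics are the BAYES_00 and BAYES_99 rule names, and it assumes one BAYES_xx rule name is recorded per message. It also assumes the uncorrected ham in question is mail that Bayes scores as spammy but that other rules keep below the spam threshold, so the overall verdict is right and a train-on-error policy never sees it.

```python
from dataclasses import dataclass


# Hypothetical record of one scanned message; not a SpamAssassin API.
@dataclass
class Scanned:
    is_spam: bool          # true label, as decided by the user
    bayes_rule: str        # Bayes rule that fired, e.g. "BAYES_00" or "BAYES_99"
    flagged_as_spam: bool  # SpamAssassin's overall verdict (score >= threshold)


def train_on_error(msg: Scanned) -> bool:
    # Learn only when the overall SpamAssassin verdict was wrong.
    return msg.flagged_as_spam != msg.is_spam


def train_unless_confident(msg: Scanned) -> bool:
    # Compromise policy: learn spam that missed BAYES_99
    # and ham that missed BAYES_00.
    if msg.is_spam:
        return msg.bayes_rule != "BAYES_99"
    return msg.bayes_rule != "BAYES_00"


# Ham that Bayes thinks is spam, but which other rules kept under the threshold.
ham = Scanned(is_spam=False, bayes_rule="BAYES_99", flagged_as_spam=False)
print(train_on_error(ham))          # False: the Bayes mistake is never learned
print(train_unless_confident(ham))  # True: the Bayes mistake gets corrected
```

Under these assumptions the compromise policy still skips the bulk of mail that Bayes already classifies with full confidence, which is where most of the sa-learn cost would otherwise go.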