On Sun, 11 Mar 2012 13:56:52 -0600
LuKreme wrote:

> 
> On 09 Mar 2012, at 17:07 , RW wrote:
> 
> > It's been demonstrated on Bogofilter that "train-on-everything"
> > outperforms "train-on-error" on the same corpora. They both end-up
> > with similar accuracy, but "train-on-everything" gets there very
> > much faster.
> 
> But training is exceedingly slow. Under normal load, sa-learn putters
> along at 2.5-4 mesg/sec, and under load it can drop to under 1.
> 
> Now, sure, perhaps I should throw a quad core i7 at it, but REALLY?

You're missing the point. What I'm saying is that train-on-error is not
more accurate than train-on-everything, and that training on
SpamAssassin errors is going to be worse, not the optimal method as
was claimed.

If you want to trade accuracy for cost, that's fine as long as you're
clear about it, but it shouldn't be dressed up as a better way to learn.

I'm not saying everything needs to be learned. In general, training on
spam that doesn't hit BAYES_99 and ham that doesn't hit BAYES_00 is a
reasonable compromise. The big problem with training only on full
SpamAssassin errors is that failure to properly classify ham will
rarely be corrected.
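
To make the difference concrete, here is a rough Python sketch of the two
selection rules. The function names and the set-of-rule-names input are
made up for illustration; this is not SpamAssassin or sa-learn code, and
it assumes you already know the true class of each message and which
rules it hit.

```python
# Hypothetical sketch: decide which already-classified messages to learn.
# "rules_hit" is assumed to be the set of rule names the message triggered.

def train_on_full_error(is_spam, flagged_as_spam):
    """Train only when the overall SpamAssassin verdict was wrong."""
    return is_spam != flagged_as_spam


def train_on_bayes_shortfall(is_spam, rules_hit):
    """Train whenever Bayes was not already at the confident extreme:
    spam that missed BAYES_99, or ham that missed BAYES_00."""
    target = "BAYES_99" if is_spam else "BAYES_00"
    return target not in rules_hit


# Ham that Bayes was unsure about (BAYES_50) but that still scored under
# the spam threshold overall, so the final verdict was correct.
ham_rules = {"BAYES_50", "HTML_MESSAGE"}
skipped = train_on_full_error(is_spam=False, flagged_as_spam=False)
learned = train_on_bayes_shortfall(is_spam=False, rules_hit=ham_rules)
```

In that example the full-error rule skips the message, so the weak Bayes
result on ham is never corrected, while the compromise rule learns it.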


