On Wed, 2013-05-22 at 11:34 -0400, Andrew Talbot wrote:
> I set up Bayes with autolearning a few weeks ago. It took forever to
> get started, but now it seems like the learning speed has accelerated.
>
> Is the autolearning supposed to accelerate? I can't help but feel like
> it may just be feeding itself it's own data or something.

There is no feedback loop in the learning process. Auto-learning is
based on non-Bayes scores and, in particular, entirely ignores certain
rules like BAYES_nn. See the AutoLearnThreshold [1] plugin.

Additionally, quite a few constraints must be met for auto-learning to
happen. Besides the score thresholds, there are (non-configurable)
constraints on the header and body rules being involved, etc.

Since there is no feedback here, I'd guess the "acceleration" is most
likely only perceived -- "it seems" to have accelerated, you said. Have
you tried backing that up with numbers?

The learning speed (or rather, the number of learned messages relative
to overall messages) can be influenced by a few factors.

(a) Changed scores (after an sa-update run) might have an impact, due
    to different scores of matching header and body rules.
(b) The incoming spam stream: changes, spikes, or certain spam patterns
    can make a huge difference.
(c) And of course, whether you are using bayes_auto_learn_on_error.

Besides these, there are likely others I just forgot.

In a nutshell: No feedback, thus no inherent acceleration. And most
definitely not logarithmic.

[1] http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
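
To back the perceived acceleration up with numbers, one option is to
tally the autolearn= field of the spamd result lines per day. The sketch
below is only that, a sketch: it assumes syslog-style lines that start
with a "Mon DD" timestamp and contain "result:" plus an
"autolearn=<value>" field. The function names are made up for
illustration, and the patterns may need adjusting to your actual log
format.

```python
import re
from collections import Counter, defaultdict

# Assumption: spamd result lines carry an "autolearn=<value>" field,
# e.g. autolearn=ham, autolearn=spam, autolearn=no.
AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")
# Assumption: syslog-style prefix such as "May 22 11:34:01 host ...".
DAY_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2})\s")


def autolearn_per_day(lines):
    """Return {day: Counter({autolearn_value: count})} for result lines."""
    per_day = defaultdict(Counter)
    for line in lines:
        if "result:" not in line:
            continue  # not a scan result line
        learned = AUTOLEARN_RE.search(line)
        day = DAY_RE.match(line)
        if learned and day:
            # Normalize "May  2" and "May 2" to the same key.
            key = " ".join(day.group(1).split())
            per_day[key][learned.group(1)] += 1
    return per_day


def learned_ratio(counts):
    """Fraction of scanned messages that were auto-learned as ham or spam."""
    total = sum(counts.values())
    return (counts["ham"] + counts["spam"]) / total if total else 0.0


def report(lines):
    """Format one summary line per day, in order of first appearance."""
    return [
        f"{day}: {dict(counts)} learned={learned_ratio(counts):.1%}"
        for day, counts in autolearn_per_day(lines).items()
    ]
```

Feeding it an open log file, e.g. print("\n".join(report(open(path)))),
gives one line per day. A learned ratio that stays roughly flat over the
weeks would suggest the acceleration is indeed only perceived.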