Auto-Learn Thresholds (was: lottery message scored hammy by bayes)

Karsten Bräckelmann Wed, 26 Aug 2009 06:17:56 -0700

On Tue, 2009-08-25 at 22:13 -0400, Alex wrote:
> > If you're using autolearning, what are your learning thresholds?
> 
> What do you recommend for thresholds? I'm considering using
> autolearning, but very concerned about corrupting the database. I
> think I would use something like +15 for spam.


I generally recommend the defaults, unless you *do* know you need
something else. That's why they are defaults.

That's <= 0.1 for ham and >= 12.0 for spam. Keep in mind these scores
are calculated using a non-Bayes score set, so they generally differ
from the overall score of the message. Also, the calculation ignores the
scores of certain rules, like Bayes and AWL. On top of that there are
some more esoteric constraints.

See the docs. [1]
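
For reference, a minimal local.cf sketch that spells the defaults out
explicitly. It assumes the AutoLearnThreshold plugin from [1] is loaded,
and the file name is just the usual place for site-wide settings.

```
# local.cf -- these are the stock values, shown only for illustration
bayes_auto_learn                    1
bayes_auto_learn_threshold_nonspam  0.1
bayes_auto_learn_threshold_spam     12.0
```

Raising the spam threshold (say, to the 15 you mention) only makes
auto-learning as spam more conservative.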


> There are FNs on occasion in the 2.x range with low bayes numbers (or
> BAYES_50) that I wouldn't want to be tagged as ham. Should that be a
> concern?

No.  Bayes auto-learning is *not* self-feeding.

Any overall score of about 2 (with Bayes) is *very* unlikely to cross
either threshold when using the respective non-Bayes score set.

Moreover, your concern is with a Bayes probability <= 50%, and thus a
negative score for the BAYES hit. That hit is not considered for
auto-learning, though, so as a first rule of thumb subtract that score
again -- which yields a slightly higher score, further away from the ham
threshold. Still nowhere near either threshold.
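
To put numbers on that rule of thumb, here is a rough Python sketch. It
is not SpamAssassin's actual code: the function name and the example
scores are made up, and it ignores the score set switch and the other
constraints described in [1].

```python
# Default thresholds named in this message (see [1]).
HAM_THRESHOLD = 0.1    # learn as ham if the auto-learn score is <= this
SPAM_THRESHOLD = 12.0  # learn as spam if the auto-learn score is >= this


def rough_autolearn_verdict(overall_score, bayes_rule_score):
    """Rule of thumb only: remove the BAYES_* contribution, then compare."""
    approx = overall_score - bayes_rule_score
    if approx <= HAM_THRESHOLD:
        return "ham"
    if approx >= SPAM_THRESHOLD:
        return "spam"
    return "no"


# Hypothetical FN: overall score 2.0, of which the Bayes hit contributed -1.0.
# Without the Bayes hit that is roughly 3.0, far from both thresholds.
print(rough_autolearn_verdict(2.0, -1.0))
```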


> Even mail that has been whitelisted could also contain spam, so would
> a ham threshold of like -100 work, or present the same problem?

60_whitelist.cf:  tflags USER_IN_WHITELIST  userconf nice noautolearn

Again, as per the docs [1], whitelisting is not considered when deciding
whether to auto-learn. That is what the noautolearn tflag above means.
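
A small Python sketch of that filtering, assuming the exclusion list
given in [1] (rules flagged learn, userconf or noautolearn). The function
and the two HYPOTHETICAL_* rules are invented for illustration; only the
USER_IN_WHITELIST tflags are taken from the line quoted above.

```python
# tflags that exclude a rule from the auto-learn score, per the docs in [1].
IGNORED_TFLAGS = {"learn", "userconf", "noautolearn"}


def autolearn_score(hits):
    """Sum the scores of hits whose tflags do not exclude them.

    hits: list of (rule_name, score, set_of_tflags) tuples.
    """
    return sum(score for _name, score, tflags in hits
               if not (tflags & IGNORED_TFLAGS))


hits = [
    ("USER_IN_WHITELIST", -100.0, {"userconf", "nice", "noautolearn"}),
    ("HYPOTHETICAL_BODY_RULE", 2.5, set()),    # made-up rule and score
    ("HYPOTHETICAL_HEADER_RULE", 1.5, set()),  # made-up rule and score
]

overall = sum(score for _name, score, _tflags in hits)  # whitelist included
for_learning = autolearn_score(hits)                    # whitelist ignored
print(overall, for_learning)
```

The overall score is hugely negative because of the whitelist hit, but
the score used for the auto-learn decision only reflects the remaining
rules, so a whitelisted spam does not get learned as ham on the strength
of the whitelist alone.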

  guenther


[1] http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
