Re: SA-learn (spamassassin)

Karsten Bräckelmann Sun, 02 Aug 2009 08:16:24 -0700

On Sun, 2009-08-02 at 02:00 +0100, RW wrote:
> On Sun, 02 Aug 2009 01:42:21 +0200 Karsten Bräckelmann wrote:


> > > when I learn bayes by hand (sa-learn --spam --file mail) that this
> > > mail is spam? I have explicitly set bayes_min_spam_num 1 in local.cf.
> > > This means that one mail is sufficient for Bayes to learn
> > > (according to me). But it doesn't work.

> > Do NOT do that.
> > 
> > Unless you *really* understand the implications. Which you don't.
> > It's a default for a reason.
> > 
> > It's a counter-measure against bad learning, to force at least some
> > MINIMAL manual training, before auto-learning kicks in. You just side-
> > stepped that.
> 
> AFAIK it doesn't affect auto-learning at all, bayes_min_spam_num &
> bayes_min_ham_num control when scoring starts.

Well, it *does* nonetheless. *shrug*

As per the docs, that threshold controls when Bayes activates. Nothing
more, nothing less. Want to see for yourself?


$ echo | spamassassin --cf='score EMPTY_MESSAGE 6' --cf='score MISSING_DATE 6'

X-Spam-Status: Yes, score=17.3 required=8.0 tests=EMPTY_MESSAGE,MISSING_DATE,
  MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,
  NO_RELAYS,TVD_SPACE_RATIO autolearn=spam version=3.2.5

$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0          2          0  non-token data: nspam
0.000          0          1          0  non-token data: nham
0.000          0         20          0  non-token data: ntokens
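
To compare those counters against the thresholds, a small helper like the one below could be used. It is a rough sketch, not part of SpamAssassin: the function name bayes_active is made up for illustration, and 200 is assumed as the stock default for both bayes_min_spam_num and bayes_min_ham_num. It reads `sa-learn --dump magic` output on stdin and reports whether Bayes scoring would be active.

```shell
# Hypothetical helper (not shipped with SpamAssassin).
# Reads `sa-learn --dump magic` output on stdin.
# $1 = bayes_min_spam_num, $2 = bayes_min_ham_num (both assumed to default to 200).
bayes_active() {
    min_spam=${1:-200}
    min_ham=${2:-200}
    awk -v ms="$min_spam" -v mh="$min_ham" '
        # The third column holds the value; the last word names the counter.
        $NF == "nspam" { nspam = $3 }
        $NF == "nham"  { nham  = $3 }
        END {
            # Bayes only scores once BOTH minimums have been reached.
            state = (nspam >= ms && nham >= mh) ? "active" : "inactive"
            printf "nspam=%d nham=%d bayes=%s\n", nspam, nham, state
        }'
}
```

Typical use would be `sa-learn --dump magic | bayes_active`, or `sa-learn --dump magic | bayes_active 1 1` to see the effect of lowering both minimums to 1.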


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
