Re: Bayes Autolearn Threshold - different scoring?

Kris Deugau 11 Mar 2005 18:38:27 -0000

[EMAIL PROTECTED] wrote:
> My problem is this: I'm using squirrelmail,

As your only email access?


> and to keep an eye on false negatives (I define those as real mails
> that get shuttled to spam, just to keep things clear) I have a 'spam'
> folder. As anyone that uses sqmail knows, it gets very slow when any
> folder contains more than a few hundred messages.

<g>  Try several thousand, as a number of customers have reported to
me...

Actually, it's only spewed out error messages in a very few cases.

> But, since my
> filter is trained very well, I'd like to send autolearned spams to
> /mail/Trash (ultimately to /dev/null) so I don't have to deal with
> those.

Mmm.  Dangerous - I've seen FPs get autolearned as spam once or twice. :(

What I do on my accounts is set up a "big-spam" folder, and rely on the
X-Spam-Level header to move mail there.  Anything scoring 15 or higher
gets 15 or more stars in X-Spam-Level, and I have this:

:0:
* ^X-Spam-Level:.\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
/home/kdeugau/mail/bigspam

before the check that files spam in my "main" spam folder.

With the well-tuned 2.64+SURBL systems I have, ~80% of the spam usually
ends up in the "big-spam" folder.
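
For anyone who wants to try the same cutoff outside procmail, here is a
minimal Python sketch of the star-counting check the recipe above does.
The function name and the folder names are made up for illustration; only
the 15-star cutoff and the X-Spam-Level header come from the recipe.

```python
import re

# Cutoff taken from the recipe above: 15 or more stars goes to big-spam.
BIG_SPAM_STARS = 15

def folder_for(headers: str, big: str = "bigspam", default: str = "inbox") -> str:
    """Pick a folder from the star count in X-Spam-Level (hypothetical helper)."""
    # One star per whole point of score; match the run of stars after the name.
    match = re.search(r"^X-Spam-Level:[ \t]*(\*+)", headers, re.MULTILINE)
    stars = len(match.group(1)) if match else 0
    return big if stars >= BIG_SPAM_STARS else default

if __name__ == "__main__":
    sample = "Subject: test\nX-Spam-Level: " + "*" * 16 + "\n"
    print(folder_for(sample))
```

As in the recipe, a check like this has to run before whatever files
ordinary spam, or the lower-scoring rule will catch everything first.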

> I figured just setting bayes_auto_learn_threshold_spam 6 would
> work great. It really does not do much of anything. I've decreased
> it to 3, and to 1, but it really doesnt make a difference. I found
> these relevant lines in a debug:

[snip]
> debug: auto-learn? ham=0.1, spam=1, body-points=0, head-points=-2.82,
> learned-points=1.886
> debug: auto-learn? no: scored as spam but too few body points (0 < 3)

These two entries are the critical ones;  note the body-points and
head-points.  To be autolearned as spam, a message must hit header tests
worth a total of 3 points or more, and body tests worth a total of 3
points or more.  Your message got 0 body points, so it was rejected for
autolearning no matter how low you set the spam threshold.
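
A rough Python sketch of that decision, for illustration only.  This is
not SpamAssassin's code: the function name, the argument names, and the
exact comparison operators are assumptions, and the defaults of 0.1 and
12.0 are what I understand the stock thresholds to be.  Only the
3-point body and header minimums and the wording of the "too few body
points" message come from the debug output quoted above.

```python
def autolearn_decision(score, body_points, head_points,
                       ham_threshold=0.1, spam_threshold=12.0,
                       min_body=3.0, min_head=3.0):
    """Return 'ham', 'spam', or a 'no: ...' reason (illustrative sketch)."""
    # Below the ham threshold: learn as ham.
    if score < ham_threshold:
        return "ham"
    # Between the two thresholds: do not learn at all.
    if score < spam_threshold:
        return "no: between thresholds"
    # At or above the spam threshold, both point minimums must also be met.
    if body_points < min_body:
        return ("no: scored as spam but too few body points "
                f"({body_points:g} < {min_body:g})")
    if head_points < min_head:
        return ("no: scored as spam but too few head points "
                f"({head_points:g} < {min_head:g})")
    return "spam"

if __name__ == "__main__":
    # Hypothetical score of 5.0 with the spam threshold lowered to 1.
    print(autolearn_decision(5.0, 0, -2.82, spam_threshold=1.0))
```

The point of the sketch is that lowering the spam threshold only changes
the second test; the body and header minimums still apply afterwards.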

I notice you're still using the default autolearn-as-ham setting
(bayes_auto_learn_threshold_nonspam, the ham=0.1 in your debug output);
this is dangerous, as very low-scoring spam can get autolearned as ham.
I've dropped it to -0.01 on my systems to prevent this.

> What, exactly, is going on here? The head points I can explain (this
> is a spam I saved that had already come to me) but the body points -
> I don't understand. It also wasn't clear to me until this debug that
> the autolearn had its own scoring system.

Not entirely;  to decide whether to autolearn a message, the scores are
recalculated using one of the "no-Bayes" score sets.  Which one depends
on whether you've got network tests disabled or not.
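
To make the score-set choice concrete, here is a small Python sketch.  It
assumes the usual four-set numbering (0 = no Bayes, no network tests;
1 = no Bayes, network tests; 2 = Bayes, no network tests; 3 = Bayes and
network tests); that numbering and the function name are assumptions of
this sketch, not something shown in the debug output.

```python
def autolearn_score_set(active_set: int) -> int:
    """Map the active score set to its no-Bayes counterpart (sketch)."""
    if active_set not in (0, 1, 2, 3):
        raise ValueError("score set must be 0, 1, 2 or 3")
    # Clearing the Bayes bit (value 2) keeps only the network on/off bit.
    return active_set & ~2

if __name__ == "__main__":
    for active in range(4):
        print(active, "->", autolearn_score_set(active))
```

So a system running with Bayes and network tests would, under this
numbering, judge autolearning with the "network tests, no Bayes" scores.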

-kgd
-- 
Get your mouse off of there!  You don't know where that email has been!
