Re: Bayes Autolearn Threshold - different scoring?

greg 11 Mar 2005 19:04:52 -0000

> As your only email access?
pretty much, yes.

> <g>  Try several thousand, as a number of customers have reported to
> me...


oh, I've been there - I'm just trying to avoid going there again. :)

> Mmm.  Dangerous - I've seen FPs get autolearned as spam once or twice.
> :(

I realize that. With my system and my spam the way they are now, my spam
threshold is set to 1, and I have not seen an FP scoring 3.0 or higher in
several months. So I know there's a risk.

> What I do on my accounts is set up a "big-spam" folder, and rely on the
> X-Spam-Level header to move mail there.  Anything scoring 15 or higher
> gets 15 or more stars in X-Spam-Level, and I have this:
>
> :0:
> * ^X-Spam-Level:.\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> /home/kdeugau/mail/bigspam
>
> before the check that files spam in my "main" spam folder.
>
> With the well-tuned 2.64+SURBL systems I have, ~80% of the spam usually
> ends up in the "big-spam" folder.
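The procmail recipe above files a message by counting the stars at the start of the X-Spam-Level header. Here is a minimal Python sketch of the same check, assuming the message is available as a raw string. The function name `pick_folder`, the folder names, and the fallback on the `X-Spam-Flag` header for the "main" spam folder are illustrative assumptions, not part of the original setup.

```python
import email

# Same cutoff as the procmail recipe: 15 or more stars means "big spam".
BIG_SPAM_STARS = 15


def pick_folder(raw_message: str) -> str:
    """Hypothetical filing helper: return a folder name for one message."""
    msg = email.message_from_string(raw_message)
    level = (msg.get("X-Spam-Level") or "").strip()
    # Checked first, just as the recipe runs before the ordinary spam rule.
    if level.startswith("*" * BIG_SPAM_STARS):
        return "bigspam"
    # Assumed fallback: anything SpamAssassin flagged goes to the main spam folder.
    if (msg.get("X-Spam-Flag") or "").strip().upper() == "YES":
        return "spam"
    return "inbox"


example = "X-Spam-Flag: YES\nX-Spam-Level: " + "*" * 16 + "\nSubject: test\n\nbody\n"
print(pick_folder(example))  # bigspam
```

As in the procmail version, the order of the checks matters: the big-spam test has to come before the general spam test, or everything would land in the main spam folder.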

If I did that with a threshold of 3.0 on my system, 84% of the total
'spams' I've gotten in the last week would have ended up in the big-spam
folder, with no FPs.

> [snip]
>> debug: auto-learn? ham=0.1, spam=1, body-points=0, head-points=-2.82,
>> learned-points=1.886
>> debug: auto-learn? no: scored as spam but too few body points (0 < 3)
>
> These two entries are the critical ones;  note the body-points and
> head-points.  To be autolearned as spam, a message must hit tests worth
> a total of 3 points or more on header tests, and a total of 3 points or
> more on body tests.
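The autolearn rule described above can be modeled as a small decision function. This is a sketch of the rule as stated in this thread, not SpamAssassin's actual code: the 3-point minimums are plain constants here, checking body points before head points is an assumption based on the order of the debug message, and the function name `autolearn_decision` is hypothetical.

```python
from typing import Optional, Tuple

# Minimums described in the thread; plain constants in this sketch.
MIN_BODY_POINTS = 3.0
MIN_HEAD_POINTS = 3.0


def autolearn_decision(
    learned_points: float,
    body_points: float,
    head_points: float,
    ham_threshold: float,
    spam_threshold: float,
) -> Tuple[Optional[str], str]:
    """Return ("ham" | "spam" | None, reason) for one scored message."""
    if learned_points < ham_threshold:
        return "ham", "scored below the ham threshold"
    if learned_points >= spam_threshold:
        # Body first, mirroring the order of the debug line quoted above.
        if body_points < MIN_BODY_POINTS:
            return None, f"scored as spam but too few body points ({body_points:g} < {MIN_BODY_POINTS:g})"
        if head_points < MIN_HEAD_POINTS:
            return None, f"scored as spam but too few head points ({head_points:g} < {MIN_HEAD_POINTS:g})"
        return "spam", "enough body and head points"
    return None, "between the ham and spam thresholds"


# Values taken from the debug output quoted above.
verdict, reason = autolearn_decision(1.886, 0, -2.82, ham_threshold=0.1, spam_threshold=1)
print(verdict, "-", reason)  # None - scored as spam but too few body points (0 < 3)
```

With these inputs the message clears the spam threshold (1.886 >= 1) but is still rejected, because the body-point minimum is checked independently of the total score.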

I'm sure that's the problem. Here's a different sample spam, minus the
bayes score (which isn't counted toward the autolearn body points, correct?):
 2.2 RCVD_HELO_IP_MISMATCH  Received: HELO and IP do not match, but should
 3.0 DATE_IN_FUTURE_12_24   Date: is 12 to 24 hours after Received: date
 1.2 RCVD_NUMERIC_HELO      Received: contains an IP address used for HELO
 2.7 FORGED_YAHOO_RCVD      'From' yahoo.com does not match 'Received' headers
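Tallying the sample hits by message part shows why this spam fails the autolearn check. The sketch below sums the scores listed above; treating all four rules as header tests is my reading of the rule names and descriptions, and the helper name `points_by_part` is hypothetical.

```python
# Rule hits from the sample spam above: (rule name, score, part tested).
# Classifying each rule as a header test is an assumption based on its name.
SAMPLE_HITS = [
    ("RCVD_HELO_IP_MISMATCH", 2.2, "header"),
    ("DATE_IN_FUTURE_12_24", 3.0, "header"),
    ("RCVD_NUMERIC_HELO", 1.2, "header"),
    ("FORGED_YAHOO_RCVD", 2.7, "header"),
]


def points_by_part(hits):
    """Sum rule scores separately for header and body tests."""
    totals = {"header": 0.0, "body": 0.0}
    for _name, score, part in hits:
        totals[part] += score
    return totals


totals = points_by_part(SAMPLE_HITS)
print(f"head-points={totals['header']:.1f} body-points={totals['body']:.1f}")
# prints "head-points=9.1 body-points=0.0"
print(totals["body"] >= 3)  # False: body minimum of 3 not met
```

Roughly 9.1 header points easily clear a 3-point header minimum, but with zero body points the message still cannot be autolearned as spam.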

No body hits there, so basically I'm getting what I want from the headers
and from what bayes already knows. How do I tweak the thresholds that the
autolearner uses, for example by setting the body threshold to 0 or
eliminating that check entirely? I realize this might produce unwanted
results, so I'd probably give it a week or so as an initial experiment.

> I notice you're still using the default autolearn-as-ham setting;  this
> is dangerous as very low-scoring spam can get autolearned incorrectly.
> I've dropped it to -0.01 on my systems to prevent this.
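In SpamAssassin the autolearn-as-ham cutoff is the `bayes_auto_learn_threshold_nonspam` setting in local.cf; the debug line quoted earlier shows it at 0.1 (`ham=0.1`). The sketch below shows why lowering it to -0.01 protects against low-scoring spam. It assumes a strict "score below the threshold" comparison, and both the helper name `learns_as_ham` and the 0.05 score are hypothetical.

```python
def learns_as_ham(score: float, nonspam_threshold: float) -> bool:
    """Hypothetical check: is a message scored below the ham-autolearn cutoff?"""
    # Assumes a strict comparison: a score equal to the threshold is not learned.
    return score < nonspam_threshold


low_scoring_spam = 0.05  # hypothetical score for a spam that slipped through

print(learns_as_ham(low_scoring_spam, 0.1))    # True: learned as ham at 0.1
print(learns_as_ham(low_scoring_spam, -0.01))  # False: skipped at -0.01
```

With the cutoff just below zero, a message has to hit at least some negative-scoring rule before it is learned as ham, so a spam that merely scored near zero is left alone.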

That's a good tip, I'll implement that.

Thanks!
