> As your only email access?

pretty much, yes.

> <g>  Try several thousand, as a number of customers have reported to
> me...

oh, I've been there - I'm just trying to avoid going there again. :)

> Mmm.  Dangerous - I've seen FPs get autolearned as spam once or twice.
> :(

I realize that. With my system configured the way it is now, my spam
threshold is set to 1.0. I have not seen an FP scoring >= 3.0 in several
months. So, I know there's a risk.

> What I do on my accounts is set up a "big-spam" folder, and rely on the
> X-Spam-Level header to move mail there.  Anything scoring 15 or higher
> gets 15 or more stars in X-Spam-Level, and I have this:
>
> :0:
> * ^X-Spam-Level:.\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> /home/kdeugau/mail/bigspam
>
> before the check that files spam in my "main" spam folder.
>
> With the well-tuned 2.64+SURBL systems I have, ~80% of the spam usually
> ends up in the "big-spam" folder.

If I did that with a threshold of 3.0 on my system, 84% of the total
'spams' I've gotten in the last week would have ended up in the big-spam
folder, with no FPs.

> [snip]
>> debug: auto-learn? ham=0.1, spam=1, body-points=0, head-points=-2.82,
>> learned-points=1.886
>> debug: auto-learn? no: scored as spam but too few body points (0 < 3)
>
> These two entries are the critical ones;  note the body-points and
> head-points.  To be autolearned as spam, a message must hit tests worth
> a total of 3 points or more on header tests, and a total of 3 points or
> more on body tests.

I'm sure that's the problem. Here's a different sample spam, minus the
bayes score (which isn't counted toward the autolearn body tests,
correct?)

 2.2 RCVD_HELO_IP_MISMATCH  Received: HELO and IP do not match, but should
 3.0 DATE_IN_FUTURE_12_24   Date: is 12 to 24 hours after Received: date
 1.2 RCVD_NUMERIC_HELO      Received: contains an IP address used for HELO
 2.7 FORGED_YAHOO_RCVD      'From' yahoo.com does not match 'Received' headers

No body hits there... So basically, I'm getting what I want from the
headers, and from what bayes already knows.
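
To make the rule quoted above concrete, here is a rough Python sketch of
the decision as I understand it from that description. It is only a model,
not SpamAssassin's actual code: the function name, parameter names, and
reason strings are made up for illustration, the 3-point minimums come
from the quoted explanation, and the 0.1 / 1 thresholds are the ones shown
in the debug line. I'm also assuming all four hits listed above count as
header tests.

```python
# Illustrative model only; not taken from SpamAssassin's source.
# Names and reason strings are hypothetical.

def autolearn_decision(score, head_points, body_points,
                       ham_threshold=0.1, spam_threshold=1.0,
                       min_head_points=3.0, min_body_points=3.0):
    """Return 'ham', 'spam', or a 'no: ...' reason string."""
    if score < ham_threshold:
        return "ham"
    if score < spam_threshold:
        return "no: between ham and spam thresholds"
    # Scored as spam: both the body and header minimums must be met.
    if body_points < min_body_points:
        return "no: too few body points"
    if head_points < min_head_points:
        return "no: too few header points"
    return "spam"


# The four header hits from the sample above (bayes score left out).
header_hits = [2.2, 3.0, 1.2, 2.7]
head = sum(header_hits)   # about 9.1 points
body = 0.0                # no body hits

print(autolearn_decision(head, head, body))
# prints "no: too few body points"
```

In this model, passing min_body_points=0.0 makes the same message come
out as "spam", which is the behaviour I'm after.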

How do I tweak the thresholds that the autolearner uses, for example,
either setting the body threshold to 0 or eliminating that check entirely?
I realize this might produce unwanted results, so I'd probably give it a
week or so as an initial experiment.

> I notice you're still using the default autolearn-as-ham setting;  this
> is dangerous as very low-scoring spam can get autolearned incorrectly.
> I've dropped it to -0.01 on my systems to prevent this.

That's a good tip, I'll implement that. Thanks!