Re: negative scores for spam

John Hardin Fri, 20 Mar 2009 15:08:50 -0700

On Fri, 20 Mar 2009, Jesse Stroik wrote:

> Hoover Chan wrote:

>> The threshold was set to 6.6 (cf. required=6.6). The message this was
>> attached to was very definitely junk. This kind of situation got me
>> curious about how, with the threshold set to a positive spam score,
>> junk mail can still come in with negative scores.


> You are getting negative scores from the auto-whitelist (AWL) and from BAYES_00.


This means:

(1) Mistrained Bayes. Review your training corpus (you did keep it, didn't you?) for correct classification, and disable autolearn until you're confident SA is assigning reasonable scores.

(2) A history of low-scoring spam. AWL wants to reduce the score on this one even further, so it thinks this sender has a hammy history. Clear your AWL database after you retrain Bayes.
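To see why AWL can drag a spam's score down, it helps to know how the AWL plugin adjusts scores. As documented for Mail::SpamAssassin::Plugin::AWL, it keeps a long-term mean score per sender and moves each new message's score part of the way toward that mean, controlled by `auto_whitelist_factor` (default 0.5). The local.cf fragment below is only a sketch that restates the default; the numbers in the comments are hypothetical, chosen to illustrate the arithmetic.

```
# AWL adjustment, per the AWL plugin documentation:
#   finalscore = score + (mean - score) * auto_whitelist_factor
#
# Hypothetical example: a message scores 4.0 before AWL, but the sender's
# history (built from earlier low-scoring spam) has a mean of -1.0:
#   4.0 + (-1.0 - 4.0) * 0.5 = 1.5, i.e. an AWL adjustment of -2.5
#
# 0.5 is the default; lower values weaken the pull toward the sender's mean
# (0 disables the adjustment, 1 replaces the score with the mean).
auto_whitelist_factor 0.5
```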

> It's a matter of taste and what you believe makes sense, but I don't consider Bayes to be all that accurate (since there are methods for defeating Bayes, poisoning Bayes, etc.).


It's very reliable if you take care training it.

> As such, I don't allow Bayes to assign negative scores, or positive scores within a couple of points of the threshold. You can do so by assigning scores like this:
> score BAYES_00  0
> score BAYES_05  0
> score BAYES_20  0
> score BAYES_40  0

Then you're losing a lot of the benefit of Bayes. If you're having that serious a problem with it, I'd suggest reviewing your training.

> I also disable AWL, since a lot of spam, especially the stuff most likely to be tested against SpamAssassin, will likely use known-good email addresses from your domain as the "From" address. This is fairly likely to hit on the AWL.
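If you do decide to turn AWL off as described above, a minimal local.cf sketch follows. `use_auto_whitelist` is the AWL plugin's on/off switch; alternatively, the Mail::SpamAssassin::Plugin::AWL plugin can be left unloaded entirely (where its `loadplugin` line lives depends on your SpamAssassin version and packaging). Note that turning AWL off stops the adjustment but does not by itself clear the existing AWL database.

```
# Stop AWL from adjusting scores (the AWL plugin's default is 1, enabled).
use_auto_whitelist 0
```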

AWL entries include the source IP address as well as the sender address, so a forged message is unlikely to benefit from your outbound traffic.

Hoover, you need to review your Bayes training corpora, make sure they are clean and correct, and retrain from scratch.

Disable autolearn until you're confident SA is scoring well, then start autolearn with the thresholds pushed out (e.g. learn as ham when the score is below -5, as spam when it's above 15). Also consider whether you even want to autolearn: if your userbase is small, you may be better served by purely manual training fed by a few clueful users.
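The two-phase approach above might look like this in local.cf. This is a sketch: `bayes_auto_learn` and the two `bayes_auto_learn_threshold_*` options are the standard autolearn settings (threshold defaults 0.1 and 12.0), while -5.0 and 15.0 are just the example values suggested here, not tuned numbers.

```
# Phase 1: train manually (e.g. with sa-learn) and keep autolearn off
# until SA is scoring sensibly.
bayes_auto_learn 0

# Phase 2: once scores look right, re-enable autolearn but only learn
# from clear-cut messages by pushing both thresholds outward.
# bayes_auto_learn 1
# bayes_auto_learn_threshold_nonspam -5.0
# bayes_auto_learn_threshold_spam 15.0
```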

Also, please do post a sample of a low-scoring spam (all headers intact, to someplace like pastebin). The fact that so few rules hit indicates there may be other problems with your config, or that it's a very short spam (which is harder to get a good score for). Either will also contribute to a poorly performing autotrained Bayes.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  You do not examine legislation in the light of the benefits it
  will convey if properly administered, but in the light of the
  wrongs it would do and the harms it would cause if improperly
  administered.                                  -- Lyndon B. Johnson
-----------------------------------------------------------------------
 1325 days until the Presidential Election
