On Fri, 20 Mar 2009, Jesse Stroik wrote:

> Hoover Chan wrote:
>> The threshold was set to 6.6 (cf. required=6.6). The message this was
>> attached to was very definitely junk. This kind of situation got me
>> curious about the whole thing where any positive spam score is set as
>> the threshold but seeing junk mail coming in with negative scores.

> You are getting negative scores for auto white list and for bayes_00.

This means:

(1) mistrained bayes - review your training corpus (you did keep it, didn't you?) for correct classification and disable autolearn until you're confident reasonable scores are being assigned by SA.

(2) a history of low-scoring spam (AWL wants to reduce the score on this one even more, so it thinks this sender has a hammy history). Clear your AWL database after you retrain BAYES.

> It's a matter of taste and what you believe makes sense, but I don't consider bayes to be all that accurate (since there are methods for defeating bayes, poisoning bayes, etc.).

It's very reliable if you take care training it.

> As such, I don't allow Bayes to assign negative scores or positive scores within a couple of points of the threshold. You can do so by assigning scores like this:
>
> score BAYES_00  0
> score BAYES_05  0
> score BAYES_20  0
> score BAYES_40  0

Then you're losing a lot of the benefit of Bayes. If you're having that serious a problem with it, I'd suggest you need to review your training.

> I also disable AWL since a lot of spam, especially the stuff most likely to be tested against spamassassin, will likely use known good email addresses from your domain as the "from" address. This is fairly likely to hit on the AWL.

AWL includes the source IP address so it's unlikely a forged message will benefit from your outbound traffic.
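
If you do decide to turn AWL off regardless, it should be a one-line change in your site config. This is only a sketch, assuming the stock AWL plugin is the one loaded and that your settings live in local.cf:

```
# local.cf: stop the auto-whitelist from adjusting scores
use_auto_whitelist 0
```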

Hoover, you need to review your bayes training corpora, make sure they are clean and correct, and retrain from scratch.

Disable autolearn until you're confident SA is scoring well, then start autolearn with the thresholds pushed out (e.g. learn as ham when the score is below -5, as spam when it's above 15). Also consider whether or not you even want to autolearn - if your userbase is small then you may be better served by purely manual training fed by a few clueful users.
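
As a rough sketch of that two-stage approach, assuming the stock Bayes and AutoLearnThreshold plugins are loaded and the settings go in local.cf. The -5 and 15 values are just the examples above, not tuned recommendations:

```
# Stage 1: while you retrain by hand, keep autolearn off
bayes_auto_learn 0

# Stage 2: once scores look sane, uncomment these (and remove the
# line above) to re-enable autolearn with the thresholds pushed out
# bayes_auto_learn 1
# bayes_auto_learn_threshold_nonspam -5.0
# bayes_auto_learn_threshold_spam    15.0
```

Wider thresholds mean fewer messages get autolearned, but the ones that do are much less likely to be misclassified.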

Also, please do post a sample (all headers intact, to someplace like pastebin) of a low-scoring spam. The fact that so few rules hit indicates there may be other problems with your config, or that it's a very short spam (which is harder to get a good score for). This will also contribute to a poorly-performing autotrained bayes.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  You do not examine legislation in the light of the benefits it
  will convey if properly administered, but in the light of the
  wrongs it would do and the harms it would cause if improperly
  administered.                                  -- Lyndon B. Johnson
-----------------------------------------------------------------------
 1325 days until the Presidential Election
