On Fri, 20 Mar 2009, Jesse Stroik wrote:
Hoover Chan wrote:
The threshold was set to 6.6 (cf. required=6.6). The message this was
attached to was very definitely junk. This kind of situation got me
curious about the whole thing: even with any positive spam score set as
the threshold, I would still see junk mail coming in with negative scores.
You are getting negative scores for auto white list and for bayes_00.
This means:
(1) mistrained bayes - review your training corpus (you did keep it,
didn't you?) for correct classification and disable autolearn until you're
confident reasonable scores are being assigned by SA.
(2) a history of low-scoring spam (AWL wants to reduce the score on this
one even more, so it thinks this sender has a hammy history). Clear your
AWL database after you retrain BAYES.
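A minimal sketch of the "disable autolearn" step, assuming a site-wide
local.cf (the file's location is an assumption and varies by install):

```
# local.cf
# Stop SA from feeding its own verdicts back into the Bayes database
# while the training corpus is being reviewed.
bayes_auto_learn 0
```

With that in place, retraining is normally done by hand with sa-learn
(--clear to wipe the database, then --spam and --ham against the
hand-sorted corpus).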
It's a matter of taste and what you believe makes sense, but I don't
consider bayes to be all that accurate (since there are methods for
defeating bayes, poisoning bayes, etc).
It's very reliable if you take care training it.
As such, I don't allow Bayes to assign negative scores or positive
scores within a couple of points of the threshold. You can do so by
assigning scores like this:
# A score of 0 disables a rule, so these low-probability Bayes hits add nothing
score BAYES_00 0
score BAYES_05 0
score BAYES_20 0
score BAYES_40 0
Then you're losing a lot of the benefit of Bayes. If you're having that
serious a problem with it, I'd suggest you need to review your training.
I also disable AWL since a lot of spam, especially the stuff most likely
to be tested against spamassassin, will likely use known good email
addresses from your domain as the "from" address. This is fairly likely
to hit on the AWL.
AWL includes the source IP address so it's unlikely a forged message will
benefit from your outbound traffic.
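For reference, the switch being debated here is, as far as I know,
use_auto_whitelist in the 3.x series (an assumption for other versions;
check your own docs). A sketch of turning AWL off, which only has an
effect if the AWL plugin is loaded in the first place:

```
# local.cf
# Disable auto-whitelist score adjustment entirely.
use_auto_whitelist 0
```

Whether to do this is the judgment call argued above; leaving AWL on and
clearing its database after retraining is the alternative.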
Hoover, you need to review your bayes training corpora and make sure they
are clean and correct, and retrain from scratch.
Disable autolearn until you're confident SA is scoring well, then start
autolearn with the thresholds pushed out (e.g. learn as ham when the score
is below -5, as spam when it's above 15). Also consider whether or not you
even want to autolearn - if your userbase is small then you may be better
served by purely manual training fed by a few clueful users.
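The pushed-out thresholds mentioned above could look something like this
in local.cf. The -5 and 15 values are just the examples given here, not
tuned recommendations:

```
# local.cf
# Re-enable autolearn only once manual training is scoring sanely.
bayes_auto_learn 1
# Learn as ham only when the message scores below -5.
bayes_auto_learn_threshold_nonspam -5.0
# Learn as spam only when the message scores above 15.
bayes_auto_learn_threshold_spam 15.0
```

Note that the autolearn decision is documented as using a score computed
without the Bayes rules themselves, so these numbers are not compared
against quite the same total you see in the message headers.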
Also, please do post a sample (all headers intact, to someplace like
pastebin) of a low-scoring spam. The fact that so few rules hit indicates
there may be other problems with your config, or that it's a very short
spam (which is harder to get a good score for). This will also contribute
to a poorly-performing autotrained bayes.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79