Daniel T. Staal wrote:
>
> While I in general agree with this, I was under the impression that
> spamassassin will auto-learn from messages it marks.  (At least, past a
> certain threshold.)  
That's not entirely true. There's more to it than a threshold, and the
score you see isn't even the score that gets compared against the
threshold.

Score computation generalities:
    1) The score is computed as if Bayes were disabled. This includes
changing the score set.
    2) Any rule with the "noautolearn" tflag (e.g. the white/blacklist
commands) is discarded.

From there, the criteria to learn as spam using this "learning score" are:
    1) score above threshold (default 12.0)
    2) at least 3.0 points from header rules
    3) at least 3.0 points from body rules
    4) Existing Bayes learning must not result in the message matching a
BAYES_* rule with a score less than -1.0
    5) The Bayes R/W lock must be available on the first try, i.e. no
other autolearn, manual learn or expiry processes are running.

And note that because of the header and body requirements (2 and 3), the
learning score needs to be at least 6.0, regardless of what you have the
threshold set to.
If any of the above aren't met, autolearning will not happen.
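
To make the combined effect of those criteria concrete, here is a rough
Python sketch of the decision. It is only an illustration of the rules
listed above, not SpamAssassin's actual code: the function name, the
parameter names and the inclusive threshold comparison are all my own
assumptions.

```python
def would_autolearn_as_spam(learning_score, header_points, body_points,
                            bayes_points=0.0, lock_available=True,
                            threshold=12.0):
    """Hypothetical helper: True only if every listed criterion is met.

    learning_score -- score computed as if Bayes were disabled and with
                      "noautolearn" rules left out
    header_points  -- part of that score contributed by header rules
    body_points    -- part of that score contributed by body rules
    bayes_points   -- score of the BAYES_* rule the message matches now
    lock_available -- whether the Bayes R/W lock was free on the first try
    """
    return bool(
        learning_score >= threshold   # assumption: comparison is inclusive
        and header_points >= 3.0      # at least 3.0 points from header rules
        and body_points >= 3.0        # at least 3.0 points from body rules
        and bayes_points >= -1.0      # existing Bayes data must not say "ham"
        and lock_available            # no other learn/expiry process running
    )


# Lowering the threshold does not remove the header/body floor:
print(would_autolearn_as_spam(5.5, 3.0, 2.5, threshold=1.0))
```

The last call shows why a message scoring under 6 is never autolearned as
spam: with the threshold turned all the way down, the header and body
minimums still have to be met.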

In general the autolearner tries very hard to be ABSOLUTELY POSITIVE a
message is spam before autolearning it.

So relying on autolearning to learn all or even most of your spam isn't
a very good idea. It's not going to learn all your spam. It just won't.
> In which case, feeding the spam messages to it again
> would bias the database towards spam, as the messages are being learned
> twice.
>   
Actually, as Jim pointed out, it will skip Message-IDs that are already
in the Bayes DB.

Also, this skip isn't particularly slow, so you're not wasting a ton of
CPU by re-feeding messages that were already auto learned.
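
The idea behind that skip can be sketched in a few lines of Python. This
is a toy model only: the in-memory set and the function name are
assumptions made for illustration, and say nothing about how the Bayes DB
really stores what it has seen.

```python
def learn_if_new(seen_ids, message_id):
    """Hypothetical helper: learn a message only if its ID is unseen.

    Returns True if the message was learned, False if it was skipped.
    """
    if message_id in seen_ids:
        # Already learned (e.g. by autolearn): skip, so the tokens are
        # not counted a second time.
        return False
    seen_ids.add(message_id)
    # ...token counts would be updated here in a real learner...
    return True


seen = set()
first = learn_if_new(seen, "<example-1@mail.invalid>")   # learned
second = learn_if_new(seen, "<example-1@mail.invalid>")  # skipped
```

Because the check is a simple lookup, re-feeding a message that was
already learned costs very little and doesn't bias the database.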

> So the question would have to be: Does Spamassassin automatically update
> the Bayes database from (some/any) messages it flags as spam or ham?
>   
Some, yes.
Most, no.
Score less than 6, never.


