Re: sa-learn spam and Bayes_50

Adam Katz Tue, 27 Oct 2009 08:29:58 -0700

Sam wrote:
>> I run spamassassin quite fine on a debian-lenny system.
>> But I'm having a problem with sa-learn --spam and 1 message :
>> But Bayes still shows BAYES_50 :

The Bayesian learner extracts tokens from each message it is taught,
merges them into the database's existing tokens, and recalculates the
probability for each token.  Sometimes those new tokens aren't terribly
useful because they have been trained as both ham and spam.  It is
always possible that a message you just trained still lacks certainty,
thus getting /rounded/ to BAYES_50.
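
To make that concrete, here is a minimal sketch of how tokens seen in
both ham and spam can cancel each other out.  It uses the plain
naive-Bayes product rule, which is simpler than the combiner
SpamAssassin actually ships, and the function names and counts are
made up for illustration only.

```python
from math import prod


def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Estimate P(spam | token) from how often the token was seen in each corpus.

    Hypothetical helper: the arguments are token hit counts and corpus sizes.
    """
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # never-seen token: no evidence either way
    p = spam_rate / (spam_rate + ham_rate)
    # Clamp so a single token can never force an absolute 0 or 1.
    return min(0.99, max(0.01, p))


def combine(probabilities):
    """Naive-Bayes combination of per-token spam probabilities."""
    if not probabilities:
        return 0.5  # no known tokens at all
    spam = prod(probabilities)
    ham = prod(1.0 - p for p in probabilities)
    return spam / (spam + ham)


# One token that leans spammy and one that leans hammy by the same amount.
spammy = token_spam_probability(9, 1, 100, 100)
hammy = token_spam_probability(1, 9, 100, 100)
overall = combine([spammy, hammy])
```

With these counts the two tokens pull in opposite directions with equal
strength, so `overall` lands at roughly 0.5 even though both tokens are
known to the database.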

RW wrote:
> If you find it surprising that that can happen, you don't
> understand how Bayes works. It's a learning system that's intended
> to classify mail it hasn't seen based on mail it has seen.

BAYES_50 may be the default for a new mail with no known tokens (a
pure 50.000%), but it can also be the result of conflicting tokens
already in the system (with the stock rules, anything from 40% up to
60%).
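
As a rough illustration of that rounding, the sketch below maps a
probability onto a BAYES_* rule name.  The bucket edges are what I
believe the stock 23_bayes.cf rules use (BAYES_50 covering 0.40 to
0.60), and the half-open intervals are my own assumption, so verify
both against your installed rules before relying on them.

```python
# Assumed bucket edges, modeled on the stock 23_bayes.cf rules; verify locally.
BAYES_BUCKETS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]


def bayes_rule(probability):
    """Return the name of the bucket that a spam probability falls into."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for name, low, high in BAYES_BUCKETS:
        # Half-open interval [low, high) is an assumption of this sketch.
        if low <= probability < high:
            return name
    return "BAYES_99"  # only reached when probability is exactly 1.0
```

Under these assumed edges, `bayes_rule(0.47)` and `bayes_rule(0.58)`
both come back as BAYES_50, which is why the rule name alone can't tell
you whether the message was a pure 50%.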

If you were to tell SpamAssassin to report the actual bayes score
(e.g. "add_header all Bayes _BAYES_" in your local.cf), you'd probably
find that that message wasn't a pure 50% (though I can't recall how
many significant digits it uses).
