Matt Kettler wrote:
>
> It is perfectly reasonable to assume that most of the mail matching
> BAYES_99 also matches a large number of the stock spam rules that SA
> comes with. These highly-obvious mails are the model after which
> most SA rules are made in the first place. Thus, these mails need
> less score boost, as they already have a lot of score from other
> rules in the ruleset.
>
> However, mails matching BAYES_95 are more likely to be "trickier",
> and are likely to match fewer other rules. These messages are more
> likely to require an extra boost from BAYES_95's score than those
> which match BAYES_99.
I can't argue with this description, but I don't agree with the conclusion about the scores. The Bayes rules are not individual, unrelated rules. Together they form a series, each one covering a range of probability that a message is spam or ham. You can argue over the exact scoring, but I can't see any reason to score BAYES_99 lower than BAYES_95. Since a BAYES_99 message is even more likely to be spam than a BAYES_95 message, it should have at least a slightly higher score. A BAYES_99 message is obviously more likely to hit other rules, and is therefore less reliant on a boost from Bayes, but that is no reason to drop its score.

I generally don't look into the rule scoring much unless I run into a problem, but I thought this had been fixed in the last couple of versions anyway. Looking at my score file, I find this:

score BAYES_00 0.0001 0.0001 -2.312 -2.599
score BAYES_05 0.0001 0.0001 -1.110 -1.110
score BAYES_20 0.0001 0.0001 -0.740 -0.740
score BAYES_40 0.0001 0.0001 -0.185 -0.185
score BAYES_50 0.0001 0.0001 0.001 0.001
score BAYES_60 0.0001 0.0001 1.0 1.0
score BAYES_80 0.0001 0.0001 2.0 2.0
score BAYES_95 0.0001 0.0001 3.0 3.0
score BAYES_99 0.0001 0.0001 3.5 3.5

The scores march upwards just as expected. And, judging by the round values, it looks like the BAYES_50 through BAYES_99 scores have been set by hand rather than by the perceptron.

-- 
Bowie
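For anyone who wants to check their own score file for the same ordering, here is a minimal Python sketch. It assumes the four-value form of the score directive, where the values are score sets 0 to 3 (set 3 being Bayes plus network tests enabled). The helper names parse_scores and bayes_is_monotonic are hypothetical and not part of SpamAssassin; the sample data is the score excerpt quoted above.

```python
import re

# Score lines copied from the score file excerpt quoted above.
SCORES = """\
score BAYES_00 0.0001 0.0001 -2.312 -2.599
score BAYES_05 0.0001 0.0001 -1.110 -1.110
score BAYES_20 0.0001 0.0001 -0.740 -0.740
score BAYES_40 0.0001 0.0001 -0.185 -0.185
score BAYES_50 0.0001 0.0001 0.001 0.001
score BAYES_60 0.0001 0.0001 1.0 1.0
score BAYES_80 0.0001 0.0001 2.0 2.0
score BAYES_95 0.0001 0.0001 3.0 3.0
score BAYES_99 0.0001 0.0001 3.5 3.5
"""


def parse_scores(text):
    """Hypothetical helper: map each rule name to its list of scores (one per score set)."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "score":
            result[parts[1]] = [float(x) for x in parts[2:]]
    return result


def bayes_is_monotonic(scores, score_set=3):
    """Hypothetical helper: True if BAYES_nn scores never decrease as nn rises.

    score_set is a 0-based index into the four values (assumed to be
    SpamAssassin score sets 0-3).
    """
    # Keep only BAYES_nn rules and order them by their numeric bucket.
    names = sorted(
        (n for n in scores if re.fullmatch(r"BAYES_\d\d", n)),
        key=lambda n: int(n[len("BAYES_"):]),
    )
    values = [scores[n][score_set] for n in names]
    # Each bucket must score at least as high as the one below it.
    return all(lo <= hi for lo, hi in zip(values, values[1:]))


if __name__ == "__main__":
    print(bayes_is_monotonic(parse_scores(SCORES)))
```

With the quoted scores this prints True. Swapping in a BAYES_99 value below 3.0 would make it return False, which is exactly the inversion being argued against in the quoted message.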