Matt Kettler wrote:
> 
> It is perfectly reasonable to assume that most of the mail matching
> BAYES_99 also matches a large number of the stock spam rules that SA
> comes with. These highly-obvious mails are the model after which
> most SA rules are made in the first place. Thus, these mails need
> less score boost, as they already have a lot of score from other
> rules in the ruleset. 
> 
> However, mails matching BAYES_95 are more likely to be "trickier",
> and are likely to match fewer other rules. These messages are more
> likely to require an extra boost from BAYES_95's score than those
> which match BAYES_99.

I can't argue with this description, but I don't agree with the
conclusion on the scores.

The Bayes rules are not independent, unrelated rules.  They form a
series, each rule covering one range of the probability that a message
is spam rather than ham.  You can argue over the exact scoring, but I
can't see any reason to score BAYES_99 lower than BAYES_95.  Since a
BAYES_99 message is even more likely to be spam than a BAYES_95
message, it should have at least a slightly higher score.  Granted, a
BAYES_99 message is more likely to hit other rules and therefore
relies less on a score increase from Bayes, but that is no reason to
give it a lower score.

I generally don't look into the rule scoring too much unless I run
into a problem, but I thought this had been fixed in the latest
couple of versions anyway.  Looking at my score file, I find this:

score BAYES_00 0.0001 0.0001 -2.312 -2.599
score BAYES_05 0.0001 0.0001 -1.110 -1.110
score BAYES_20 0.0001 0.0001 -0.740 -0.740
score BAYES_40 0.0001 0.0001 -0.185 -0.185
score BAYES_50 0.0001 0.0001 0.001 0.001
score BAYES_60 0.0001 0.0001 1.0 1.0
score BAYES_80 0.0001 0.0001 2.0 2.0
score BAYES_95 0.0001 0.0001 3.0 3.0
score BAYES_99 0.0001 0.0001 3.5 3.5

The scores march upwards just as expected.  And it looks like the
50-99 scores have been set by hand rather than by the perceptron.
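
For anyone who wants to run the same check on their own score file,
here is a rough sketch in Python.  It is not part of SpamAssassin; the
function names parse_scores and non_monotonic are made up for this
example, and it assumes every line has the four-value form shown above
(one value per score set).  It reports any place where a higher Bayes
bucket scores lower than the bucket below it.

```python
# Sketch only: check that BAYES_* scores never drop as the bucket rises.
# Assumes "score NAME v0 v1 v2 v3" lines, one value per score set.
SCORES = """\
score BAYES_00 0.0001 0.0001 -2.312 -2.599
score BAYES_05 0.0001 0.0001 -1.110 -1.110
score BAYES_20 0.0001 0.0001 -0.740 -0.740
score BAYES_40 0.0001 0.0001 -0.185 -0.185
score BAYES_50 0.0001 0.0001 0.001 0.001
score BAYES_60 0.0001 0.0001 1.0 1.0
score BAYES_80 0.0001 0.0001 2.0 2.0
score BAYES_95 0.0001 0.0001 3.0 3.0
score BAYES_99 0.0001 0.0001 3.5 3.5
"""


def parse_scores(text):
    """Return {rule_name: [four floats]} from four-value score lines."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 6 and parts[0] == "score":
            table[parts[1]] = [float(value) for value in parts[2:]]
    return table


def non_monotonic(table):
    """Return (score_set, lower_rule, higher_rule) wherever the score drops."""
    # Order rules by their numeric suffix: BAYES_00, BAYES_05, ..., BAYES_99.
    rules = sorted(table, key=lambda name: int(name.split("_")[1]))
    problems = []
    for score_set in range(4):
        for lower, higher in zip(rules, rules[1:]):
            if table[higher][score_set] < table[lower][score_set]:
                problems.append((score_set, lower, higher))
    return problems


table = parse_scores(SCORES)
print(non_monotonic(table))  # prints [] for the scores above
```

An empty list means no bucket scores lower than the one before it in
any of the four score sets.  Equal scores, such as the 0.0001
placeholders in the first two columns, are not counted as a drop.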

-- 
Bowie
