At 07:08 PM 3/5/2005, Nigel Wilkinson wrote:

Why does a 99-100% probability score less than an 80-95% probability???


This is more-or-less a FAQ in SA now.

Rule scores in SA are not in any way linear.

The scores are not assigned based on individual performance; they come from tuning the scores of ALL of the rules together in such a way as to minimize the total of FPs and FNs with a 1:100 ratio (i.e. find the lowest FP + 100*FN).

Because of this, a rule's score is not based on the performance of that one rule, but on its interactions with every other rule in the ruleset.

In the case of BAYES_99, it would appear that most spam messages that hit it also hit a lot of other rules, so SA's score optimizer could sacrifice some of its score to reduce FPs without introducing a significant number of FNs. The story may be different for BAYES_80, however: here the spams are likely to be more evasive, and might need a higher score from this rule to avoid large numbers of FNs.
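
To make that concrete, here is a rough toy sketch in Python. Everything in it is made up for illustration: the corpus counts, the OTHER_RULES and MINOR_RULE stand-ins and their fixed scores, and the brute-force grid search, which only stands in for SA's real optimizer. The cost weights default to the FP + 100*FN formula quoted above, and the threshold is the usual 5.0.

```python
from itertools import product

THRESHOLD = 5.0  # score at which a message is flagged as spam

# Hypothetical stand-ins for "every other rule"; their scores are held fixed.
FIXED = {"OTHER_RULES": 3.0, "MINOR_RULE": 1.0}

# Made-up toy corpus: (is_spam, rules hit, number of such messages)
CORPUS = [
    (True,  {"BAYES_99", "OTHER_RULES"}, 50),  # spam that hits lots of rules
    (False, {"BAYES_99", "MINOR_RULE"}, 1),    # rare ham that hits BAYES_99
    (True,  {"BAYES_80", "MINOR_RULE"}, 10),   # evasive spam, little else hits
    (False, {"BAYES_80"}, 2),                  # ham that hits only BAYES_80
]


def cost(scores, fp_weight=1, fn_weight=100):
    """Weighted total of false positives and false negatives."""
    fp = fn = 0
    for is_spam, rules, count in CORPUS:
        flagged = sum(scores[r] for r in rules) >= THRESHOLD
        if flagged and not is_spam:
            fp += count
        elif is_spam and not flagged:
            fn += count
    return fp_weight * fp + fn_weight * fn


def tune():
    """Try every score pair on a 0.5-step grid; keep the first cheapest."""
    grid = [i * 0.5 for i in range(11)]  # 0.0, 0.5, ..., 5.0
    best = None
    for b99, b80 in product(grid, grid):
        scores = dict(FIXED, BAYES_99=b99, BAYES_80=b80)
        c = cost(scores)
        if best is None or c < best[0]:
            best = (c, scores)
    return best


best_cost, best_scores = tune()
print(best_cost, best_scores["BAYES_99"], best_scores["BAYES_80"])
```

On this toy corpus the search settles on 2.0 for BAYES_99 and 4.0 for BAYES_80. The "stronger" rule ends up with the lower score simply because the other rules already carry the spam it hits over the threshold.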

The other off-chance possibility is that there may be some misplaced spams in the corpus the devs used. Actually, there are almost certainly one or two in the lot, but if there's a decent number of them they can really screw up the scores.



