Re: [SAtalk] No rule for the word "V1AGRA"?

Matt Kettler Thu, 14 Aug 2003 15:37:37 -0700

At 08:28 AM 8/13/03 +0200, Øystein Gisnås wrote:

> My Bayes is working properly; that's why it triggered on BAYES_90. The problem is that no BAYES rule is weighted so that it crosses the 5.0 limit with the default settings. Of course, I can change this myself, but I thought the intention was that SA should work out of the box. I guess there have been discussions on whether a single rule should be enough to mark a mail as spam. My opinion is that BAYES_90 and BAYES_99 should trigger, even with Razor++ enabled.

Yes, the intention is that SA should work out of the box, and it does. However, it's tuned with certain assumptions about acceptable false-positive and false-negative rates. One of the primary assumptions is that false positives are 100 times worse than false negatives, so SA scores with a slightly "soft hand". If your philosophy differs, lower the threshold or tweak scores as you see fit.
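To make that trade-off concrete, here is a minimal Python sketch of how a score threshold and a 100:1 false-positive cost interact. It assumes additive scoring where a message is flagged once the sum of its rule scores reaches the threshold; the rule names RULE_X and RULE_Y, all scores, and the four-message corpus are hypothetical and are not real SpamAssassin values.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed cost ratio from the post: a false positive is 100x worse than a false negative.
FP_COST = 100.0
FN_COST = 1.0

# A message is represented by the set of rules it hit plus its true label (True = spam).
Message = Tuple[FrozenSet[str], bool]


def message_score(hits: FrozenSet[str], scores: Dict[str, float]) -> float:
    """Additive scoring: sum the scores of every rule the message triggered."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def weighted_cost(corpus: List[Message], scores: Dict[str, float], threshold: float) -> float:
    """Total misclassification cost at a threshold (assumed: flag when score >= threshold)."""
    fp = fn = 0
    for hits, is_spam in corpus:
        flagged = message_score(hits, scores) >= threshold
        if flagged and not is_spam:
            fp += 1  # ham marked as spam
        elif not flagged and is_spam:
            fn += 1  # spam let through
    return fp * FP_COST + fn * FN_COST


# Hypothetical scores and corpus; not real SpamAssassin values.
scores = {"BAYES_90": 2.0, "RULE_X": 1.5, "RULE_Y": 0.5}
corpus: List[Message] = [
    (frozenset({"BAYES_90"}), True),
    (frozenset({"BAYES_90", "RULE_X"}), True),
    (frozenset({"BAYES_90", "RULE_Y"}), False),
    (frozenset({"RULE_Y"}), False),
]

for threshold in (5.0, 3.0, 2.0):
    print(threshold, weighted_cost(corpus, scores, threshold))
```

On this toy corpus the cost is 2.0 at a threshold of 5.0, 1.0 at 3.0, and 100.0 at 2.0: dropping the threshold far enough to catch both spams also flags the one ham, and at 100:1 that single false positive outweighs every missed spam.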

As for what the score of BAYES_90 should be, opinions are fine, but before you go wading in saying what the scores should be, you should understand how those scores come about. The rule scores are NOT hand-assigned. They are tested and evolved against a real-world corpus of email. Unless you've got solid facts to back yourself up, I'm sorry, but I'll have to trust computerized testing and analysis of the rules against over 140,000 emails over your gut feelings.

It's also fundamentally flawed to look at only one rule in the ruleset and try to figure out what its score should be. The result of processing an email is an interaction among all the rules in the ruleset. Mails that trigger one rule often trigger others at the same time. To see what the score should be, you need to study all the combinations of hits, not just the hits of a single rule. If you want a well-balanced score set, changing the score of one rule shifts the scores of almost every rule in the entire ruleset by the time you're done correcting all the false positives and false negatives that the change creates.
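A small Python sketch can show why rescoring one rule in isolation is risky: it lists every message whose verdict flips when only a single rule's score changes. It assumes additive scoring with a 5.0 threshold; the rule names RULE_X and RULE_Y, the scores, and the corpus are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Message = Tuple[FrozenSet[str], bool]  # (rules hit, True if spam)


def verdicts(corpus: List[Message], scores: Dict[str, float], threshold: float) -> List[bool]:
    """Flag each message whose summed rule scores reach the threshold."""
    return [sum(scores.get(r, 0.0) for r in hits) >= threshold for hits, _ in corpus]


def flipped_by_change(corpus: List[Message], scores: Dict[str, float], rule: str,
                      new_score: float, threshold: float = 5.0) -> List[Tuple[List[str], bool, bool]]:
    """Return (rules hit, is_spam, new verdict) for every message whose verdict
    changes when only `rule` is rescored."""
    changed = dict(scores)
    changed[rule] = new_score
    before = verdicts(corpus, scores, threshold)
    after = verdicts(corpus, changed, threshold)
    return [
        (sorted(hits), is_spam, new)
        for (hits, is_spam), old, new in zip(corpus, before, after)
        if old != new
    ]


# Hypothetical scores and corpus; not real SpamAssassin values.
scores = {"BAYES_90": 2.0, "RULE_X": 3.0, "RULE_Y": 1.0}
corpus: List[Message] = [
    (frozenset({"BAYES_90"}), True),             # spam, 2.0: missed
    (frozenset({"BAYES_90", "RULE_X"}), True),   # spam, 5.0: caught
    (frozenset({"BAYES_90", "RULE_Y"}), False),  # ham, 3.0: passed
    (frozenset({"RULE_X", "RULE_Y"}), False),    # ham, 4.0: passed
]

for hits, is_spam, flagged in flipped_by_change(corpus, scores, "BAYES_90", 5.0):
    print(hits, "spam" if is_spam else "ham", "-> flagged" if flagged else "-> passed")
```

Raising BAYES_90 to 5.0 rescues the spam that hit only BAYES_90, but it also flags the ham that hit BAYES_90 together with RULE_Y. Repairing that new false positive means touching RULE_Y or other rules, which is exactly the cascade of adjustments across combinations of hits.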

The complexity of these patterns is also why SA has its scores assigned by a GA (genetic algorithm) rather than by hand. (Its output is, however, re-tested and inspected by humans.)
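For readers unfamiliar with the approach, here is a toy genetic algorithm in Python that evolves a score vector to minimize a 100:1 weighted error on a tiny labelled corpus. It is a generic GA sketch (truncation selection with elitism, uniform crossover, Gaussian mutation), not SpamAssassin's actual score optimizer; the rule names, corpus, 5.0 threshold, population size, and mutation widths are all assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

RULES = ["BAYES_90", "RULE_X", "RULE_Y"]  # hypothetical rule names
THRESHOLD = 5.0                           # assumed spam threshold
FP_COST, FN_COST = 100.0, 1.0             # assumed 100:1 cost ratio

# Each message: 0/1 hit flags aligned with RULES, plus True if spam.
Corpus = List[Tuple[Tuple[int, ...], bool]]


def cost(scores: Sequence[float], corpus: Corpus) -> float:
    """Weighted misclassification cost of one candidate score vector."""
    total = 0.0
    for hits, is_spam in corpus:
        flagged = sum(s * h for s, h in zip(scores, hits)) >= THRESHOLD
        if flagged and not is_spam:
            total += FP_COST
        elif not flagged and is_spam:
            total += FN_COST
    return total


def evolve(corpus: Corpus, start: Sequence[float], generations: int = 50,
           pop_size: int = 20, seed: int = 0) -> List[float]:
    rng = random.Random(seed)  # seeded so runs are reproducible
    # Initial population: the starting scores plus randomly perturbed copies.
    population = [list(start)] + [
        [s + rng.gauss(0.0, 1.0) for s in start] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda v: cost(v, corpus))
        # Truncation selection: the better half survives unchanged (elitism).
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by a small Gaussian mutation per score.
            child = [(x if rng.random() < 0.5 else y) + rng.gauss(0.0, 0.5)
                     for x, y in zip(a, b)]
            children.append(child)
        population = parents + children
    return min(population, key=lambda v: cost(v, corpus))


# Hypothetical corpus: two spams and two hams.
corpus: Corpus = [
    ((1, 0, 0), True),
    ((1, 1, 0), True),
    ((1, 0, 1), False),
    ((0, 1, 1), False),
]
start = [2.0, 3.0, 1.0]
best = evolve(corpus, start, seed=1)
print("start cost:", cost(start, corpus), "evolved cost:", cost(best, corpus))
print(dict(zip(RULES, (round(s, 2) for s in best))))
```

Because the surviving parents carry over unchanged each generation, the best cost can never get worse than that of the starting scores. Note that every candidate is judged on the whole corpus at once, so the search naturally accounts for combinations of hits rather than one rule at a time; the real process works on a far larger corpus and the full ruleset.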

None of this is to say that the SA rule scores are infallible, but you do need to consider that this isn't a simple system; it has an extraordinary amount of complexity and interrelationships. You need to think about the larger picture of the rules as a set in order to make reasonable judgements about scores.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
