On Saturday, March 5, 2005, 11:24:25 AM, Eric Hall wrote:
> On 3/4/2005 1:57 PM, Rob McEwen (PowerView Systems) wrote:
>> Quinlan: Any technique that tries to identify "good" mail without
>> authentication backing it up, or some form of personalized training. It
>> worked well for a while, but it's definitely not an effective technique
>> today.
> I kind of disagree with this, but only partly.
>
> Generally speaking, you want as many "good" indicators as you have "bad"
> indicators. If you have hundreds of indicators that flag every possible
> spam-sign, then sooner or later every piece of good mail will also get
> flagged by one rule or another. In order to off-set this, you want to have
> a collection of "good" indicators, so that you can cancel out the
> "everything-looks-like-spam" effect. Unfortunately, these rules will also
> hit some kind of spam, so sooner or later a large enough set of good rules
> will just make everything some shade of grey, or worse will make marginal
> spam appear to be good.
>
> Now then, in order to avoid that, you really should limit the positive
> indicators to stuff that you can verify (which is only slightly different
> than "authenticate").

All the rules are verified by testing against spam and ham corpora
before being deployed. Ones that have high false positives are
given a low score or not used at all. Folks don't just make up
rules and deploy them. The usefulness of the "official" rules is
checked before they're released. YMMV on homemade rules.

That said, as the Internet moves towards more useable
identification and authentication schemes for mail, they will
probably get "positive" rules in SA. SPF or Domain Keys may (or
may not) be examples, but the nice thing is that SA lets us give
them "relative goodness scores" and not an outright pass or fail,
so they don't need to be perfect out of the box. That may
actually help their adoption as it arguably has with SURBLs.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/
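
P.S. For anyone who finds the "relative goodness score" idea easier to
follow in code, here is a rough sketch of additive scoring. It is only
an illustration, not SpamAssassin's actual code: the rule names, the
score values, and the 5.0 threshold are all made up for the example.
The point is just that a positive indicator carries a small negative
score that is summed with everything else, so it nudges the total
instead of acting as an outright pass.

```python
# Illustrative sketch only. Rule names, scores, and the threshold are
# hypothetical and are NOT taken from any real SpamAssassin rule set.
HYPOTHETICAL_SCORES = {
    "URI_ON_BLOCKLIST": 3.5,     # "bad" indicator
    "SUSPICIOUS_SUBJECT": 2.0,   # "bad" indicator
    "HTML_ONLY_BODY": 1.0,       # weak "bad" indicator that also hits some ham
    "SENDER_AUTH_PASS": -1.0,    # "good" indicator: helps, but is not a free pass
}

THRESHOLD = 5.0  # assumed cutoff; totals at or above it count as spam


def total_score(hits, scores=HYPOTHETICAL_SCORES):
    """Sum the scores of every rule that hit a message."""
    return sum(scores[name] for name in hits)


def is_spam(hits, threshold=THRESHOLD, scores=HYPOTHETICAL_SCORES):
    """Classify a message by comparing its total score to the threshold."""
    return total_score(hits, scores) >= threshold


# Ham that trips one weak "bad" rule is pulled back down by the "good" rule.
ham_hits = ["HTML_ONLY_BODY", "SENDER_AUTH_PASS"]

# Spam that also happens to pass the "good" rule is still caught,
# because the good rule only subtracts a little.
spam_hits = ["URI_ON_BLOCKLIST", "SUSPICIOUS_SUBJECT",
             "HTML_ONLY_BODY", "SENDER_AUTH_PASS"]

print(total_score(ham_hits), is_spam(ham_hits))
print(total_score(spam_hits), is_spam(spam_hits))
```

If the "good" rule were given a large negative score in this sketch,
the second message would slip under the threshold, which is the
"marginal spam appears to be good" problem described in the quoted
text. Keeping the score modest, and checking it against corpora, is
what keeps that from happening.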