On Mon, 6 Nov 2006, John D. Hardin wrote:
> The default scores are generated by analyzing the rules' performance
> against hand-categorized corpora of actual emails. If a rule hits
> spam often and ham rarely, it will be given a higher score than one
> that hits spam often and ham occasionally.
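For concreteness, I read that as roughly the following computation (a
toy sketch in Python with made-up hit counts; IIRC the real score
generator is a perceptron/GA optimization pass over the corpora, so
treat this as illustration only):

    def naive_score(spam_hits, ham_hits, total_spam, total_ham,
                    max_score=4.0):
        """Score a rule by how much likelier it fires on spam than ham."""
        spam_rate = spam_hits / total_spam  # fraction of spam it hits
        ham_rate = ham_hits / total_ham     # fraction of ham it hits
        if spam_rate + ham_rate == 0:
            return 0.0                      # never fires; no evidence
        # 1.0 means it only ever hits spam, 0.0 means only ham.
        purity = spam_rate / (spam_rate + ham_rate)
        return round(max_score * purity, 2)

    # "hits spam often and ham rarely" earns more:
    print(naive_score(900, 2, 1000, 1000))   # 3.99
    # "hits spam often and ham occasionally" earns less:
    print(naive_score(900, 90, 1000, 1000))  # 3.64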
That sounds very Bayesian ... with the Bayes rules already doing that
sort of logic, I would hope more human thinking goes into score
setting. The Bayes rules are very shiny and effective, but they are
supposed to assist the hand-written filters, not the other way
around. If that's the current SA thinking, I'll have to reconsider
CRM114 and other "better-than-Bayes" systems.

> Rule performance against real-world traffic can be counterintuitive,
> and the rules' relation to each other isn't necessarily a part of
> the analysis.

That's where the human tweaking is supposed to happen. If gobs of
spam flag the 80% meter of some test while no ham does, and the 90%
meter is almost never hit by anything, the 90% meter should still get
a higher score than the 80% meter. But if the 90% meter hits more ham
than spam while the 80% meter hits more spam than ham, the tests need
a close look rather than inappropriate weighting (see the P.S. below
for a toy illustration).

Just my two cents, anyway.

-Adam Katz
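P.S. Here's the meter point as a toy sanity check, with invented rule
names and counts (real numbers would have to come from hand-classified
corpus runs like the ones John describes):

    # Higher-threshold buckets should be at least as spam-pure as
    # lower ones; if not, fix the test instead of fudging the score.
    buckets = [
        # (rule, spam hits, ham hits) in a hand-classified corpus
        ("METER_80", 500, 10),  # gobs of spam, almost no ham
        ("METER_90", 3, 7),     # barely fires, and more ham than spam!
    ]

    prev = None
    for rule, spam, ham in buckets:
        purity = spam / (spam + ham)  # 1.0 means a pure spam signal
        print("%s: %d hits, spam purity %.2f" % (rule, spam + ham, purity))
        if prev is not None and purity < prev:
            print("   ^^ less pure than the lower bucket; review the rule")
            print("      rather than just handing it a lower score")
        prev = purity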