On Mon, 6 Nov 2006, John D. Hardin wrote:
> The default scores are generated by analyzing their performance
> against hand-categorized corpora of actual emails. If a rule hits spam
> often and ham rarely, it will be given a higher score than one that
> hits spam often and ham occasionally.

That sounds very Bayesian ... with the Bayes rules already doing that
sort of logic, I would hope there is more human thinking put into
score setting.  The Bayes rules are very shiny and effective, but they
are supposed to assist the hand-written rules rather than have those
rules assist the Bayes rules.  ... if that's the current SA thinking,
I'll have to reconsider CRM114 and other "better-than-bayes" systems.
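
Just so we're talking about the same thing, here's a toy sketch of the
per-rule idea John describes -- emphatically NOT SpamAssassin's real
score generator (as I understand it, the actual scores are optimized
over the whole rule set against the corpora), and the hit counts below
are invented:

# Toy illustration only: reward a rule that fires on spam and
# rarely on ham in a hand-classified corpus.
def toy_rule_score(spam_hits, spam_total, ham_hits, ham_total,
                   max_score=4.0):
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate <= ham_rate:
        return 0.0           # fires on ham as often as spam: useless
    # fraction of the rule's firings that land on spam
    precision = spam_rate / (spam_rate + ham_rate)
    return round(max_score * spam_rate * (2 * precision - 1), 2)

print(toy_rule_score(800, 1000, 5, 1000))    # ham rarely     -> 3.16
print(toy_rule_score(800, 1000, 120, 1000))  # ham sometimes  -> 2.37

Both rules hit spam equally often, but the one that rarely touches ham
ends up with the noticeably higher score.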

> Rule performance against real-world traffic can be counterintuitive,
> and the rules' relation to each other isn't necessarily a part of the
> analysis.

That's where the human tweaking is supposed to happen; if gobs of
spam trip the 80% meter of some test while no ham does, and the 90%
meter is almost never hit by anything, then the 90% meter should carry
a higher score than the 80% one does.  And if the 90% meter catches
more ham than spam even though the 80% meter catches more spam than
ham, the test itself needs a close look rather than an inappropriate
weighting (rough sketch of that check below).
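
To make the "needs a close look" case concrete, this is the kind of
sanity check I mean -- again only a sketch, with made-up rule names
and counts, not anything that exists in SA:

# For a pair of threshold "meters" of the same test, the higher
# threshold should be at least as spam-heavy as the lower one.
corpus = {                   # rule -> (spam hits, ham hits), invented
    "METER_80": (900, 3),
    "METER_90": (40, 55),    # more ham than spam: suspicious
}

def spamminess(spam, ham):
    return spam / (spam + ham) if spam + ham else 0.0

low = spamminess(*corpus["METER_80"])
high = spamminess(*corpus["METER_90"])

if high < low:
    print("90% meter is LESS spam-heavy than the 80% meter:"
          " re-examine the test, don't just reweight it")
else:
    print("90% meter looks sound; score it above the 80% meter")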

just my two cents, anyway

-Adam Katz
