I've developed a new approach to scoring that I want to 1) share with
everyone and 2) make into a working system that's as accurate as what
I've already built, but easier to use. First, the theory:
SITUATION
In the beginning, all email was ham. When spam came along, we left
the ham alone and targeted the annoyance (spam).
ASSUMPTION
All messages are ham unless x,y,z score says they're spam.
APPROACH
Block nothing, then create rules to catch what you don't want. That
is, build tests that target the spam, then score the millions of ways
spam can occur.
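
As a rough sketch of this default-ham approach (the rule names, weights,
and threshold here are made up for illustration, not taken from any
real filter):

```python
# Hypothetical spam tests and weights, for illustration only.
SPAM_WEIGHTS = {
    "subject_all_caps": 1.5,
    "mentions_free_money": 2.5,
    "forged_sender": 3.0,
}
SPAM_THRESHOLD = 5.0  # assumed cutoff


def classify_default_ham(hits):
    """Treat a message as ham unless its triggered tests add up to spam.

    `hits` is the set of test names the message triggered.
    """
    score = sum(SPAM_WEIGHTS.get(rule, 0.0) for rule in hits)
    return "spam" if score >= SPAM_THRESHOLD else "ham"
```

A message that triggers nothing stays ham; it only becomes spam once
enough tests fire, which is why every weight has to be tuned against
every new kind of spam.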
RESULT
Huge time spent tuning and retuning weights, catching everything in
sight (including much ham).
NEW SITUATION
Ham is now a tiny minority of all email.
NEW ASSUMPTION
All messages are spam unless x,y,z score says they're ham.
NEW APPROACH
Block everything, then create rules to not catch what you do want.
That is, build tests that target the spam (keeping all the tests
you've already built), then score the thousands of ways ham triggers
on those tests.
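
One possible reading of this default-spam approach, as a rough sketch:
keep the same spam tests, but give a message "ham credit" for each test
it does not trigger, and release it only when the credit is high
enough. The test names, credit values, and threshold are assumptions
for illustration only.

```python
# Hypothetical ham credit earned when a spam test does NOT fire.
HAM_CREDIT = {
    "subject_all_caps": 1.0,
    "mentions_free_money": 2.0,
    "forged_sender": 3.0,
}
HAM_THRESHOLD = 5.0  # assumed cutoff


def classify_default_spam(hits):
    """Treat a message as spam unless it earns enough ham credit.

    `hits` is the set of spam-test names the message triggered.
    """
    credit = sum(w for rule, w in HAM_CREDIT.items() if rule not in hits)
    return "ham" if credit >= HAM_THRESHOLD else "spam"
```

Here a message with no evidence either way stays blocked; it has to
earn its way out by looking like ham on the existing tests.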
NEW RESULT
Spend less time and energy while catching more of what you do want
and less of what you don't.
CHALLENGE
All filtering software is written to score for results that equal
spam -> catch the bad.
SOLUTION
Make filtering software score for results that equal ham -> uncatch
the good.
Your thoughts?
Dan
BTW, is there a better forum for this level of question?