I've developed a new approach to scoring that I want to 1) share with everyone and 2) turn into a working system that's as accurate as what I've already built, but easier to use. First, the theory:


SITUATION
In the beginning, all email was ham. When spam came along, we left the ham alone and targeted the annoyance (spam).

ASSUMPTION
All messages are ham unless the x, y, z score says they're spam.

APPROACH
Block nothing, then create rules to catch what you don't want. I.e., build tests that target the spam, then score the millions of ways spam can occur (sketched below).

RESULT
Huge amounts of time spent tuning and retuning weights, catching everything in sight (including much ham).
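
To make that concrete, here's a rough Python sketch of the old model. It's not any real filter's rule set; the rule names, weights, and the 5.0 threshold are all made up for illustration:

SPAM_RULES = {
    # each rule is (test, weight); a hit adds weight to the score
    "contains_viagra":  (lambda msg: "viagra" in msg.lower(), 4.0),
    "all_caps_subject": (lambda msg: msg.split("\n", 1)[0].isupper(), 1.5),
    "html_only_body":   (lambda msg: "<html>" in msg.lower(), 0.8),
}

SPAM_THRESHOLD = 5.0   # at or above this, the message is blocked as spam

def classify(msg: str) -> str:
    score = 0.0                               # every message starts out as ham
    for test, weight in SPAM_RULES.values():
        if test(msg):
            score += weight                   # each spam sign pushes it toward the block
    return "spam" if score >= SPAM_THRESHOLD else "ham"

print(classify("SUBJECT IN ALL CAPS\nBuy viagra now <html>"))   # spam
print(classify("Subject: lunch?\nSee you at noon"))             # ham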



NEW SITUATION
Ham is now the tiniest minority of all email.

NEW ASSUMPTION
All messages are spam unless the x, y, z score says they're ham.

NEW APPROACH
Block everything, then create rules to un-catch what you do want. I.e., build tests that target the spam (keeping all the tests you've already built), then score the thousands of ways ham triggers on those tests (sketched below).

NEW RESULT
Spend less time and energy while blocking more of what you don't want and letting through more of what you do.
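
And here's the same sketch turned inside out for the new model. It keeps the spam rules from the sketch above and adds ham rules (again, names and weights invented for illustration) that score the message back down; the point is that every message starts at the block threshold and ham evidence has to earn its way out:

SPAM_RULES = {
    # same invented spam tests as before; a hit adds weight
    "contains_viagra":  (lambda msg: "viagra" in msg.lower(), 4.0),
    "all_caps_subject": (lambda msg: msg.split("\n", 1)[0].isupper(), 1.5),
    "html_only_body":   (lambda msg: "<html>" in msg.lower(), 0.8),
}

HAM_RULES = {
    # new: tests that recognize ham; a hit subtracts weight
    "known_correspondent": (lambda msg: "From: alice@example.com" in msg, 5.0),
    "reply_to_own_thread": (lambda msg: "In-Reply-To:" in msg, 3.0),
}

BLOCK_THRESHOLD = 5.0

def classify_inverted(msg: str) -> str:
    score = BLOCK_THRESHOLD                   # every message starts out blocked
    for test, weight in SPAM_RULES.values():  # keep all the existing spam tests
        if test(msg):
            score += weight
    for test, weight in HAM_RULES.values():   # ham evidence scores it back down
        if test(msg):
            score -= weight
    return "spam" if score >= BLOCK_THRESHOLD else "ham"

print(classify_inverted("From: stranger@example.net\nHi there"))            # spam: nothing vouched for it
print(classify_inverted("From: alice@example.com\nIn-Reply-To: <abc>\nHi")) # ham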



CHALLENGE
All filtering software is written to score for results that equal spam -> catch the bad.

SOLUTION
Make filtering software score for results that equal ham -> uncatch the good.


Your thoughts?

Dan


BTW, is there a better forum for this level of question?
