I've developed a new approach to scoring that I want to 1) share with everyone and 2) turn into a working system that's as accurate as what I've already built, but easier to use. First, the theory:


SITUATION
In the beginning, all email was ham. When spam came along, we left the ham alone and targeted the annoyance (spam).

ASSUMPTION
All messages are ham unless the x, y, z score says they're spam.

APPROACH
Block nothing, then create rules to catch what you don't want. I.e., build tests that target the spam, then score the millions of ways spam can occur (sketched below).

RESULT
Huge amounts of time spent tuning and retuning weights, catching everything in sight (including much ham).
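
To make that concrete, here's a rough Python sketch of the old model. It's not any real filter's rule set; the rule names, weights, and the 5.0 threshold are all made up for illustration:

SPAM_RULES = {
    # each rule is (test, weight); a hit adds weight to the score
    "contains_viagra":  (lambda msg: "viagra" in msg.lower(), 4.0),
    "all_caps_subject": (lambda msg: msg.split("\n", 1)[0].isupper(), 1.5),
    "html_only_body":   (lambda msg: "<html>" in msg.lower(), 0.8),
}

SPAM_THRESHOLD = 5.0   # at or above this, the message is blocked as spam

def classify(msg: str) -> str:
    score = 0.0                               # every message starts out as ham
    for test, weight in SPAM_RULES.values():
        if test(msg):
            score += weight                   # each spam sign pushes it toward the block
    return "spam" if score >= SPAM_THRESHOLD else "ham"

print(classify("SUBJECT IN ALL CAPS\nBuy viagra now <html>"))   # spam
print(classify("Subject: lunch?\nSee you at noon"))             # ham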



NEW SITUATION
Ham is now the tiniest minority of all email.

NEW ASSUMPTION
All messages are spam unless the x, y, z score says they're ham.

NEW APPROACH
Block everything, then create rules to un-catch what you do want. I.e., build tests that target the spam (keeping all the tests you've already built), then score the thousands of ways ham triggers on those tests (sketched below).

NEW RESULT
Spend less time and energy while blocking more of what you don't want and letting through more of what you do.
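
And here's the same sketch turned inside out for the new model. It keeps the spam rules from the sketch above and adds ham rules (again, names and weights invented for illustration) that score the message back down; the point is that every message starts at the block threshold and ham evidence has to earn its way out:

SPAM_RULES = {
    # same invented spam tests as before; a hit adds weight
    "contains_viagra":  (lambda msg: "viagra" in msg.lower(), 4.0),
    "all_caps_subject": (lambda msg: msg.split("\n", 1)[0].isupper(), 1.5),
    "html_only_body":   (lambda msg: "<html>" in msg.lower(), 0.8),
}

HAM_RULES = {
    # new: tests that recognize ham; a hit subtracts weight
    "known_correspondent": (lambda msg: "From: alice@example.com" in msg, 5.0),
    "reply_to_own_thread": (lambda msg: "In-Reply-To:" in msg, 3.0),
}

BLOCK_THRESHOLD = 5.0

def classify_inverted(msg: str) -> str:
    score = BLOCK_THRESHOLD                   # every message starts out blocked
    for test, weight in SPAM_RULES.values():  # keep all the existing spam tests
        if test(msg):
            score += weight
    for test, weight in HAM_RULES.values():   # ham evidence scores it back down
        if test(msg):
            score -= weight
    return "spam" if score >= BLOCK_THRESHOLD else "ham"

print(classify_inverted("From: stranger@example.net\nHi there"))            # spam: nothing vouched for it
print(classify_inverted("From: alice@example.com\nIn-Reply-To: <abc>\nHi")) # ham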



CHALLENGE
All filtering software is written to score for results that equal spam -> catch the bad.

SOLUTION
Make filtering software score for results that equal ham -> uncatch the good.


Your thoughts?

Dan


BTW, is there a better forum for this level of question?
