OK - I'm trying to make this really simple. Just talking about the concept for now.

Let's say I get an email where the subject is "I have adenocarcinoma of the lung".

Right off, you know it's ham, because spammers never use the word "adenocarcinoma" and normal people do. Spammers also never use:

"of the lung"
"the lung"
"adenocarcinoma of"
....
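To make "the word plus the phrases around it" concrete, here is a minimal sketch of pulling every short phrase out of a subject line. It is only an illustration under assumptions the post doesn't state. The helper name `phrases`, the lowercase word tokenizer and the cap of 3 words per phrase are all hypothetical choices.

```python
import re

def phrases(text, max_n=3):
    """Hypothetical helper: return every run of 1..max_n consecutive
    lowercase words in text (max_n=3 is an assumed phrase length)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    result = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            result.add(" ".join(words[i:i + n]))
    return result

p = phrases("I have adenocarcinoma of the lung")
for phrase in sorted(p):
    print(phrase)
```

Six words give 6 single words, 5 two-word phrases and 4 three-word phrases. That is 15 in all, including "adenocarcinoma of", "the lung" and "of the lung".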

So - tell me you follow this so far. Spammers don't spam about adenocarcinoma.

In this case I'm identifying it as ham because people in some earlier emails were talking about lung cancer, and those phrases were learned as ham. But what really makes it ham is not just that it matches previous ham, but that it doesn't match previous spam.

A word like "viagra", for example, would produce no score because it appears in both sets. However, "cheapest viagra online" would match spam and not ham, indicating that the message is spam.
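The rule above can be sketched as a tiny filter: remember which phrases were seen in ham and which in spam. A phrase only counts as evidence if it was seen in exactly one of the two sets. This is a minimal sketch, not the actual implementation. `PhraseFilter`, `learn`, `score` and `classify` are hypothetical names. Plain sets, a 3-word phrase cap and "compare the hit counts" are assumptions, because the post doesn't say how individual matches are combined into a verdict.

```python
import re

def phrases(text, max_n=3):
    # Hypothetical helper: every run of 1..max_n consecutive lowercase words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}

class PhraseFilter:
    """Sketch: a phrase is evidence only if it has been seen in ham or in
    spam, but not both. Combining by comparing hit counts is an assumption."""

    def __init__(self):
        self.ham = set()
        self.spam = set()

    def learn(self, text, is_spam):
        # Add every phrase of a known message to the matching set.
        (self.spam if is_spam else self.ham).update(phrases(text))

    def score(self, text):
        ham_hits = spam_hits = 0
        for p in phrases(text):
            in_ham, in_spam = p in self.ham, p in self.spam
            if in_ham and not in_spam:
                ham_hits += 1
            elif in_spam and not in_ham:
                spam_hits += 1
            # Seen in both sets (like "viagra") or in neither: no score.
        return ham_hits, spam_hits

    def classify(self, text):
        ham_hits, spam_hits = self.score(text)
        if ham_hits > spam_hits:
            return "ham"
        if spam_hits > ham_hits:
            return "spam"
        return "unsure"

f = PhraseFilter()
f.learn("Treatment options for adenocarcinoma of the lung", is_spam=False)
f.learn("Is viagra safe with my heart medication", is_spam=False)
f.learn("Cheapest viagra online", is_spam=True)
f.learn("Buy cheapest viagra online now", is_spam=True)

print(f.classify("I have adenocarcinoma of the lung"))  # ham
print(f.classify("Cheapest viagra online"))             # spam
print(f.score("viagra"))                                # (0, 0): in both sets
```

"viagra" is learned from both a ham and a spam message, so on its own it scores nothing. "cheapest viagra online" only ever appeared in spam, so it counts. A real filter would probably weight phrases by how often they occur rather than just counting hits.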

The magic here is that this detects both spam and ham. And it is especially good at detecting ham, which greatly reduces false positives.
