On 08/22/16 07:28, Dianne Skoll wrote:
On Mon, 22 Aug 2016 07:16:41 -0700
Marc Perkel <supp...@junkemailfilter.com> wrote:

Anthony, yes - I don't store Set B. I store Set A. B is defined by
what's NOT in A. So I test against A, and anything that doesn't match
is in Set B. Set B is just a negative match on A.

Let me ask you a question.  As far as I understand your algorithm, if
an email contains at least one token in the "ham" set and zero tokens in
the "spam" set, you classify it as ham.  And conversely, if it contains
at least one spam token but zero ham tokens, you classify it as spam.

YES! YES! YES!

Although I look at some thousand "fingerprints" to get a more significant result.


The other two possibilities (no tokens in either or some tokens in both)
are undecidable.

Exactly!
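
For anyone following along, the rule as Dianne states it can be sketched in a few lines of Python. This is only an illustration of the description above, not the production filter; the names `classify`, `ham_only`, `spam_only`, and `Verdict` are made up for the example, and the token sets are assumed to be plain Python sets.

```python
from enum import Enum


class Verdict(Enum):
    HAM = "ham"
    SPAM = "spam"
    UNDECIDABLE = "undecidable"


def classify(tokens, ham_only, spam_only):
    """Apply the rule from the thread to one message.

    tokens    -- iterable of tokens extracted from the message
    ham_only  -- hypothetical set of tokens treated as ham indicators
    spam_only -- hypothetical set of tokens treated as spam indicators
    """
    tokens = set(tokens)
    ham_hits = tokens & ham_only
    spam_hits = tokens & spam_only

    # At least one ham token and zero spam tokens: ham.
    if ham_hits and not spam_hits:
        return Verdict.HAM
    # At least one spam token and zero ham tokens: spam.
    if spam_hits and not ham_hits:
        return Verdict.SPAM
    # No tokens in either set, or tokens in both: undecidable.
    return Verdict.UNDECIDABLE
```

A message whose tokens hit neither set, or hit both, comes back as `Verdict.UNDECIDABLE`, which is the case the next question is about.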


So.  What percentage of emails using your algorithm are actually decidable?

Almost 100% if you look at a wide variety of tokens from multiple attributes: subject, body, content flags, header structure, combinations of all domains referenced, PHP scripts, the name part of From addresses, and behavior flags.
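
To make "tokens from multiple attributes" concrete, here is a rough Python sketch that tags each token with the attribute it came from, so that a word in the subject and the same word in the body count as different tokens. It covers only four of the attributes listed (subject, From name, body words, referenced domains); the function name `attribute_tokens`, the prefixes, and the regular expressions are assumptions for illustration, not how the real system does it.

```python
import re
from email import message_from_string
from email.utils import parseaddr

WORD = re.compile(r"[a-z0-9']+")
URL_HOST = re.compile(r"https?://([a-z0-9.-]+)")


def attribute_tokens(raw_message):
    """Return a set of 'attribute:value' tokens for one raw message."""
    msg = message_from_string(raw_message)
    tokens = set()

    # Words from the Subject header.
    for word in WORD.findall((msg.get("Subject", "") or "").lower()):
        tokens.add(f"subject:{word}")

    # Words from the display-name part of the From header.
    name, _address = parseaddr(msg.get("From", "") or "")
    for word in WORD.findall(name.lower()):
        tokens.add(f"from-name:{word}")

    # Body words and referenced domains (single-part messages only,
    # to keep the sketch short).
    body = msg.get_payload() if not msg.is_multipart() else ""
    body = body.lower()
    for word in WORD.findall(body):
        tokens.add(f"body:{word}")
    for host in URL_HOST.findall(body):
        tokens.add(f"domain:{host}")

    return tokens
```

The more attributes contribute tokens, the less likely a message is to hit neither set, which is the point being made about decidability.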


Regards,

Dianne.




--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
