Re: Matching infinite sets

Marc Perkel Mon, 22 Aug 2016 07:34:24 -0700


On 08/22/16 07:28, Dianne Skoll wrote:

On Mon, 22 Aug 2016 07:16:41 -0700
Marc Perkel <[email protected]> wrote:

Anthony, Yes - I don't store Set B. I store Set A. B is defined by
what's NOT in A. So I test A and if it's not matched it's set B. Set
B is just a negative match on A.
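
A minimal sketch of the negative-match idea above, assuming tokens are plain strings and Set A fits in an ordinary in-memory set. The names `set_a` and `in_set_b` and the sample tokens are illustrative only, not taken from the actual filter:

```python
# Hypothetical sample data: Set A is the only set that is actually stored.
set_a = {"invoice", "meeting", "unsubscribe"}

def in_set_b(token, stored=set_a):
    """Set B is never stored; a token belongs to B exactly when A does not match it."""
    return token not in stored
```

The point of the sketch is that B can be unbounded without costing any storage, because membership in B is decided entirely by a failed lookup in A.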

Let me ask you a question.  As far as I understand your algorithm, if
an email contains at least one token in the "ham" set and zero tokens in
the "spam" set, you classify it as ham.  And conversely, if it contains
at least one spam token but zero ham tokens, you classify it as spam.


YES! YES! YES!

Although I look at some thousand "fingerprints" to get a more significant result.


The other two possibilities (no tokens in either or some tokens in both)
are undecidable.


Exactly!
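
The four cases discussed above can be sketched roughly as follows. This is only an illustration of the decision rule as described in this exchange, assuming each message has already been reduced to a collection of string tokens; `classify`, `ham_only`, and `spam_only` are hypothetical names, and the real filter combines far more fingerprints than this:

```python
def classify(tokens, ham_only, spam_only):
    """Apply the rule from the thread: decide only when exactly one side matches.

    ham_only  -- tokens seen in ham and never in spam (assumed precomputed)
    spam_only -- tokens seen in spam and never in ham (assumed precomputed)
    """
    tokens = set(tokens)
    ham_hits = tokens & ham_only
    spam_hits = tokens & spam_only

    if ham_hits and not spam_hits:
        return "ham"
    if spam_hits and not ham_hits:
        return "spam"
    # No tokens in either set, or tokens in both: the rule cannot decide.
    return "undecidable"
```

For example, with `ham_only = {"agenda"}` and `spam_only = {"viagra"}`, a message containing only `agenda` comes back as ham, while one containing both tokens, or neither, comes back as undecidable.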


So.  What percentage of emails using your algorithm are actually decidable?

Almost 100% if you look at a wide variety of tokens from multiple attributes: subject, body, content flags, header structure, combinations of all domains referenced, PHP scripts, the name part of From addresses, and behavior flags.


Regards,

Dianne.


--
Marc Perkel - Sales/Support
[email protected]
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
