To sum up what you said below:

That's what Bayes already does.
Just train it properly instead of reinventing the wheel.

On 11.12.2015 at 18:05, Marc Perkel wrote:
What I was thinking about doing was creating a string of tokens that
represent key features of the message, then running that through a program
that creates a new token out of every possible combination of 2 tokens
and adds those to the string, and then running Bayes on the result. My tokens
will not be the text of the message but the rules it hit, including a lot of
rules I create not for points but just to produce tokens.
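A minimal Python sketch of the pair-expansion step described above. The function name expand_with_pairs and the "+" separator are assumptions made for illustration; they are not part of SpamAssassin or of any existing tool mentioned in this thread.

```python
from itertools import combinations


def expand_with_pairs(tokens):
    """Return the unique tokens plus one combined token per unordered pair.

    Hypothetical helper: the name and the "+" separator are illustrative only.
    """
    # Sort so that the same pair always produces the same combined token,
    # regardless of the order in which the rules fired.
    unique = sorted(set(tokens))
    pairs = [f"{a}+{b}" for a, b in combinations(unique, 2)]
    return unique + pairs


if __name__ == "__main__":
    # The expanded list is what would then be fed to a Bayes classifier.
    print(" ".join(expand_with_pairs(["MONEY", "JESUS", "ROYALTY"])))
```

Note that n distinct tokens yield n*(n-1)/2 pair tokens, so the feature string grows quadratically with the number of rules hit.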

For example, I create rules that look for many phrases about a subject,
and the subject becomes a token. For example:

JESUS
ROYALTY
MONEY

By themselves these are not an indicator of spam. But if you have all 3 then it's
definitely spam. The idea is to not look at words but to look at the
meaning of phrases. For instance, introductions:

Dear (friend)
I am (someone)
I am contacting you because (some reason)

This says - I don't know you.

I am a member of the (Nigerian royal family|Armed forces in Iraq) etc.

These can all be reduced to tokens, and then you just look for
combinations of tokens.
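As a rough sketch of reducing phrases to tokens, the Python below maps a few regular expressions to token names and reports which ones a message hits. The token names (INTRO_DEAR and so on), the patterns, and the message_tokens helper are all hypothetical; real rules would need far more phrase variants, and this is not how SpamAssassin itself defines rules.

```python
import re

# Hypothetical phrase rules: each one emits a token instead of a score.
RULES = {
    "INTRO_DEAR": re.compile(r"\bdear\s+(friend|sir|madam)\b", re.I),
    "INTRO_I_AM": re.compile(r"\bI am\s+\w+", re.I),
    "INTRO_CONTACTING": re.compile(r"\bI am contacting you because\b", re.I),
    "ROYALTY": re.compile(r"\b(royal family|prince|princess)\b", re.I),
}


def message_tokens(text):
    """Return the sorted names of all rules whose pattern occurs in text."""
    return sorted(name for name, pattern in RULES.items() if pattern.search(text))


if __name__ == "__main__":
    sample = (
        "Dear friend, I am a member of the Nigerian royal family. "
        "I am contacting you because I need your help."
    )
    print(message_tokens(sample))
```

The resulting token list, rather than the message text, is what would be handed to the classifier.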

