Summary of what you said below: that's what Bayes already does. Just train it properly instead of reinventing the wheel.
On 11.12.2015 at 18:05, Marc Perkel wrote:
What I was thinking about doing was creating a string of tokens that represent key features of the message, then running that through a program that creates new tokens out of every possible combination of 2 tokens and adds them to the string, then running Bayes on that.

My tokens will not be the text of the message but the rules hit, including a lot of rules I create not for points but just for tokens. For example, I create rules that look for many phrases about a subject, and the subject becomes a token. For example:

JESUS ROYALTY MONEY

By themselves these are not an indicator of spam. But if you have all 3 then it's definitely spam.

The idea is to not look at words but to look at the meaning of phrases. For instance, introductions:

Dear (friend)
I am (someone)
I am contacting you because (some reason)

This says: I don't know you.

I am a member of the (Nigerian royal family|Armed forces in Iraq), etc.

These can all be reduced to tokens, and then you just look for combinations of tokens.
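
The pair-expansion step described above could look roughly like the sketch below. It is only an illustration: the function name expand_tokens and the "+" separator are made up here, and it assumes the order of the rule hits does not matter, so each unordered pair is emitted once. It covers pairs only, as in the description. The resulting list is what would then be handed to the Bayes classifier in place of the message text.

```python
from itertools import combinations


def expand_tokens(tokens):
    """Return the distinct tokens plus one combined token per unordered pair.

    `tokens` is the list of rule-hit tokens for one message. The name of this
    function and the "+" separator are assumptions made for illustration.
    """
    # Deduplicate and sort so the same pair always yields the same token,
    # regardless of the order in which the rules fired.
    unique = sorted(set(tokens))
    pairs = [f"{a}+{b}" for a, b in combinations(unique, 2)]
    return unique + pairs


if __name__ == "__main__":
    print(expand_tokens(["JESUS", "ROYALTY", "MONEY"]))
```

Note that n distinct tokens produce n*(n-1)/2 pair tokens, so the token string grows quadratically with the number of rules hit.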