On 08/22/16 07:40, Antony Stone wrote:
On Monday 22 August 2016 at 16:34:00, Marc Perkel wrote:

On 08/22/16 07:28, Dianne Skoll wrote:

What percentage of emails using your algorithm are actually
decidable?
Almost 100% if you look at a wide variety of tokens from multiple
attributes: subject, body, content flags, header structure, combinations
of all domains referenced, PHP scripts, the name part of From addresses,
and behavior flags.
I would have said that a very large number of the words used in spam mails are
the same as the words used in ham mails, so I suspect I'm confused about what
constitutes a "token".

The ones that are the same are of no interest. Only the tokens that match one side and not the other count.
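
To make the one-sided matching concrete, here is a minimal sketch in Python. It is not the actual filter code. The function names, the "attribute:value" token strings, and the rule of comparing hit counts are all assumptions made for illustration; the message only says that tokens seen on both sides are ignored.

```python
def build_token_sets(spam_messages, ham_messages):
    """Return (spam_only, ham_only) token sets.

    Each message is an iterable of token strings. Tokens that appear
    on both sides are dropped, since they carry no signal.
    """
    spam_seen = set()
    for tokens in spam_messages:
        spam_seen.update(tokens)
    ham_seen = set()
    for tokens in ham_messages:
        ham_seen.update(tokens)
    return spam_seen - ham_seen, ham_seen - spam_seen


def classify(tokens, spam_only, ham_only):
    """Classify a message by counting one-sided token hits.

    Comparing raw hit counts is an assumption of this sketch,
    not something stated in the thread.
    """
    tokens = set(tokens)
    spam_hits = len(tokens & spam_only)
    ham_hits = len(tokens & ham_only)
    if spam_hits > ham_hits:
        return "spam"
    if ham_hits > spam_hits:
        return "ham"
    return "undecided"
```

A word such as "hello" that shows up in both spam and ham ends up in neither set, so it never affects the result.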


I fail to see how the "name part of from addresses" is unlikely to match ham,
for example, since I see quite a lot of spam apparently from myself.


Antony.


Some spammers have Viagra in the name part. The name part is very spammy. I also store to and from email addresses, so that an existing relationship between people who correspond produces a ham result. (I filter outbound mail as well for some people.)
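
A rough sketch of how the stored to/from relationships could be used, again in Python. The class and method names are hypothetical, and learning pairs from outbound mail is an assumption based on the remark about filtering outbound; the real storage and matching rules are not described here.

```python
class CorrespondentStore:
    """Remembers who has written to whom (hypothetical helper)."""

    def __init__(self):
        self._pairs = set()

    def record_outbound(self, sender, recipient):
        # Assumption: relationships are learned from filtered outbound mail.
        self._pairs.add((sender.lower(), recipient.lower()))

    def is_known_correspondent(self, sender, recipient):
        # Inbound mail leans ham if the recipient has previously
        # written to this sender.
        return (recipient.lower(), sender.lower()) in self._pairs
```

For example, once alice@example.com has written to bob@example.org, a later reply from bob@example.org to alice@example.com would count as a known relationship, while mail from an address Alice has never written to would not.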

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
