Re: Matching infinite sets

Marc Perkel Mon, 22 Aug 2016 09:04:16 -0700


On 08/22/16 07:40, Antony Stone wrote:

On Monday 22 August 2016 at 16:34:00, Marc Perkel wrote:

On 08/22/16 07:28, Dianne Skoll wrote:

What percentage of emails using your algorithm are actually
decidable?

Almost 100% if you look at a wide variety of tokens from multiple
attributes: subject, body, content flags, header structure, combinations
of all domains referenced, PHP scripts, the name part of From addresses,
behavior flags.

I would have said that a very large number of the words used in spam mails are
the same as the words used in ham mails, so I suspect I'm confused about what
constitutes a "token".

The ones that are the same are of no interest. Only tokens that match one side and not the other matter.
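
To make the "one side and not the other" point concrete, here is a minimal sketch. It is not the production filter; the function names and the simple hit-count rule are assumptions made only for illustration. Tokens seen in both spam and ham are dropped, and a message is judged only by tokens seen on exactly one side.

```python
def one_sided_tokens(spam_tokens, ham_tokens):
    """Split the two training sets into tokens seen on exactly one side.

    Tokens that occur in both sets carry no signal and are discarded.
    (Hypothetical helper, not taken from any real filter.)
    """
    spam_only = spam_tokens - ham_tokens
    ham_only = ham_tokens - spam_tokens
    return spam_only, ham_only


def classify(message_tokens, spam_only, ham_only):
    """Count one-sided hits; the hit-count rule is an assumed simplification."""
    spam_hits = len(message_tokens & spam_only)
    ham_hits = len(message_tokens & ham_only)
    if spam_hits > ham_hits:
        return "spam"
    if ham_hits > spam_hits:
        return "ham"
    # No one-sided token matched, or the two sides tied.
    return "undecided"


# Toy data: "hello" appears on both sides, so it is ignored.
spam_only, ham_only = one_sided_tokens(
    {"viagra", "free", "hello"},
    {"meeting", "hello"},
)
verdict = classify({"hello", "viagra"}, spam_only, ham_only)
```

A message made up only of shared words comes out "undecided" here, which is why the variety of token sources matters.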


I fail to see how the "name part of from addresses" are unlikely to match ham,
for example, since I see quite a lot of spam apparently from myself.


Antony.

Some spammers have Viagra in the name part. The name part is very spammy. I
also store to and from email addresses so that relationships between people
corresponding create a ham result. (I filter outbound as well for some
people.)
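
As a rough sketch of those two signals (again, an illustration under assumptions, not the actual code): the display name of the From header is tokenized separately from the address, and sender/recipient pairs seen in outbound mail are remembered as a ham hint. The `fromname:` prefix, the `CorrespondentStore` class and its methods are hypothetical names; `email.utils.parseaddr` is from the Python standard library.

```python
from email.utils import parseaddr


def from_name_tokens(from_header):
    """Tokenize only the display-name part of a From header.

    The 'fromname:' prefix is an assumed convention that keeps these
    tokens distinct from the same words found in the body.
    """
    display_name, _address = parseaddr(from_header)
    return {"fromname:" + word.lower() for word in display_name.split()}


class CorrespondentStore:
    """Remembers who has written to whom (hypothetical helper)."""

    def __init__(self):
        self._pairs = set()

    def record(self, sender, recipient):
        # Store the pair without direction, case-insensitively.
        self._pairs.add(frozenset((sender.lower(), recipient.lower())))

    def is_known(self, sender, recipient):
        return frozenset((sender.lower(), recipient.lower())) in self._pairs


tokens = from_name_tokens("Cheap Viagra <someone@example.com>")

store = CorrespondentStore()
store.record("alice@example.com", "bob@example.com")  # seen in outbound mail
reply_is_ham_hint = store.is_known("bob@example.com", "alice@example.com")
```

In this sketch, spam forged to look like it comes from the recipient's own address gets no ham hint from the store unless that person really has mailed themselves.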


--
Marc Perkel - Sales/Support
[email protected]
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
