On Monday 22 August 2016 at 16:34:00, Marc Perkel wrote:
> On 08/22/16 07:28, Dianne Skoll wrote:
>
> > What percentage of emails using your algorithm are actually
> > decidable?
>
> Almost 100% if you look at a wide variety of tokens from multiple
> attributes. Subject, body, content flags, header structure, combinations
> of all domains reference, php scripts, name part of from addresses,
> behavior flags.

I would have said that a very large number of the words used in spam mails are
the same as the words used in ham mails, so I suspect I'm confused about what
constitutes a "token".

I fail to see how the "name part of from addresses" is unlikely to match ham,
for example, since I see quite a lot of spam apparently from myself.

Antony.

--
Never automate fully anything that does not have a manual override capability.
Never design anything that cannot work under degraded conditions in emergency.

Please reply to the list;
please *don't* CC me.