On Mon, 22 Aug 2016, Antony Stone wrote:

> On Monday 22 August 2016 at 16:45:09, Dianne Skoll wrote:
>
>> On Mon, 22 Aug 2016 07:34:00 -0700 Marc Perkel wrote:
>>>> So.  What percentage of emails using your algorithm are actually
>>>> decidable?
>>>
>>> Almost 100% if you look at a wide variety of tokens from multiple
>>> attributes.
>>
>> I can't believe that, or I'm missing something.  Almost every spam I see
>> contains words that also appear in ham.  Things like "this" or "invoice"
>> or "regards" or "dear".
>>
>> What am I missing?
>
> I believe you're missing Marc's definition of "token".

...and it looks like we're venturing into the "SA Bayes multiple-word token support" realm (as a surrogate).
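
To illustrate the distinction being drawn, here is a minimal sketch of single-word versus multiple-word tokens. It is not SpamAssassin's Bayes tokenizer and not Marc's algorithm; the function names and the two sample messages are made up for the example, and whitespace splitting is a simplifying assumption.

```python
# Hypothetical illustration only: not SpamAssassin's Bayes code and not
# Marc's algorithm. The sample messages below are invented.

def word_tokens(text):
    """Split a message into lowercase single-word tokens."""
    return text.lower().split()

def multiword_tokens(text, n=2):
    """Build tokens from every run of n consecutive words."""
    words = word_tokens(text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

ham = "dear alice please find the invoice attached regards bob"
spam = "dear customer your invoice is overdue click here regards billing"

# Single words such as "dear", "invoice" and "regards" occur in both messages,
# so on their own they cannot separate ham from spam.
shared_words = set(word_tokens(ham)) & set(word_tokens(spam))

# Two-word tokens from the same messages do not overlap at all.
shared_pairs = set(multiword_tokens(ham)) & set(multiword_tokens(spam))

print(sorted(shared_words))
print(sorted(shared_pairs))
```

In this toy pair of messages the common words are shared, but no two-word token is, which is the sense in which a broader definition of "token" can make more messages decidable.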

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  USMC Rules of Gunfighting #6: If you can choose what to bring to a
  gunfight, bring a long gun and a friend with a long gun.
-----------------------------------------------------------------------
 2 days until the 1937th anniversary of the destruction of Pompeii
