On Fri, 11 Dec 2015 03:31:56 +0100
Benny Pedersen <m...@junc.eu> wrote:

> If z is scored as spam, and x and y are ham, then it's ham. Basically
> that's how Bayes works, but a single mail might have lots of digests
> to compare before it can say spam or not.

The thing is, the probability of token Y is not independent of the
previous token, and single-token Bayes misses out on those conditional
probabilities.

The example I like to give is that the tokens "red" and "hot" are
probably neutral to slightly spammy, and "sex" is probably mildly
spammy, but the phrase "red hot sex" is way spammier than the same
tokens scattered far apart, as in "Yeah, the red chili peppers are hot.
Oh, by the way, what was the sex of the baby?"
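
To make that concrete, here is a rough Python sketch. It assumes a
hypothetical tokenizer (the tokens() function and its max_words
parameter are made up for illustration and are not taken from any
particular filter) that emits runs of adjacent words as tokens
alongside the single words. Both messages yield the same three
single-word tokens, but only the first yields the phrase token, so only
a multi-word Bayes database could learn a separate probability for it.

```python
import re

def tokens(text, max_words=3):
    """Return single-word tokens plus runs of up to max_words adjacent words.

    Illustrative only: words are lowercased, punctuation is dropped, and
    runs are allowed to cross sentence boundaries.
    """
    words = re.findall(r"[a-z']+", text.lower())
    out = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            out.add(" ".join(words[i:i + n]))
    return out

spammy = tokens("red hot sex")
benign = tokens("Yeah, the red chili peppers are hot.  Oh, by the way, "
                "what was the sex of the baby?")

single = {"red", "hot", "sex"}

# Single-token view: both messages contain all three words.
print(single <= spammy, single <= benign)

# Multi-word view: only the first message contains the adjacent phrase.
print("red hot sex" in spammy, "red hot sex" in benign)
```

A single-token classifier sees only what the first print checks, which
is why it cannot tell the two messages apart on these words.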

Regards,

Dianne.
