On Fri, 11 Dec 2015 03:31:56 +0100 Benny Pedersen <m...@junc.eu> wrote:
> if z is scored as spam, and x and y is ham, then its ham basicly
> that how bayes works, but a single mail might be lots of digest to
> compare for this to say spam or not

The thing is, the probability of token Y is not independent of the
previous token, and single-token Bayes misses out on those conditional
probabilities.

The example I like to give is that the tokens "red" and "hot" are
probably neutral to slightly spammy, and "sex" is probably mildly
spammy, but "red hot sex" is way spammier than the individual tokens
far apart as in "Yeah, the red chili peppers are hot. Oh, by the way,
what was the sex of the baby?"

Regards,

Dianne.
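
To make the single-token versus multi-token point concrete, here is a
minimal sketch in Python. It is a toy naive Bayes scorer, not the
implementation of any particular spam filter. The four-message corpus,
the add-one smoothing, the function names (tokens, train,
spam_log_odds) and the scattered test message are all assumptions made
up for illustration.

```python
import math
import re
from collections import Counter


def tokens(text, n_max):
    """Return all word n-grams of length 1..n_max (lowercased words)."""
    words = re.findall(r"[a-z]+", text.lower())
    out = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            out.append(" ".join(words[i:i + n]))
    return out


def train(spam_docs, ham_docs, n_max):
    """Count, per token, how many spam and ham messages contain it."""
    spam, ham = Counter(), Counter()
    for doc in spam_docs:
        spam.update(set(tokens(doc, n_max)))
    for doc in ham_docs:
        ham.update(set(tokens(doc, n_max)))
    return spam, ham, len(spam_docs), len(ham_docs)


def spam_log_odds(text, model, n_max):
    """Sum per-token log-odds; positive leans spam, negative leans ham."""
    spam, ham, n_spam, n_ham = model
    score = 0.0
    for tok in set(tokens(text, n_max)):
        # Add-one smoothing (an assumption of this sketch). With equally
        # sized corpora, a token never seen in training contributes 0.
        p_spam = (spam[tok] + 1) / (n_spam + 2)
        p_ham = (ham[tok] + 1) / (n_ham + 2)
        score += math.log(p_spam / p_ham)
    return score


# Hypothetical toy corpus, invented for this example.
SPAM = ["red hot sex tonight", "red hot sex online"]
HAM = ["the red chili peppers are hot", "what was the sex of the baby"]

adjacent = "red hot sex"
scattered = "red paint, hot coffee, sex unspecified"  # same words, far apart

unigram_model = train(SPAM, HAM, n_max=1)
ngram_model = train(SPAM, HAM, n_max=3)

uni_adjacent = spam_log_odds(adjacent, unigram_model, 1)
uni_scattered = spam_log_odds(scattered, unigram_model, 1)
ngram_adjacent = spam_log_odds(adjacent, ngram_model, 3)
ngram_scattered = spam_log_odds(scattered, ngram_model, 3)

print("single tokens:", uni_adjacent, uni_scattered)
print("up to 3 words:", ngram_adjacent, ngram_scattered)
```

With single tokens the two messages get the same score, because the
scorer only sees "red", "hot" and "sex" and has no idea whether they
were adjacent. Once two- and three-word tokens are counted as well, the
adjacent phrase picks up extra spam evidence from "red hot", "hot sex"
and "red hot sex", while the scattered message does not.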