David Abrahams wrote on Tuesday, February 06, 2007 2:10 PM -0600:

> I understand from the above that subject words are considered, but it
> still seems to me that something must be wrong.
<...>

> Word            # Spam   # Ham   Probability
> spam.               14      12      0.506679

You can see why this clue does not affect message classification: 14 spam were trained with this token and 12 ham, and the spam probability of the token is 0.51.

> subject:spam        13       0      0.983271

OTOH, the token spam in a message subject is a strong spam clue, as you've trained 13 such messages as spam and none as ham.

> which tells me that the tokenizer may be throwing out the brackets.
> OK, I see that it's doing so on both ends (when training and when
> classifying) so it's okay.

The tokenizer does throw out the brackets, but it still shows the word inside the brackets as a token. I am guessing that it does not use the token when you've told Spambayes to notate your subject line with that word. Any chance that's the case? Assuming it's not that simple, send the set of spam clues for a message with [spam] in the subject.

--
Seth Goodman

_______________________________________________
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
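For anyone wondering how the counts in the clue table turn into those probabilities, here is a rough sketch of a Robinson-style adjusted token probability. It is an illustration under stated assumptions, not a copy of the Spambayes source: the function name is made up, the strength (0.45) and unknown-word probability (0.5) are what I understand the defaults to be, and the total message counts (nspam, nham) are placeholders because they are not given in this thread. With those settings the 13 spam / 0 ham case does come out at the 0.983271 shown for subject:spam.

```python
def token_spam_probability(spam_count, ham_count, nspam, nham,
                           strength=0.45, unknown_prob=0.5):
    """Robinson-style adjusted spam probability for one token.

    spam_count / ham_count: trained messages containing the token.
    nspam / nham: total trained spam / ham (hypothetical values here,
    the thread does not state them).
    strength / unknown_prob: assumed defaults, not confirmed by the thread.
    """
    n = spam_count + ham_count
    if n == 0:
        # A token never seen in training gets the prior.
        return unknown_prob

    # Fraction of spam and of ham that contained the token.
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham

    # Raw estimate, then pull it toward the prior; the pull weakens
    # as the token is seen in more trained messages.
    raw = spam_ratio / (spam_ratio + ham_ratio)
    return (strength * unknown_prob + n * raw) / (strength + n)


# subject:spam row: 13 spam, 0 ham. The totals cancel out when ham_count is 0.
print(round(token_spam_probability(13, 0, nspam=100, nham=100), 6))
```

A token seen about equally often in spam and ham (relative to the size of each corpus) lands near 0.5, which is why the "spam." clue has almost no effect, while a token seen only in spam is pushed close to 1.0.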
