Jamie> since the decoy text is completely non-commercial in nature, it
Jamie> seems to be polluting my index and making detection less
Jamie> accurate. With OCR, will this continue to be an issue?
Sure, if the decoy text actually turns out to be relevant from a scoring
standpoint. By default the SpamBayes classifier only considers tokens
(words) which score <= 0.4 or >= 0.6. My guess is that most of the words in
the decoy text are clustered around 0.5, so they aren't even considered.
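
To make the cutoff concrete, here is a rough sketch of that kind of filtering. It is not the actual SpamBayes code: the function name, the LOW/HIGH constants, and the token scores are all made up for illustration, and only the 0.4/0.6 thresholds come from the default described above.

```python
# Hypothetical sketch only; this is not the SpamBayes implementation.
LOW = 0.4   # tokens scoring at or below this count as evidence of ham
HIGH = 0.6  # tokens scoring at or above this count as evidence of spam


def significant_tokens(token_scores, low=LOW, high=HIGH):
    """Keep only tokens whose score is <= low or >= high."""
    return {
        token: score
        for token, score in token_scores.items()
        if score <= low or score >= high
    }


# Made-up scores: the decoy words sit near 0.5, the others do not.
scores = {
    "viagra": 0.99,
    "meeting": 0.03,
    "garden": 0.52,
    "river": 0.47,
}

kept = significant_tokens(scores)
print(sorted(kept))
```

With these invented scores, "garden" and "river" fall inside the 0.4 to 0.6 band and are dropped before scoring, which is what I'd expect to happen to most decoy words.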
Skip
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html