Re: [Spambayes] Images of commercial text with decoy text are mushing my index

skip Mon, 01 Jan 2007 07:01:36 -0800

    Jamie> since the decoy text is completely non-commercial in nature, it
    Jamie> seems to be polluting my index and making detection less
    Jamie> accurate.  With OCR, will this continue to be an issue?


Sure, if the decoy text actually turns out to be relevant from a scoring
standpoint.  By default the SpamBayes classifier only considers tokens
(words) which score <= 0.4 or >= 0.6.  My guess is that most of the words in
the decoy text are clustered around 0.5, so they aren't even considered.
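
To make the cutoff concrete, here is a rough sketch of that filtering step.
It is not SpamBayes source code: the function name, the constant names and
the per-token probabilities are all made up for illustration, and only the
0.4 / 0.6 thresholds come from the default described above.

```python
# Illustrative sketch only, not taken from the SpamBayes code base.
# LOW and HIGH mirror the default cutoffs mentioned above; the names are
# hypothetical.
LOW = 0.4
HIGH = 0.6


def significant_tokens(token_probs, low=LOW, high=HIGH):
    """Keep only the tokens whose spam probability is <= low or >= high."""
    return {
        token: prob
        for token, prob in token_probs.items()
        if prob <= low or prob >= high
    }


# Made-up probabilities: two strong clues, one weak clue, and two
# "decoy" words sitting near 0.5.
probs = {
    "viagra": 0.99,
    "meeting": 0.05,
    "stock": 0.61,
    "weather": 0.52,
    "garden": 0.47,
}

kept = significant_tokens(probs)
print(sorted(kept))
```

With these made-up numbers, "weather" and "garden" fall inside the 0.4 to
0.6 band and are dropped, so they would not influence the message score.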

Skip
_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
