On Thursday 12 October 2006 14:54, John Rudd wrote:
> That rule has a 3.2 value because the 3.2 value is
> accurate to differentiating spam vs ham in the corpus.  Therefore, the
> score is appropriate.

No, it's not accurate.

The rule is indiscriminate as to content.  It flags ham with the same score
as spam.  Therefore by definition it does not discriminate, and thus it is
useless for predicting ham vs spam.

Zero that rule's score, and your false positives will fall, but your false 
negatives will not increase.  The rule unfairly targets ham.
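
To put that in concrete terms, here is a rough sketch in Python of the
check I have in mind: compare how often a rule fires on spam with how often
it fires on ham.  The function names, the counts, and the min_ratio
threshold are all made up for illustration; none of this comes from the
corpus or from SpamAssassin itself.

```python
# Hypothetical sketch: does a single rule separate spam from ham at all?
# All counts below are invented for illustration; they are not corpus results.

def hit_rates(spam_hits, spam_total, ham_hits, ham_total):
    """Return the fraction of spam and the fraction of ham the rule fires on."""
    return spam_hits / spam_total, ham_hits / ham_total


def discriminates(spam_hits, spam_total, ham_hits, ham_total, min_ratio=2.0):
    """True if the rule fires at least min_ratio times more often on spam.

    min_ratio is an arbitrary illustrative threshold, not a SpamAssassin setting.
    """
    spam_rate, ham_rate = hit_rates(spam_hits, spam_total, ham_hits, ham_total)
    if ham_rate == 0:
        # Never fires on ham: useful as long as it fires on some spam.
        return spam_rate > 0
    return spam_rate / ham_rate >= min_ratio


# A rule that fires on 30% of spam and 30% of ham tells you nothing.
print(discriminates(300, 1000, 300, 1000))  # False

# A rule that fires on 30% of spam and 1% of ham carries real information.
print(discriminates(300, 1000, 10, 1000))   # True
```

A rule in the first situation adds the same score to both classes, so it
shifts everything toward the spam threshold without separating anything.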

I notice a great influx of spam on Monday.  Should all mail sent on Monday,
regardless of content, be scored higher?

Post hoc, ergo propter hoc? Is that what passes for analysis?

-- 
_____________________________________
John Andersen
