Tony Meyer <[EMAIL PROTECTED]> wrote:
> As long as we stay away from actually expressing it as a "batting average"
> and avoid baseball terminology. Those numbers are just completely confusing
> to me, and I suspect most non-Americans.

I agree. I think sticking with percentages is the way to go. I suppose it's a reasonable analogy once it's explained, but it isn't going to be obvious even to a baseball fanatic just from looking at the numbers.

The main thing I took from the proposal was expressing accuracy separately for ham and spam instead of taking each as a percentage of the total messages. Even if 99% of messages have been classified correctly, it makes a big difference whether the remaining 1% represents spam that made it through to the inbox vs. ham that was removed from the inbox by mistake.

I'm still a little unsure (ok, pun intended, couldn't resist <wink>) how to treat unsures in this. Currently I'm showing the primary accuracy results based on the number of messages that SpamBayes classified as either ham or spam, with a separate percentage showing the additional messages that were classified as unsure.

Another option I considered was measuring the percentage of messages removed from the inbox. It seems that ham and spam are somewhat asymmetric with regards to unsures. I suspect that most people are ok with spam being classified as unsure as long as it isn't left in their inbox, but they would prefer not to see a ham message removed from the inbox even if it is only moved to the unsure bin.

Any thoughts/suggestions/preferences?

--
Kenny Pitt

_______________________________________________
spambayes-dev mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/spambayes-dev
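
For illustration, the two ways of counting discussed above can be sketched in Python. This is only a rough sketch, not SpamBayes code: the function name, the dictionary keys, and the counts are all hypothetical, and the choice of denominators (decided messages for accuracy, all messages of that type for the unsure and removed-from-inbox figures) is just one possible reading of the description.

```python
def accuracy_report(ham_as_ham, ham_as_spam, ham_as_unsure,
                    spam_as_spam, spam_as_ham, spam_as_unsure):
    """Per-class percentages from six hypothetical message counts."""

    def pct(part, whole):
        # Guard against an empty category instead of dividing by zero.
        return 100.0 * part / whole if whole else 0.0

    # "Decided" means classified as ham or spam, i.e. not unsure.
    ham_decided = ham_as_ham + ham_as_spam
    spam_decided = spam_as_spam + spam_as_ham
    ham_total = ham_decided + ham_as_unsure
    spam_total = spam_decided + spam_as_unsure

    return {
        # Primary accuracy: correct calls among decided messages only.
        "ham_correct_pct": pct(ham_as_ham, ham_decided),
        "spam_correct_pct": pct(spam_as_spam, spam_decided),
        # Additional messages that ended up as unsure.
        "ham_unsure_pct": pct(ham_as_unsure, ham_total),
        "spam_unsure_pct": pct(spam_as_unsure, spam_total),
        # Alternative view: anything not left in the inbox counts as removed,
        # whether it went to the spam folder or the unsure bin.
        "ham_removed_from_inbox_pct": pct(ham_as_spam + ham_as_unsure, ham_total),
        "spam_removed_from_inbox_pct": pct(spam_as_spam + spam_as_unsure, spam_total),
    }


# Made-up counts, purely to exercise the function.
report = accuracy_report(ham_as_ham=990, ham_as_spam=1, ham_as_unsure=9,
                         spam_as_spam=480, spam_as_ham=5, spam_as_unsure=15)
for name, value in report.items():
    print(f"{name}: {value:.2f}%")
```

The removed-from-inbox figures treat an unsure as a miss for ham but as a success for spam, which is the asymmetry mentioned above.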
