http://bugzilla.spamassassin.org/show_bug.cgi?id=2910
------- Additional Comments From [EMAIL PROTECTED] 2004-01-22 18:58 -------

duh, I'm an idiot -- those accuracy figures are the accuracy for the score-generator itself, against its own training corpus, the "training fold" -- NOT the accuracy measured against the "test fold".

Here are the *correct* results for the 10-fold CV -- namely those score files, measured against the test fold (the .log files).

evolve (ie. the GA), default settings:

# TCR: 8.607287 SpamRecall: 96.143% SpamPrec: 98.411% FP: 0.78% FN: 1.94%
# TCR: 9.663636 SpamRecall: 96.472% SpamPrec: 98.606% FP: 0.69% FN: 1.77%
# TCR: 10.630000 SpamRecall: 96.002% SpamPrec: 98.886% FP: 0.54% FN: 2.01%
# TCR: 8.748971 SpamRecall: 95.861% SpamPrec: 98.502% FP: 0.73% FN: 2.08%
# TCR: 9.533632 SpamRecall: 95.861% SpamPrec: 98.692% FP: 0.64% FN: 2.08%
# TCR: 9.981221 SpamRecall: 96.802% SpamPrec: 98.610% FP: 0.69% FN: 1.61%
# TCR: 9.365639 SpamRecall: 95.437% SpamPrec: 98.735% FP: 0.61% FN: 2.29%
# TCR: 10.221154 SpamRecall: 95.155% SpamPrec: 98.973% FP: 0.50% FN: 2.44%
# TCR: 9.883721 SpamRecall: 96.706% SpamPrec: 98.608% FP: 0.69% FN: 1.66%
# TCR: 10.678392 SpamRecall: 96.518% SpamPrec: 98.796% FP: 0.59% FN: 1.75%

Not great -- note the TCRs wandering about.

perceptron: I took Henry's advice and tweaked the parameters a little to see what effect that would have.
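For reference, each per-fold line can be derived from raw counts. TCR (Total Cost Ratio) here appears to follow the usual definition, TCR = Nspam / (lambda*FP + FN), where lambda is the relative cost of a false positive; the figures above look consistent with lambda=5, but that's inferred rather than documented, so treat it as an assumption. A minimal sketch with a hypothetical helper:

```python
def metrics(n_spam, fp, fn, lam=5):
    """Derive the per-fold figures from raw counts.

    n_spam: spam messages in the test fold
    fp:     ham messages wrongly flagged as spam
    fn:     spam messages missed
    lam:    relative cost of a false positive (lam=5 is an
            assumption; the actual lambda isn't stated here)
    """
    tp = n_spam - fn                # spam correctly caught
    recall = tp / n_spam            # SpamRecall
    precision = tp / (tp + fp)      # SpamPrec
    tcr = n_spam / (lam * fp + fn)  # Total Cost Ratio
    return tcr, recall, precision

# hypothetical fold: 1000 spams, 5 false positives, 30 false negatives
tcr, rec, prec = metrics(1000, fp=5, fn=30)
```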
-p 0.75 -e 100 seems to be closest to the FP/FN ratio used by the GA above:

perceptron -p 0.75 -e 100

# TCR: 10.320388 SpamRecall: 96.425% SpamPrec: 98.748% FP: 0.61% FN: 1.80%
# TCR: 12.079545 SpamRecall: 96.896% SpamPrec: 98.943% FP: 0.52% FN: 1.56%
# TCR: 14.561644 SpamRecall: 96.425% SpamPrec: 99.322% FP: 0.33% FN: 1.80%
# TCR: 10.737374 SpamRecall: 96.331% SpamPrec: 98.842% FP: 0.57% FN: 1.84%
# TCR: 10.791878 SpamRecall: 95.908% SpamPrec: 98.933% FP: 0.52% FN: 2.06%
# TCR: 13.042945 SpamRecall: 95.861% SpamPrec: 99.269% FP: 0.35% FN: 2.08%
# TCR: 11.368984 SpamRecall: 95.908% SpamPrec: 99.029% FP: 0.47% FN: 2.06%
# TCR: 12.360465 SpamRecall: 95.437% SpamPrec: 99.266% FP: 0.35% FN: 2.29%
# TCR: 14.072848 SpamRecall: 97.129% SpamPrec: 99.135% FP: 0.43% FN: 1.44%
# TCR: 14.072848 SpamRecall: 96.659% SpamPrec: 99.227% FP: 0.38% FN: 1.68%

And in terms of "good numbers" -- ie. my taste ;) -- here's what seems nice:

perceptron -p 2.0 -e 100

# TCR: 13.805195 SpamRecall: 94.873% SpamPrec: 99.556% FP: 0.21% FN: 2.58%
# TCR: 12.148571 SpamRecall: 96.002% SpamPrec: 99.126% FP: 0.43% FN: 2.01%
# TCR: 14.867133 SpamRecall: 95.390% SpamPrec: 99.558% FP: 0.21% FN: 2.32%
# TCR: 11.491892 SpamRecall: 94.826% SpamPrec: 99.261% FP: 0.35% FN: 2.60%
# TCR: 12.730539 SpamRecall: 95.202% SpamPrec: 99.362% FP: 0.31% FN: 2.41%
# TCR: 13.123457 SpamRecall: 94.967% SpamPrec: 99.458% FP: 0.26% FN: 2.53%
# TCR: 11.491892 SpamRecall: 95.296% SpamPrec: 99.168% FP: 0.40% FN: 2.37%
# TCR: 11.072917 SpamRecall: 93.321% SpamPrec: 99.498% FP: 0.24% FN: 3.36%
# TCR: 13.888889 SpamRecall: 96.094% SpamPrec: 99.319% FP: 0.33% FN: 1.96%
# TCR: 13.798701 SpamRecall: 95.576% SpamPrec: 99.413% FP: 0.28% FN: 2.22%

Note that the perceptron's TCRs and FP/FN ratios are consistently higher -- and more stable -- than the GA's.
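Tallying the per-fold TCRs above (plain Python, not part of the bug's tooling) backs up the "consistently higher" claim:

```python
from statistics import mean

# per-fold TCRs copied from the three runs above
tcrs = {
    "evolve (GA)":        [8.607287, 9.663636, 10.630000, 8.748971, 9.533632,
                           9.981221, 9.365639, 10.221154, 9.883721, 10.678392],
    "perceptron -p 0.75": [10.320388, 12.079545, 14.561644, 10.737374, 10.791878,
                           13.042945, 11.368984, 12.360465, 14.072848, 14.072848],
    "perceptron -p 2.0":  [13.805195, 12.148571, 14.867133, 11.491892, 12.730539,
                           13.123457, 11.491892, 11.072917, 13.888889, 13.798701],
}
for name, xs in tcrs.items():
    print(f"{name}: mean TCR {mean(xs):.2f} (min {min(xs):.2f}, max {max(xs):.2f})")
```

Every perceptron fold at -p 2.0 beats the best GA fold's TCR, which is hard to write off as noise.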
Stability is very important: unstable scores mean that the tool over-fitted to the data it had available, generating "weird" scores that didn't match the real accuracy of the rules but made the results on its training set look better. It's clear the perceptron is better in this respect, which is the point of the test. So, looks good!

PS: I dropped the mystery extra column. I haven't a clue what that was for. ;)

PPS: I should point out that this is all with scoreset 0; I doubt redoing it using set1, set2 or set3 would really make any difference, though, as we're just comparing the score-discovery algorithms, not the ruleset.
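For anyone curious about the score-discovery side: the perceptron learns one additive weight per rule by nudging weights whenever a message lands on the wrong side of the 5.0 spam threshold. The sketch below is a textbook perceptron, not Henry's actual implementation -- I'm guessing -e is the epoch count and treating -p as a false-positive penalty, purely for illustration; the rule names are made up.

```python
THRESHOLD = 5.0  # SpamAssassin's spam cutoff

def train_perceptron(messages, labels, epochs=100, fp_penalty=2.0, lr=0.01):
    """Learn one additive score per rule (plus a bias).

    messages:   one dict per message, {rule_name: 1} for rules that fired
    labels:     1 = spam, 0 = ham
    fp_penalty: bigger correction when a ham is pushed over the
                threshold -- my guess at roughly what -p controls
    """
    rules = {r for m in messages for r in m}
    w = {r: 0.0 for r in rules}
    bias = 0.0
    for _ in range(epochs):
        for msg, y in zip(messages, labels):
            score = bias + sum(w[r] for r in msg)
            pred = 1 if score >= THRESHOLD else 0
            if pred == y:
                continue  # correctly classified, no update
            step = lr * (fp_penalty if y == 0 else 1.0)
            delta = step if y == 1 else -step
            for r in msg:
                w[r] += delta
            bias += delta
    return w, bias

# toy corpus: two spams hitting rules R_A+R_B, one ham hitting R_C
msgs = [{"R_A": 1, "R_B": 1}, {"R_C": 1}, {"R_A": 1, "R_B": 1}]
w, bias = train_perceptron(msgs, [1, 0, 1])
```

Raising fp_penalty makes the learner back off harder after each false positive, which is consistent with -p 2.0 trading some spam recall for the lower FP rates seen above.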
