http://bugzilla.spamassassin.org/show_bug.cgi?id=2910





------- Additional Comments From [EMAIL PROTECTED]  2004-01-21 23:58 -------
Created an attachment (id=1717)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=1717&action=view)
10fcv score files (tar.gz)

OK, here are the results from the ten-fold cross-validation run.

It's a *little* off, but nothing that can't be tweaked, I'd say... it looks
like a minor issue of balancing FPs vs FNs.  Basically, the FPs are low at the
expense of a few percentage points of FNs.

The attachment is a tar.gz containing the scores files for the GA and perceptron
parts of the test.  I'll upload a tar.gz with the ham.log and spam.log files as
well in a minute.  If someone wants, they could take it, tweak the perceptron
code to rebalance that FP/FN ratio a bit, and rerun the perceptron half...

I documented the procedure at
http://wiki.spamassassin.org/w/TenFoldCrossValidation .


ga/scores.1 False positives:        36  0.09%  (0.19% nonspam)
ga/scores.1 False negatives:       846  2.22%  (4.42% spam)
ga/scores.2 False positives:        40  0.11%  (0.21% nonspam)
ga/scores.2 False negatives:       827  2.17%  (4.32% spam)
ga/scores.3 False positives:        44  0.12%  (0.23% nonspam)
ga/scores.3 False negatives:       856  2.25%  (4.47% spam)
ga/scores.4 False positives:        39  0.10%  (0.21% nonspam)
ga/scores.4 False negatives:       865  2.27%  (4.52% spam)
ga/scores.5 False positives:        41  0.11%  (0.22% nonspam)
ga/scores.5 False negatives:       813  2.14%  (4.25% spam)
ga/scores.6 False positives:        40  0.11%  (0.21% nonspam)
ga/scores.6 False negatives:       869  2.28%  (4.54% spam)
ga/scores.7 False positives:        36  0.09%  (0.19% nonspam)
ga/scores.7 False negatives:       825  2.17%  (4.31% spam)
ga/scores.8 False positives:        44  0.12%  (0.23% nonspam)
ga/scores.8 False negatives:       817  2.15%  (4.27% spam)
ga/scores.9 False positives:        39  0.10%  (0.21% nonspam)
ga/scores.9 False negatives:       873  2.29%  (4.56% spam)
ga/scores.10 False positives:        44  0.12%  (0.23% nonspam)
ga/scores.10 False negatives:       862  2.27%  (4.51% spam)

perceptron/scores.1 False positives:         9  0.02%  (0.05% nonspam)
perceptron/scores.1 False negatives:      1142  3.00%  (5.97% spam)
perceptron/scores.2 False positives:        17  0.04%  (0.09% nonspam)
perceptron/scores.2 False negatives:      1044  2.74%  (5.46% spam)
perceptron/scores.3 False positives:        22  0.06%  (0.12% nonspam)
perceptron/scores.3 False negatives:      1067  2.80%  (5.58% spam)
perceptron/scores.4 False positives:        22  0.06%  (0.12% nonspam)
perceptron/scores.4 False negatives:      1137  2.99%  (5.94% spam)
perceptron/scores.5 False positives:        11  0.03%  (0.06% nonspam)
perceptron/scores.5 False negatives:      1136  2.99%  (5.94% spam)
perceptron/scores.6 False positives:        21  0.06%  (0.11% nonspam)
perceptron/scores.6 False negatives:       991  2.60%  (5.18% spam)
perceptron/scores.7 False positives:        16  0.04%  (0.08% nonspam)
perceptron/scores.7 False negatives:      1102  2.90%  (5.76% spam)
perceptron/scores.8 False positives:        22  0.06%  (0.12% nonspam)
perceptron/scores.8 False negatives:       980  2.58%  (5.12% spam)
perceptron/scores.9 False positives:        24  0.06%  (0.13% nonspam)
perceptron/scores.9 False negatives:       972  2.55%  (5.08% spam)
perceptron/scores.10 False positives:        22  0.06%  (0.12% nonspam)
perceptron/scores.10 False negatives:       975  2.56%  (5.10% spam)
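
To compare the two halves at a glance, the per-fold lines above can be averaged.
Here is a minimal Python sketch, assuming the lines keep exactly the format
shown above.  The name `summarize` and the regex are illustrative only and are
not part of the SpamAssassin tools.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches lines such as:
#   ga/scores.1 False positives:        36  0.09%  (0.19% nonspam)
# The layout is assumed from the report above, not taken from any tool's spec.
LINE_RE = re.compile(
    r"^\s*(?P<method>\w+)/scores\.(?P<fold>\d+) "
    r"False (?P<kind>positives|negatives):"
    r"\s+(?P<count>\d+)\s+(?P<pct_all>[\d.]+)%"
    r"\s+\((?P<pct_class>[\d.]+)% (?:non)?spam\)",
    re.MULTILINE,
)


def summarize(report):
    """Return {(method, kind): (mean count, mean per-class %)} over all folds."""
    rows = defaultdict(list)
    for m in LINE_RE.finditer(report):
        key = (m["method"], m["kind"])
        rows[key].append((int(m["count"]), float(m["pct_class"])))
    return {
        key: (mean(c for c, _ in vals), mean(p for _, p in vals))
        for key, vals in rows.items()
    }


# Two folds copied from the GA results above, as a small usage example.
SAMPLE = """\
ga/scores.1 False positives:        36  0.09%  (0.19% nonspam)
ga/scores.1 False negatives:       846  2.22%  (4.42% spam)
ga/scores.2 False positives:        40  0.11%  (0.21% nonspam)
ga/scores.2 False negatives:       827  2.17%  (4.32% spam)
"""

for (method, kind), (avg_count, avg_pct) in sorted(summarize(SAMPLE).items()):
    print(f"{method} false {kind}: {avg_count:.1f} per fold ({avg_pct:.2f}%)")
```

Averaging the counts listed above over all ten folds gives about 40.3 FPs and
845.3 FNs per fold for the GA, against 18.6 FPs and 1054.6 FNs per fold for
the perceptron.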



