http://bugzilla.spamassassin.org/show_bug.cgi?id=2910
------- Additional Comments From [EMAIL PROTECTED] 2004-01-22 14:25 -------

Created an attachment (id=1723)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=1723&action=view)
example comparison between GA-generated and perceptron-generated scores

Out of curiosity, I wrote a simple Perl script to compare the scores in one
score file to those in another (see the previously attached program, and the
example comparison attached to this message). Here's an example of the output:

SUBJ_REMOVE              0.001  1.887  188600.0
SUB_HELLO                0.001  1.782  178100.0
MARKET_SOLUTION          0.001  0.661   66000.0
HTML_FONTCOLOR_NAME      0.001  0.506   50500.0
UP_TO_OR_MORES           0.001  0.201   20000.0
FROM_HAS_ULINE_NUMS      0.001  0.104   10300.0
SAVE_MONEY               0.001  0.098    9700.0
RECEIVED_CACHEFLOW       0.001  0.057    5600.0
HTML_TAG_EXISTS_PARAM    0.004  0.206    5050.0
LARGE_HEX                0.001  0.042    4100.0
MSGID_THREESIXSIX        0.001  0.039    3800.0
BE_AMAZED                0.001  0.038    3700.0
HTML_COMMENT_8BITS       0.001  0.035    3400.0
DATE_MISSING             0.001  0.034    3300.0

The first column is the rule name. The second column is the score from file1,
in this case the GA scoring. The third column is the perceptron scoring, and
the last column is the percentage change from the score in the second column
to the score in the third column. The list is sorted in descending order by
the absolute value of the percentage change. Because most of the GA scores
shown are the 0.001 minimum, the percentages are enormous; the simple
difference might be more illustrative.

Conclusion: the perceptron model and the GA model calculate wildly different
scores but achieve similar accuracy (per JM's note).

I have a question. In the perceptron score files, there seems to be an extra
score column that is always zero. What's the purpose of that column? Here's
an example:

score ACCEPT_CREDIT_CARDS 0 0.002

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
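
For readers without the attachment, the comparison described in the comment can be sketched roughly as follows. This is a Python illustration, not the attached Perl script. The names parse_scores, compare, and the column parameter are hypothetical, and reading the last value on each "score" line is an assumption made only so that a line like "score ACCEPT_CREDIT_CARDS 0 0.002" yields 0.002.

```python
def parse_scores(text, column=-1):
    """Map rule name -> score for lines like 'score RULE 0 0.002'.

    Assumption: `column` picks which value on the line to use;
    the default (-1) takes the last value listed.
    """
    scores = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) < 3 or fields[0] != "score":
            continue
        try:
            scores[fields[1]] = float(fields[2:][column])
        except (ValueError, IndexError):
            continue  # skip lines that do not parse as numeric scores
    return scores


def compare(old, new):
    """Return (rule, old, new, percent_change) rows, largest |change| first."""
    rows = []
    for rule in old.keys() & new.keys():
        a, b = old[rule], new[rule]
        if a == 0:
            continue  # percentage change is undefined for a zero baseline
        rows.append((rule, a, b, (b - a) / a * 100.0))
    # Sort by absolute percentage change, descending; ties by rule name.
    rows.sort(key=lambda r: (-abs(r[3]), r[0]))
    return rows


if __name__ == "__main__":
    ga = parse_scores("score SUBJ_REMOVE 0.001\nscore SAVE_MONEY 0.001\n")
    perceptron = parse_scores("score SUBJ_REMOVE 0 1.887\nscore SAVE_MONEY 0 0.098\n")
    for rule, a, b, pct in compare(ga, perceptron):
        print(f"{rule:<24} {a:.3f}  {b:.3f}  {pct:10.1f}")
```

Replacing the percentage with the plain difference (b - a) in compare would give the "simple difference" view suggested in the comment.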
