http://bugzilla.spamassassin.org/show_bug.cgi?id=2910
------- Additional Comments From [EMAIL PROTECTED] 2004-01-12 12:13 -------
Subject: RE: Fast SpamAssassin score learning tool.

Naïve Bayes will not work very well with the rules because there is far too much mutual information among the attributes. Many rules fire together, so a classifier that treats them as independent counts the same evidence several times. The neural network performs so well (with either training algorithm) because it can learn lower weights for groups of rules that frequently co-occur.

If you'd like to learn more about machine learning, I would suggest taking a look at Data Mining by Witten and Frank or Machine Learning by Mitchell (the latter is much more technical). Both cover all of the common learning algorithms (neural networks, rule-based learning, decision trees, Bayesian networks, support vector machines, etc.). Witten and Frank's book focuses on running experiments with Weka, an open source machine learning toolkit. Like most of the tools I've come across, it has a hard time with large datasets, so your mileage may vary.

Henry

------- Additional Comments From [EMAIL PROTECTED] 2004-01-12 11:55 -------
Let me see if I understand what Phil is suggesting: the idea is to add to the Bayes db a magic token for each rule that matches, alongside the rest of the Bayes db information, and then use only Bayes for the scoring.

This should be easy to test with a 10-fold cross validation, combining the output of mass check with construction of the Bayes db. Is anybody up to running the test?
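A minimal sketch of what Phil's proposal and the 10-fold test might look like, in Python. Everything here is hypothetical and not SpamAssassin code: the `__RULE__` token prefix, the `TokenBayes` class, and the `(words, rule_hits, is_spam)` corpus format (assumed to come from parsed mass-check output, parsing not shown) are illustrative. The scorer is a plain Laplace-smoothed naive Bayes over the tokens present in a message, not SpamAssassin's actual Bayes combining. It also makes Henry's objection concrete: two rules that always fire together each add their full log-likelihood ratio, so the same evidence is counted twice.

```python
import math
from collections import Counter

# Hypothetical prefix for the per-rule "magic" tokens.
RULE_PREFIX = "__RULE__"


def tokens_for(words, rule_hits):
    """Message word tokens plus one magic token per matching rule."""
    return set(words) | {RULE_PREFIX + r for r in rule_hits}


class TokenBayes:
    """Simplified naive Bayes over token presence (not SpamAssassin's combiner)."""

    def __init__(self):
        self.spam, self.ham = Counter(), Counter()
        self.n_spam = self.n_ham = 0

    def learn(self, tokens, is_spam):
        # Count each token at most once per message.
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def log_odds(self, tokens):
        # Prior log-odds plus one log-likelihood ratio per present token,
        # i.e. the tokens are assumed conditionally independent.
        score = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for t in tokens:
            p_s = (self.spam[t] + 1) / (self.n_spam + 2)
            p_h = (self.ham[t] + 1) / (self.n_ham + 2)
            score += math.log(p_s / p_h)
        return score


def k_fold(items, k=10):
    """Yield (train, test) splits; fold i tests on every k-th item from offset i."""
    for i in range(k):
        test = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test


def cross_validate(corpus, k=10):
    """Return per-fold accuracy; a message is called spam when log-odds > 0."""
    accuracies = []
    for train, test in k_fold(corpus, k):
        if not test:  # corpus smaller than k: skip empty folds
            continue
        db = TokenBayes()
        for words, hits, is_spam in train:
            db.learn(tokens_for(words, hits), is_spam)
        correct = sum((db.log_odds(tokens_for(w, h)) > 0) == s for w, h, s in test)
        accuracies.append(correct / len(test))
    return accuracies
```

For example, after training on three spam messages that hit both RULE_A and RULE_B and three ham messages that hit neither, a message carrying both rule tokens scores exactly twice the log-odds of one carrying only RULE_A. A learner that fits all weights jointly, such as the neural network Henry mentions, can instead split the weight between the two correlated rules.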
