http://bugzilla.spamassassin.org/show_bug.cgi?id=2910





------- Additional Comments From [EMAIL PROTECTED]  2004-01-12 12:13 -------
Subject: RE:  Fast SpamAssassin score learning tool.

Naïve Bayes will not work very well with the rules because there is far
too much mutual information in the attributes: the rules are nowhere near
conditionally independent, so naive Bayes counts the same evidence several
times over.  The neural network performs so well (with either training
algorithm) because it is able to learn lower weights for groups of rules
that frequently co-occur.
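To make the mutual-information point concrete, here is a small sketch on a made-up corpus in which two hypothetical rules, A and B, always fire together. It compares Bernoulli naive Bayes with a single sigmoid unit trained by batch gradient descent (equivalent to logistic regression). The sigmoid unit is only a stand-in for the neural network and is not the actual SpamAssassin training code. Naive Bayes counts the shared evidence twice, while the sigmoid unit splits it between the two rules.

```python
import math


def make_corpus():
    """Hypothetical toy corpus: rules A and B always fire together.

    Each row is ((hits_A, hits_B), is_spam).
    """
    rows = []
    rows += [((1, 1), 1)] * 80  # spam hitting both rules
    rows += [((0, 0), 1)] * 20  # spam hitting neither rule
    rows += [((1, 1), 0)] * 20  # ham hitting both rules
    rows += [((0, 0), 0)] * 80  # ham hitting neither rule
    return rows


def naive_bayes_log_odds(rows, x):
    """Bernoulli naive Bayes log-odds of spam for the rule-hit vector x."""
    spam = [f for f, y in rows if y == 1]
    ham = [f for f, y in rows if y == 0]
    log_odds = math.log(len(spam) / len(ham))  # class prior
    for i, hit in enumerate(x):
        p_spam = sum(f[i] for f in spam) / len(spam)  # P(rule i fires | spam)
        p_ham = sum(f[i] for f in ham) / len(ham)     # P(rule i fires | ham)
        if hit:
            log_odds += math.log(p_spam / p_ham)
        else:
            log_odds += math.log((1 - p_spam) / (1 - p_ham))
    return log_odds


def train_logistic(rows, lr=1.0, epochs=2000):
    """Single sigmoid unit trained by batch gradient descent on log loss."""
    n_rules = len(rows[0][0])
    w = [0.0] * n_rules
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_rules
        grad_b = 0.0
        for x, y in rows:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # d(log loss)/dz
            grad_b += err
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
        b -= lr * grad_b / len(rows)
        w = [wi - lr * gi / len(rows) for wi, gi in zip(w, grad_w)]
    return w, b


rows = make_corpus()
w, b = train_logistic(rows)
print("naive Bayes log-odds, both rules hit:", naive_bayes_log_odds(rows, (1, 1)))
print("sigmoid unit log-odds, both rules hit:", b + sum(w))
print("per-rule weights:", w)
```

On this corpus naive Bayes gives log-odds of 2 ln 4 ≈ 2.77 for a message hitting both rules, double the empirical ln 4 ≈ 1.39 (80 of the 100 messages hitting both are spam). The trained unit recovers ln 4 and weights each rule about 1.39, half of the 2 ln 4 per-rule swing that naive Bayes implies.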

If you'd like to learn more about machine learning, I would suggest
taking a look at Data Mining by Witten and Frank or Machine Learning by
Mitchell (the latter is much more technical).  Both cover all of the
common learning algorithms (neural networks, rule-based learning,
decision trees, Bayesian networks, support vector machines, etc.).
Witten and Frank's book focuses on running experiments using the "Weka"
tool, an open source machine learning toolkit.  Like most of the tools
that I've come across, it has a hard time dealing with large datasets,
so your mileage may vary.

Henry

------- Additional Comments From [EMAIL PROTECTED]  2004-01-12 11:55 -------
Let me see if I understand what Phil is suggesting: the idea is to add a
magic token to the Bayes db for each rule that matches, alongside the
usual Bayes token information, and then use only Bayes for scoring.
This should be easy to test with 10-fold cross-validation, combining the
output of mass-check with construction of the Bayes db.
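The experiment could be prototyped along the following lines. This is a toy sketch, assuming each message has already been reduced to its body tokens, the names of the rules mass-check reported as hitting, and a spam/ham label; the `__RULE__` token prefix, `TokenBayes`, and this data layout are all hypothetical. The classifier is a plain presence-only naive Bayes with Laplace smoothing, not SpamAssassin's own Bayes implementation, which tokenizes and combines probabilities differently.

```python
import math
import random
from collections import Counter


def with_rule_tokens(body_tokens, rule_hits):
    """Add one hypothetical magic token per matched rule to the message's tokens."""
    return set(body_tokens) | {"__RULE__" + name for name in rule_hits}


class TokenBayes:
    """Presence-only naive Bayes with Laplace smoothing (a stand-in Bayes db)."""

    def __init__(self):
        self.counts = {0: Counter(), 1: Counter()}  # label -> token document counts
        self.docs = {0: 0, 1: 0}                    # label -> number of messages

    def train(self, tokens, label):
        self.counts[label].update(tokens)
        self.docs[label] += 1

    def spam_log_odds(self, tokens):
        log_odds = math.log((self.docs[1] + 1) / (self.docs[0] + 1))  # smoothed prior
        for t in tokens:
            if t not in self.counts[0] and t not in self.counts[1]:
                continue  # ignore tokens never seen in training
            p_spam = (self.counts[1][t] + 1) / (self.docs[1] + 2)
            p_ham = (self.counts[0][t] + 1) / (self.docs[0] + 2)
            log_odds += math.log(p_spam / p_ham)
        return log_odds


def cross_validate(messages, k=10, seed=0):
    """k-fold accuracy; messages is a list of (body_tokens, rule_hits, is_spam)."""
    idx = list(range(len(messages)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for test_fold in folds:
        held_out = set(test_fold)
        model = TokenBayes()  # fresh db built from the other k-1 folds
        for i in idx:
            if i not in held_out:
                body, rules, label = messages[i]
                model.train(with_rule_tokens(body, rules), label)
        for i in test_fold:
            body, rules, label = messages[i]
            score = model.spam_log_odds(with_rule_tokens(body, rules))
            predicted = 1 if score > 0 else 0
            correct += int(predicted == label)
    return correct / len(messages)


# Hypothetical toy data standing in for parsed mass-check results.
toy = [(["buy", "now"], ["RULE_X"], 1)] * 20 + [(["meeting", "notes"], [], 0)] * 20
print("10-fold accuracy:", cross_validate(toy, k=10))
```

Running the same folds with every rule list left empty gives a baseline, so the two accuracies show whether the magic rule tokens actually help.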

Is anybody up to running the test?
