A couple of days ago I committed to the James svn trunk two new spam detection mailets, BayesianAnalysis and BayesianAnalysisFeeder, plus helper classes implementing Bayesian analysis techniques.

Details can be found at http://wiki.apache.org/james/Bayesian_Analysis.

This code (originally from Chris Means) is a rewrite/cleanup of mailets that I've been using very effectively in production for the last 24 months. No change (other than cleanup) was made to the core routines.

For those who have been using my private version (which was available at http://portale.praxis.it/pub/james/james-praxis.zip) and have trained their own "corpus", this new version is completely backward compatible. The only changes are to the config.xml entries, to the "corpus rebuild" process (now automatic every 10 minutes), and a database field name correction from "occurances" to "occurrences".

If anybody wants, I can send a MySQL dump (made with mysqldump) of my production corpus (two years of training; currently 2211 spam and 388 ham messages fed; 4.2 MB zip file). If you use my data, please do your ham training before blocking messages. Just let me know by sending me a private message.
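
To illustrate why ham training matters before blocking, here is a minimal sketch of a per-token spam probability in the style of Paul Graham's "A Plan for Spam". It is not the code of the James mailets, and the actual routines may differ: the class name, method names, the neutral value 0.4, the minimum of 5 occurrences, the doubling of ham counts and the 0.01/0.99 clamps are all assumptions taken from Graham's essay. Only the message counts (2211 spam, 388 ham) come from the corpus described above; the token counts are made up for the example.

```java
public class TokenProbabilitySketch {

    // Hypothetical helper: probability that a message containing this token is spam.
    // spamCount/hamCount: occurrences of the token in each corpus.
    // spamMessages/hamMessages: number of messages fed to each corpus.
    static double tokenSpamProbability(int spamCount, int hamCount,
                                       int spamMessages, int hamMessages) {
        // Assumption: with an empty corpus on either side, stay neutral.
        if (spamMessages <= 0 || hamMessages <= 0) {
            return 0.4;
        }
        double g = 2.0 * hamCount; // ham counts doubled to bias against false positives
        double b = spamCount;
        // Assumption: tokens seen fewer than 5 times get a neutral value.
        if (g + b < 5) {
            return 0.4;
        }
        double bRatio = Math.min(1.0, b / spamMessages);
        double gRatio = Math.min(1.0, g / hamMessages);
        double p = bRatio / (gRatio + bRatio);
        // Clamp so that no single token is treated as absolute proof.
        return Math.max(0.01, Math.min(0.99, p));
    }

    // Naive Bayes combination of per-token probabilities (inputs assumed clamped as above).
    static double combine(double[] probabilities) {
        double product = 1.0;
        double inverseProduct = 1.0;
        for (double p : probabilities) {
            product *= p;
            inverseProduct *= (1.0 - p);
        }
        return product / (product + inverseProduct);
    }

    public static void main(String[] args) {
        // A token seen 10 times in spam and never in ham.
        System.out.println(tokenSpamProbability(10, 0, 2211, 388));
        // The same token after it also shows up 5 times in locally trained ham.
        System.out.println(tokenSpamProbability(10, 5, 2211, 388));
    }
}
```

A token that is common in your own legitimate mail but absent from somebody else's ham corpus scores as strongly spammy until your own ham has been fed, which is why local ham training should come before blocking.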

Vincenzo
