Chris St. Pierre wrote:
>
> Mark Martinec wrote:
>>
>> ... the following sounds promising as an additional classifier
>> to existing bayes (especially since the author comes from the same
>> organization as myself :)
>>
>> http://www.virusbtn.com/spambulletin/archive/2006/01/sb200601-trec
>>
>> ijsSPAM2 PPM-D compression model
>> Andrej Bratko (Josef Stefan Institute)
>>
>> Observations:
>> The most startling observation is that character-based compression models
>> perform outstandingly well for spam filtering. Commonly used open-source
>> filters perform well, but not nearly so well or nearly so poorly as
>> reported elsewhere.
>>
>
> This looks very promising. I found a description of the ijsSPAM2 tool
> on the site:
>
> http://www.virusbtn.com/spambulletin/archive/2006/03/sb200603-compression
>
> Remarkable stuff. That would be a helluva nice plugin to have.
>

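For anyone unfamiliar with the idea behind compression-based filtering, here is a rough sketch. Note the assumptions: it uses zlib (DEFLATE) from the Python standard library as a stand-in compressor, so it is not PPM-D and not the ijsSPAM2 method, and it does not use the API of any library mentioned in this thread. The function names and the tiny corpora are made up for illustration. The principle is that a message is assigned to the class whose training text lets it be compressed into the fewest additional bytes.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs for the UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, message: str) -> int:
    """Extra compressed bytes needed when message is appended to corpus."""
    return compressed_size(corpus + message) - compressed_size(corpus)


def classify(message: str, spam_corpus: str, ham_corpus: str) -> str:
    """Pick the class whose corpus 'explains' the message more cheaply.

    Ties go to "ham", an arbitrary choice for this sketch.
    """
    spam_cost = extra_bytes(spam_corpus, message)
    ham_cost = extra_bytes(ham_corpus, message)
    return "spam" if spam_cost < ham_cost else "ham"


# Toy training text (hypothetical, for illustration only).
spam_corpus = (
    "Buy cheap pills now! Click here for free pills. "
    "Win money fast, limited offer!!! "
)
ham_corpus = (
    "Hi Bob, the meeting is moved to Tuesday at ten. "
    "Please bring the quarterly report. Thanks, Alice. "
)

print(classify("Click here for free pills. Win money fast", spam_corpus, ham_corpus))
print(classify("Please bring the quarterly report", spam_corpus, ham_corpus))
```

A real filter along these lines would presumably use an adaptive statistical model such as PPM-D rather than DEFLATE, and far more training text than two sentences per class.
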
I've recently released a C++ library that includes an implementation of
the PPM-D algorithm, geared towards classification (or mail filtering).
This is essentially the same algorithm that appeared at TREC 2005 as
`ijsSPAM2'. It's available at:

http://ai.ijs.si/andrej/psmslib.html

There's also a Python wrapper:

http://ai.ijs.si/andrej/psmpylib.html

The C++ library and Python extension module are free for personal and
for research use, but unfortunately, I cannot disclose the source code
at this time, or release the libraries under an Apache-compatible license.

Anyway, you might want to try it out before coding your own implementation.

-- 
View this message in context: http://www.nabble.com/Google-Summer-of-Code-2007-...-tf3240085.html#a9146893
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.