Mark Martinec writes:
> > Also, any suggestions from outside the dev team?  Anyone got good ideas
> > for new SpamAssassin features that would be good to pay someone to work on
> > for 3 months?
> 
> I believe this was once mentioned on Justin's blog (but I can't find
> a ref now). The following sounds promising as an additional classifier
> alongside the existing Bayes (especially since the author comes from the
> same organization as myself :)
> 
> http://www.virusbtn.com/spambulletin/archive/2006/01/sb200601-trec
> 
>   ijsSPAM2    PPM-D compression model
>     Andrej Bratko (Jozef Stefan Institute)
> 
>   Observations:
>   The most startling observation is that character-based compression models
>   perform outstandingly well for spam filtering. Commonly used open-source
>   filters perform well, but not nearly so well or nearly so poorly as
>   reported elsewhere.

Yes, definitely!  A related algorithm is OSBF, as implemented here:
http://osbf-lua.luaforge.net/
It had the best performance among hand-trained probabilistic classifiers
in the TREC Spam Track 2006 evaluation -- that's good ;)
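
For anyone unfamiliar with the "OSB" part of OSBF (orthogonal sparse
bigrams), here is a rough Python sketch of the feature generation as I
understand it: each token is paired with each of the few tokens before it,
and the gap is encoded with skip markers. The function name, the "<skip>"
marker and the window size of 5 are my own assumptions for illustration;
this does not reproduce osbf-lua's tokenizer or its confidence weighting.

```python
from typing import List


def osb_features(tokens: List[str], window: int = 5) -> List[str]:
    """Build sparse-bigram features from a token list (illustrative only).

    Each token is paired with each of the up to (window - 1) tokens
    before it. The distance between the two tokens is encoded with
    "<skip>" markers, so "a b" and "a <skip> b" are distinct features.
    """
    features = []
    for i, current in enumerate(tokens):
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break  # ran off the start of the message
            gap = " <skip>" * (distance - 1)
            features.append(f"{tokens[j]}{gap} {current}")
    return features
```

Calling osb_features("buy cheap pills now".split()) yields features such
as "cheap pills" and "buy <skip> pills"; a classifier would then count
these per class instead of single tokens.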

Also, a related project would be to complete the pluginization of our
"Bayes" engine and APIs, so that other probabilistic classifiers can be
dropped in instead of, or alongside, Bayes in SpamAssassin.
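
To make the idea concrete, here is a minimal Python sketch of what a
pluggable classifier interface could look like. Everything in it is
hypothetical: the Classifier protocol, the ClassifierRegistry and the
method names are invented for illustration and are not SpamAssassin's
actual plugin API (which is Perl). The example classifier uses zlib
(DEFLATE) as a crude stand-in for a compression model; it is not PPM-D.

```python
import zlib
from typing import Dict, Protocol


class Classifier(Protocol):
    """Hypothetical interface any pluggable classifier would implement."""

    def learn(self, text: str, is_spam: bool) -> None: ...

    def score(self, text: str) -> float: ...


class CompressionClassifier:
    """Toy compression-based classifier using zlib, not PPM-D."""

    def __init__(self) -> None:
        # One growing training corpus per class.
        self.corpus = {True: b"", False: b""}

    def learn(self, text: str, is_spam: bool) -> None:
        self.corpus[is_spam] += text.encode("utf-8") + b"\n"

    def _extra_bytes(self, is_spam: bool, data: bytes) -> int:
        # How many more compressed bytes does the message cost when
        # appended to this class's corpus?
        base = self.corpus[is_spam]
        grown = len(zlib.compress(base + data, 9))
        return max(0, grown - len(zlib.compress(base, 9)))

    def score(self, text: str) -> float:
        """Return a value in [0, 1]; higher means more spam-like."""
        data = text.encode("utf-8")
        spam_cost = self._extra_bytes(True, data)
        ham_cost = self._extra_bytes(False, data)
        total = spam_cost + ham_cost
        if total == 0:
            return 0.5  # no evidence either way
        return ham_cost / total


class ClassifierRegistry:
    """Holds named classifiers so they can run side by side."""

    def __init__(self) -> None:
        self._classifiers: Dict[str, Classifier] = {}

    def register(self, name: str, classifier: Classifier) -> None:
        self._classifiers[name] = classifier

    def learn(self, text: str, is_spam: bool) -> None:
        for classifier in self._classifiers.values():
            classifier.learn(text, is_spam)

    def scores(self, text: str) -> Dict[str, float]:
        return {name: c.score(text) for name, c in self._classifiers.items()}
```

With something like this, the existing Bayes code would be just one more
object passed to register(), and the rules engine would only ever talk to
the registry.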

--j.
