Hi Pete,

I've had James 2.3.1 running since forever on a small internet-facing server.
I installed the Bayesian filter back when even one spam during the day was an event in itself! Now it protects us from hundreds a day! In fact, looking at the logs, it has rejected 704 so far today. So I just wanted to say that it works very well.

The Bayesian filter as supplied in James is very old, and the theory behind such filters has improved a lot since it was written. The one in James doesn't attempt to decode the email. This means it analyzes base64 encoded messages without decoding the base64 text. Likewise, it makes no attempt to ignore images or attachments, and therefore it fills its corpus with a lot of random-looking junk. However, despite this it still manages to detect spam reasonably well, provided you keep it trained, and that means sending it ham as well as spam.

With hindsight I shouldn't have set it up to delete emails that are thought likely to be spam. It keeps the database small, sure... but it is a pain to restore an email that was wrongly deleted. In fact, to restore a deleted email I have to go into MySQL's binary logs, search for the part of the log that inserted the email into the spool, and then save it into a stand-alone file that can be read by an email reader. That's not something I like to do too often. In a future setup I think I'll move over to James' IMAP server and simply move suspected emails into a 'bad' list that the user can trawl through when they think something has been mis-classified.

At present I've been experimenting with N-gram based Bayesian filters, as I think they hold a lot of promise. If I get something up and running I'll contribute it to James... but time is precious at the moment so it won't be soon. The Bayesian implementation used in the Thunderbird client is excellent [1], and the JunQuilla Thunderbird extension [2] by rkent is really good for managing the corpus and showing which keywords contributed to the 'spamminess' of the email.
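
To make the decoding point above concrete, here is a minimal sketch in Python (not the James code, which is Java) of what a filter could do before tokenizing: decode the transfer encoding, keep only the text parts, skip images and attachments, and emit word bigrams alongside single words. The function names, the token pattern and the choice of word bigrams are all assumptions made for illustration. "N-gram" could just as well mean character n-grams.

```python
import re
from email import message_from_bytes, policy
from email.message import EmailMessage

# Assumed token pattern: runs of 3+ word-like characters.
WORD_RE = re.compile(r"[A-Za-z0-9$'-]{3,}")


def extract_text(raw):
    """Return the subject plus the decoded text of non-attachment text parts."""
    msg = message_from_bytes(raw, policy=policy.default)
    chunks = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        # Containers carry no text of their own; attachments are ignored.
        if part.is_multipart() or part.is_attachment():
            continue
        # Images and other binary parts only add noise to the corpus.
        if part.get_content_maintype() != "text":
            continue
        try:
            # get_content() undoes base64 / quoted-printable and the charset.
            chunks.append(part.get_content())
        except LookupError:
            # Unknown charset: fall back to a lossless byte-to-char mapping.
            payload = part.get_payload(decode=True) or b""
            chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


def ngram_tokens(text, n=2):
    """Return lower-cased single words followed by word n-grams."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return words + grams


def build_sample():
    """Hypothetical test message: base64 text body plus a fake PNG attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Special offer"
    msg.set_content("Cheap pills online", cte="base64")
    msg.add_attachment(b"\x89PNG\r\n\x1a\n" + bytes(64),
                       maintype="image", subtype="png", filename="logo.png")
    return msg.as_bytes()


if __name__ == "__main__":
    for token in ngram_tokens(extract_text(build_sample())):
        print(token)
```

The tokens would then be fed to whatever Bayesian scoring is in use. Tokenizing the raw bytes of the same message instead would produce only base64 fragments from the body and the attachment, which is the random-looking junk mentioned above.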

I know this doesn't really answer your question of what off-the-shelf software to use, but I hope it reduces your suspicion of Bayesian filters. I've looked at lots of ideas for rejecting spam: everything from greylisting [3], tarpitting [4], DNSBL, URIBL, VERP [5], SPF [6], Teergrubing [7], etc. However, they all have weaknesses, and in the end I firmly believe that Bayesian analysis is the best way forward, as it is the only method I've seen which adapts as the spam adapts.

Regards,
David Legg

[1] http://mozilla.inkedblade.net/source/mozilla/mailnews/extensions/bayesian-spam-filter/src/
[2] http://mesquilla.com/extensions/junquilla/
[3] http://projects.puremagic.com/greylisting/whitepaper.html
[4] http://www.spamcannibal.org/cannibal.cgi
[5] http://cr.yp.to/proto/verp.txt
[6] http://www.openspf.org/
[7] http://altlasten.lutz.donnerhacke.de/mitarb/lutz/usenet/teergrube.en.html

On 15/10/13 12:54, Pete Williams wrote:
> Hi
>
> I'm trying to find out what I need to do to implement an effective anti-spam
> solution.
>
> We currently use James 2.3.2 but are working on an upgrade.
>
> I have found the Bayesian mailets, but would like to know if there is
> anything else as I don't set a great deal of store by these methods. I'd
> rather just try reverse mx look-ups, banning known black holes as a start.
>
> Any help appreciated.
>
> Cheers,
>