Hi Pete,

I've had James 2.3.1 running since forever on a small internet-facing
server.

I installed the Bayesian filter back when even one spam during the day
was an event in itself!  Now it protects us from hundreds a day!  In
fact looking at the logs it rejected 704 so far today.  So I just wanted
to say that it works very well.

The Bayesian filter as supplied in James is very old, and the theory
behind such filters has improved a lot since.  The one in James doesn't
attempt to decode the email.  This means it analyzes base64 encoded
messages without decoding the base64 text.  Likewise it makes no attempt
to ignore images or attachments and therefore it fills its corpus with a
lot of random-looking junk.  However, despite this it still manages to
detect spam reasonably well provided you keep it trained, and that means
sending it ham as well as spam.
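
To illustrate the kind of pre-processing I mean, here is a rough sketch
in Python (not Java, and not how James does anything today) using the
standard library's email package.  The function name extract_text is
made up for this example.  It decodes each text part, base64 included,
and skips images and attachments so that only readable text would reach
the tokenizer.

```python
import email
from email import policy


def extract_text(raw_bytes):
    """Return the decoded text of every text/* part, skipping attachments.

    Hypothetical helper for illustration; not part of James.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container only; its children are visited by walk()
        if part.get_content_maintype() != "text":
            continue  # images, archives and other binary parts
        if part.is_attachment():
            continue  # text files sent as attachments
        # get_content() undoes base64 / quoted-printable and the charset.
        chunks.append(part.get_content(errors="replace"))
    return "\n".join(chunks)
```

A filter fed with the output of something like this would learn from
the words in the message rather than from the base64 alphabet.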

With hindsight I shouldn't have set it up to delete emails that are
thought likely to be spam.  It keeps the database small sure... but it
is a pain to restore an email that was wrongly deleted.  In fact to
restore a deleted email I have to go into MySQL's binary logs, search
for that part of the log that inserted the email into the spool and then
save it into a stand-alone file that can be read by an email reader. 
That's not something I like to do too often.

In a future setup I think I'll move over to James' IMAP server and
simply move suspected emails into a 'bad' list that the user can trawl
through when they think something has been mis-classified.

At present I've been experimenting with N-gram based Bayesian filters as
I think they hold a lot of promise.  If I get something up and running
I'll contribute it to James... but time is precious at the moment so it
won't be soon.
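
To give a flavour of the idea, here is a minimal sketch in Python of a
naive Bayes classifier over character trigrams.  It is not the code I'm
experimenting with and not anything in James; the names char_ngrams and
NgramBayes, the trigram size and the add-one smoothing are all
assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split text into overlapping character n-grams (whitespace collapsed)."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramBayes:
    """Toy naive Bayes spam scorer over character n-grams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"
        self.counts[label].update(char_ngrams(text, self.n))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        # Start from the smoothed prior odds, then add per-n-gram evidence.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for gram in char_ngrams(text, self.n):
            p_spam = (self.counts["spam"][gram] + 1) / (spam_total + vocab)
            p_ham = (self.counts["ham"][gram] + 1) / (ham_total + vocab)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that exp() cannot overflow on long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Training is just a matter of calling train(text, "spam") or
train(text, "ham") on each message, which is also why it matters to
feed it ham as well as spam.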

The Bayesian implementation used in the Thunderbird client is excellent
[1] and the JunQuilla Thunderbird extension [2] by rkent is really good
for managing the corpus and showing which keywords contributed to the
'spaminess' of the email.  I know this doesn't really answer your
question of what off-the-shelf software to use but I hope it reduces
your suspicion about Bayesian filters.

I've looked at lots of ideas for rejecting spam; everything from
greylisting [3], tarpitting [4], DNSBL, URIBL, VERP [5], SPF [6],
Teergrubing [7], etc.

However, they all have weaknesses and in the end I firmly believe that
Bayesian analysis is the best way forward as it is the only method I've
seen which adapts as the spam adapts.

Regards,
David Legg


[1] http://mozilla.inkedblade.net/source/mozilla/mailnews/extensions/bayesian-spam-filter/src/
[2] http://mesquilla.com/extensions/junquilla/
[3] http://projects.puremagic.com/greylisting/whitepaper.html
[4] http://www.spamcannibal.org/cannibal.cgi
[5] http://cr.yp.to/proto/verp.txt
[6] http://www.openspf.org/
[7] http://altlasten.lutz.donnerhacke.de/mitarb/lutz/usenet/teergrube.en.html

On 15/10/13 12:54, Pete Williams wrote:
> Hi
>  
> I'm trying to find out what I need to do to implement an effective anti-spam 
> solution.
>  
> We currently use James 2.3.2 but are working on an upgrade.
>  
> I have found the Bayesian mailets, but would like to know if there is 
> anything else as I don't set a great deal of store by these methods. I'd 
> rather just try reverse mx look-ups, banning known black holes as a start.
>  
> Any help appreciated.
>  
> Cheers,
>                                         

