Re: Clustering spamassassin + autolearning

Samy Ascha, Xel Media B.V. Tue, 25 Nov 2008 09:05:08 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey Peter,


I worked on this kind of setup last week.

On Nov 25, 2008, at 4:04 PM, Peter Fastré wrote:

> Hello guys,
> I'm running a small hosting provider and our current setup is as follows:
> 1 mailbox server running exim (local delivery only)
> 1 antispam server running exim + mailscanner + spamassassin + mailwatch -> sends all approved mail to the mailbox server
> 2 mysql servers (master-slave) running the databases for mailwatch + spamassassin (bayes)
> I have two questions about this; I hope someone can help me.
> 1. Because all the load currently goes to the single smtp server, and to add some redundancy to our setup, we would like to add another antispam server with the same setup (which is working fine). Will this be possible as far as spamassassin is concerned, with two nodes sharing the same bayes database on the mysql servers? Is it possible to have two nodes feeding the same bayes database?

I'm not 100% sure, but since MySQL is ACID compliant (when using a transactional storage engine such as InnoDB), it should be quite possible to autolearn from multiple locations into one central database. This is the setup I have built too. If you have multiple database servers and the scan hosts use both of them, make sure one of them replicates the Bayes data from the other, i.e. from the one that sa-learn writes to. AFAIK, when Bayes uses an SQL backend, no important Bayes data is stored on the scan hosts themselves, so there is nothing that can get out of sync.

> 2. On my mailbox server I'd like to have a script that goes into the mail folders, searches for a folder named 'Spam', feeds the messages to sa-learn (which should of course feed the same bayes database), and then deletes them. Do you think this is a well-thought-out way of having my users train the spam filters? Maybe such scripts are already available?
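As a minimal sketch of the first step of the script proposed above, the Python below walks a mail root and collects the message files of every folder named 'Spam'. It assumes Maildir storage with Maildir++ subfolders (so the IMAP folder 'Spam' appears on disk as '.Spam'); the folder names and the function names are hypothetical and depend on how the mailbox server lays out its mail.

```python
import os
from pathlib import Path
from typing import Iterator

# Assumed on-disk names of a user's spam folder: Maildir++ keeps IMAP
# subfolders as dot-prefixed directories, so "Spam" usually appears as ".Spam".
SPAM_FOLDER_NAMES = {"Spam", ".Spam"}
MAILDIR_SUBDIRS = ("cur", "new", "tmp")


def find_spam_folders(mail_root: str) -> Iterator[Path]:
    """Yield every directory below mail_root whose name marks it as a spam folder."""
    for dirpath, dirnames, _files in os.walk(mail_root):
        # Do not descend into cur/new/tmp: they hold messages, not folders.
        dirnames[:] = [d for d in dirnames if d not in MAILDIR_SUBDIRS]
        for name in dirnames:
            if name in SPAM_FOLDER_NAMES:
                yield Path(dirpath) / name


def message_files(folder: Path) -> list[Path]:
    """Return the message files of a Maildir folder (from cur/ and new/)."""
    return [p for sub in ("cur", "new") if (folder / sub).is_dir()
            for p in sorted((folder / sub).iterdir()) if p.is_file()]
```

Each folder found this way could then be passed to sa-learn --spam and emptied only after sa-learn has succeeded, so that mail which was never learned is not deleted.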

With the database already set up, I created an IMAP mailbox with a few folders (Ham, Spam, Archive/Ham, Archive/Spam). The people I work with can configure this account in their mail clients and drop mail into the right folders.

I have a PHP script that trains SA by feeding it the contents of the Ham and Spam folders. It then archives the mail, if told to do so. It could just as well have been a basic shell script, since it is just calls to sa-learn. Because the Bayes data lives in the shared database, the results are automatically shared between the scan hosts.
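A rough Python equivalent of the train-and-archive script described above might look like the sketch below. It is not the PHP script itself: it assumes the shared training account is stored on disk as a Maildir with Maildir++ folder names ('.Ham', '.Spam', '.Archive.Ham', '.Archive.Spam'), which depends on the IMAP server, and the helper names are hypothetical. It treats any non-zero exit status from sa-learn as a failure and leaves the messages in place for the next run; check how your sa-learn version reports a run in which nothing new was learned before relying on that.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Hypothetical Maildir++ names for the shared training account:
# source folder -> archive folder, keyed by the sa-learn mode.
FOLDERS = {
    "spam": (".Spam", ".Archive.Spam"),
    "ham": (".Ham", ".Archive.Ham"),
}

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_sa_learn(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Invoke sa-learn with captured output so a cron job stays quiet."""
    return subprocess.run(list(args), capture_output=True, text=True)


def train_account(maildir: Path, archive: bool = True,
                  runner: Runner = run_sa_learn) -> dict[str, int]:
    """Feed the Ham and Spam folders to sa-learn, then optionally archive them.

    Returns, per mode, how many messages sa-learn processed with exit status 0.
    """
    learned: dict[str, int] = {}
    for mode, (src_name, dst_name) in FOLDERS.items():
        src = maildir / src_name
        msgs = [p for sub in ("cur", "new") if (src / sub).is_dir()
                for p in sorted((src / sub).iterdir()) if p.is_file()]
        learned[mode] = 0
        if not msgs:
            continue
        result = runner(["sa-learn", f"--{mode}", *map(str, msgs)])
        if result.returncode != 0:
            continue  # leave the messages in place; retry on the next run
        learned[mode] = len(msgs)
        if archive:
            for p in msgs:
                # Keep each message in the same cur/ or new/ subdirectory.
                dst = maildir / dst_name / p.parent.name
                dst.mkdir(parents=True, exist_ok=True)
                shutil.move(str(p), str(dst / p.name))
            # A Maildir folder also needs a tmp/ directory.
            (maildir / dst_name / "tmp").mkdir(parents=True, exist_ok=True)
    return learned
```

The runner parameter exists only so the function can be exercised without sa-learn installed; in production the default run_sa_learn is used.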

We do not use per-user learning (yet), but all that would be needed is a loop that wraps the above and repeats it for every user, using sa-learn's -u <user> option.
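For per-user learning, a minimal sketch of that wrapping loop might look like this. It assumes a hypothetical <mail_root>/<domain>/<user>/Maildir layout and Bayes usernames of the form user@domain. The username passed to -u has to match the one used when the mail is scanned, or the training ends up in a different user's Bayes data; that mapping is an assumption here and has to be checked against how the scan hosts invoke SpamAssassin.

```python
import subprocess
from pathlib import Path
from typing import Iterator

# Hypothetical layout: <mail_root>/<domain>/<local_part>/Maildir/<folder>
# Bayes username is assumed to be "<local_part>@<domain>".


def per_user_jobs(mail_root: Path, folder: str = ".Spam",
                  mode: str = "spam") -> Iterator[list[str]]:
    """Yield one sa-learn command per user that has messages in `folder`."""
    for domain_dir in sorted(p for p in mail_root.iterdir() if p.is_dir()):
        for user_dir in sorted(p for p in domain_dir.iterdir() if p.is_dir()):
            src = user_dir / "Maildir" / folder
            msgs = [str(m) for sub in ("cur", "new") if (src / sub).is_dir()
                    for m in sorted((src / sub).iterdir()) if m.is_file()]
            if msgs:
                user = f"{user_dir.name}@{domain_dir.name}"
                yield ["sa-learn", f"--{mode}", "-u", user, *msgs]


def run_jobs(mail_root: Path, dry_run: bool = True) -> None:
    """Print the per-user commands, or execute them when dry_run is False."""
    for cmd in per_user_jobs(mail_root):
        if dry_run:
            print(" ".join(cmd[:4]), f"({len(cmd) - 4} messages)")
        else:
            subprocess.run(cmd, check=False)
```

per_user_jobs only builds the commands; run_jobs prints them by default and executes them only with dry_run=False, which makes it easy to check the user mapping before training anything.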

If you make sure the training is done per user, it is sufficiently 'thought-through', IMHO. If a user's input is wrong (for example, ham filed as spam), only that user's Bayes data will be soiled.


Samy



> Thanks in advance,
>
> Peter



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAkksMCIACgkQKIdvzp2UK/F45wCeK/5xQ2fmZf77DbwX4wMDrsYR
6bAAn2yQI7h8HC1biJPuZeRCYKufIoAP
=cTQw
-----END PGP SIGNATURE-----
