-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey Peter,

I have been working on this kind of setup last week.

On Nov 25, 2008, at 4:04 PM, Peter Fastré wrote:

Hello guys,

I'm running a small hosting provider, and our current setup is as follows:

1 mailbox server running exim (only local delivery)
1 antispam server running exim + mailscanner + spamassassin + mailwatch -> sends all approved mail to mailbox server
2 mysql servers (master-slave) running databases for mailwatch + spamassassin (bayes)

I have two questions about this, hope someone can help me.

1. Because all the load goes to the smtp server now and to add some redundancy to our setup, we would like to add another antispam server, with the same setup (which is working fine). Will this be possible (concerning spamassassin), with two nodes sharing the same bayes database on the mysql servers? Is it possible to have two nodes feeding the same bayes database?

I'm not 100% sure, but since MySQL is ACID compliant (at least with a transactional storage engine such as InnoDB), autolearning from multiple scan hosts into one central database should work fine. This is the setup I have made too. If you have multiple database servers and both are used by scan hosts, make sure one of them replicates the bayes data from the other, i.e. from the one that sa-learn feeds. Afaik, when using a SQL database for bayes, no important bayes data is stored on the scan host itself, so there's nothing that can get out of sync.



2. On my mailbox server I'd like to have a script which goes into the mailfolders, searches for a folder with the name 'Spam', feeds the messages in it to sa-learn (which should be feeding the same bayes database of course), and then deletes them. Do you think this is a well-thought-out way of having my users train the spam filters? Maybe there are already such scripts available?

With the database already set up, I have made an IMAP box with some folders (Ham, Spam, Archive/Ham, Archive/Spam). The people I work with can configure this account and drop mail in the right folders.

I have a PHP script that teaches SA by feeding it the contents of the Ham and Spam folders. It then archives the mail, if told to do so. It could just as well have been a basic shell script, since it only makes calls to sa-learn. The results are automatically shared between scan hosts.
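
For illustration, here is a rough Python sketch of that kind of training script (not my actual PHP one). It assumes sa-learn is on the PATH and that Ham and Spam are plain directories with one message per file; the function names and the mailbox layout are made up for the example, so adjust them to your own storage format.

```python
import shutil
import subprocess
from pathlib import Path


def build_learn_command(kind, folder):
    """Return the sa-learn argument list for one folder of messages."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    # sa-learn accepts --ham / --spam followed by files or directories.
    return ["sa-learn", f"--{kind}", str(folder)]


def train_and_archive(mailbox, archive=True, run=subprocess.run):
    """Feed Ham/ and Spam/ to sa-learn, then optionally move mail to Archive/."""
    mailbox = Path(mailbox)
    for kind, name in (("ham", "Ham"), ("spam", "Spam")):
        folder = mailbox / name
        if not folder.is_dir():
            continue
        # check=True raises if sa-learn fails, so nothing is archived unlearned.
        run(build_learn_command(kind, folder), check=True)
        if archive:
            target = mailbox / "Archive" / name  # assumed archive layout
            target.mkdir(parents=True, exist_ok=True)
            for message in folder.iterdir():
                if message.is_file():
                    shutil.move(str(message), str(target / message.name))
```

Something like train_and_archive("/path/to/training-box") from cron would do it. The run parameter is only there so the sa-learn call can be swapped out for a dry run.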

We do not use per user learning (yet), but all that would be needed is some iteration that wraps and repeats the above for all users, using the -u <user> option.
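
A per-user loop could look roughly like the Python sketch below. It is untested against a real mail store and assumes one directory per user under a common root, each holding a plain 'Spam' directory of message files, with the directory name doubling as the SpamAssassin username; the function names are invented for the example. Only the -u and --spam options are real sa-learn options.

```python
import subprocess
from pathlib import Path


def per_user_commands(mail_root, folder_name="Spam"):
    """Yield (spam_dir, command) for every user directory that has a Spam folder."""
    for user_dir in sorted(Path(mail_root).iterdir()):
        spam_dir = user_dir / folder_name
        if spam_dir.is_dir():
            # -u overrides the username, so tokens land in that user's bayes data.
            command = ["sa-learn", "-u", user_dir.name, "--spam", str(spam_dir)]
            yield spam_dir, command


def train_all_users(mail_root, folder_name="Spam", delete_after=True,
                    run=subprocess.run):
    """Run sa-learn once per user, then optionally delete the learned messages."""
    for spam_dir, command in per_user_commands(mail_root, folder_name):
        # check=True raises on failure, so mail is only deleted after a clean run.
        run(command, check=True)
        if delete_after:
            for message in spam_dir.iterdir():
                if message.is_file():
                    message.unlink()
```

Deleting after learning matches what Peter describes; set delete_after=False to keep the mail around instead.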

If you make sure the teaching is done per user, it is sufficiently 'thought-through' imho. If a user feeds in the wrong messages, only that user's database will be soiled.

Samy



Thanks in advance,

Peter







-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iEYEARECAAYFAkksMCIACgkQKIdvzp2UK/F45wCeK/5xQ2fmZf77DbwX4wMDrsYR
6bAAn2yQI7h8HC1biJPuZeRCYKufIoAP
=cTQw
-----END PGP SIGNATURE-----
