-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hey Peter,
I was working on this kind of setup just last week.
On Nov 25, 2008, at 4:04 PM, Peter Fastré wrote:
> Hello guys,
>
> I'm running a small hosting provider, and our current setup is as
> follows:
>
> - 1 mailbox server running Exim (local delivery only)
> - 1 antispam server running Exim + MailScanner + SpamAssassin +
>   MailWatch, which passes all approved mail on to the mailbox server
> - 2 MySQL servers (master-slave) hosting the databases for MailWatch
>   and SpamAssassin (Bayes)
>
> I have two questions about this; I hope someone can help.
>
> 1. Because all the load currently goes to the SMTP server, and to
> add some redundancy, we would like to add a second antispam server
> with the same setup (which is working fine). Will this work as far as
> SpamAssassin is concerned, with two nodes sharing the same Bayes
> database on the MySQL servers? Can two nodes feed the same Bayes
> database?

I'm not 100% sure, but since MySQL is ACID compliant (when the tables
use a transactional engine such as InnoDB), it should be entirely
possible to auto-learn from multiple hosts into one central database.
This is the setup I built as well. If you have multiple database
servers and the scan hosts use both of them, make sure one of them
replicates the Bayes tables from the other, the one that sa-learn
feeds. AFAIK, when Bayes is stored in SQL, no important Bayes data is
kept on the scan host itself, so there is nothing local that can get
out of sync.

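To illustrate why a transactional store makes shared learning safe, here is a small Python sketch. It uses SQLite as a stand-in for MySQL/InnoDB and a made-up `tokens` table (not SpamAssassin's real Bayes schema); two threads play the role of two scan hosts, each with its own connection, incrementing token counts inside transactions.

```python
import os
import sqlite3
import tempfile
import threading

# Toy schema for illustration only; SpamAssassin's real Bayes tables differ.
SCHEMA = "CREATE TABLE tokens (token TEXT PRIMARY KEY, spam_count INTEGER NOT NULL)"


def learn_spam(db_path, tokens, rounds):
    """One 'scan host' auto-learning the same spam tokens `rounds` times."""
    # Separate connection per host, like separate machines talking to MySQL.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        for _ in range(rounds):
            # Take the write lock up front: each round commits as one unit,
            # and a concurrent host simply waits for the lock (up to timeout).
            conn.execute("BEGIN IMMEDIATE")
            for tok in tokens:
                conn.execute("INSERT OR IGNORE INTO tokens VALUES (?, 0)", (tok,))
                conn.execute(
                    "UPDATE tokens SET spam_count = spam_count + 1 WHERE token = ?",
                    (tok,),
                )
            conn.execute("COMMIT")
    finally:
        conn.close()


def run_demo(rounds=50):
    """Let two concurrent 'hosts' learn into one database; return final counts."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "shared_bayes_demo.db")
        conn = sqlite3.connect(db_path)
        conn.execute(SCHEMA)
        conn.commit()
        conn.close()

        hosts = [
            threading.Thread(target=learn_spam, args=(db_path, ["viagra", "free"], rounds)),
            threading.Thread(target=learn_spam, args=(db_path, ["free", "winner"], rounds)),
        ]
        for h in hosts:
            h.start()
        for h in hosts:
            h.join()

        conn = sqlite3.connect(db_path)
        counts = dict(conn.execute("SELECT token, spam_count FROM tokens"))
        conn.close()
        return counts


if __name__ == "__main__":
    print(run_demo())
```

Because every update runs inside a transaction, no increment is lost: with 50 rounds per host, the token "free" that both hosts learn ends up at 100.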
> 2. On my mailbox server I'd like to have a script that goes into the
> mail folders, looks for a folder named 'Spam', feeds its messages to
> sa-learn (which should of course feed the same Bayes database), and
> then deletes them. Do you think this is a well-thought-out way to let
> my users train the spam filters? Maybe such scripts already exist?

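A minimal Python sketch of the script described in question 2. It assumes Maildir++ storage, where the IMAP folder 'Spam' lives in `<Maildir>/.Spam/{cur,new}`, and that sa-learn picks up the shared MySQL Bayes settings from the SpamAssassin config; the batch size and function names are made up for illustration. It snapshots the file list before learning, so mail that arrives while sa-learn runs is not deleted unlearned.

```python
import subprocess
from pathlib import Path

SPAM_FOLDER = ".Spam"  # assumption: Maildir++ name of the IMAP folder "Spam"
BATCH = 500            # hypothetical limit to keep sa-learn command lines short


def spam_messages(maildir_root):
    """Snapshot the message files currently in the user's Spam folder."""
    folder = Path(maildir_root) / SPAM_FOLDER
    files = []
    for sub in ("cur", "new"):
        d = folder / sub
        if d.is_dir():
            files.extend(sorted(p for p in d.iterdir() if p.is_file()))
    return files


def learn_commands(files):
    """Build sa-learn --spam invocations, BATCH files per call."""
    return [
        ["sa-learn", "--spam", *map(str, files[i:i + BATCH])]
        for i in range(0, len(files), BATCH)
    ]


def train_and_purge(maildir_root):
    """Feed the Spam folder to sa-learn, then delete only what was learned."""
    files = spam_messages(maildir_root)
    for cmd in learn_commands(files):
        # check=True raises on failure, so the deletion loop is never reached
        subprocess.run(cmd, check=True)
    for f in files:
        f.unlink(missing_ok=True)  # the user may have moved it meanwhile
    return len(files)
```

Calling `train_and_purge()` for each mailbox from a cron job is one option.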
With the database already set up, I made an IMAP account with a few
folders (Ham, Spam, Archive/Ham, Archive/Spam). The people I work with
can configure this account in their mail client and drop mail into the
right folders.
I have a PHP script that trains SA by feeding it the contents of the
Ham and Spam folders, and then archives the mail if told to. It could
just as well have been a basic shell script, since it only calls
sa-learn. The results are automatically shared between the scan hosts.
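The training-box approach could look roughly like this in Python instead of PHP. This is a sketch, not the actual script: it assumes the IMAP account is stored as Maildir++ (so Archive/Ham becomes `.Archive.Ham`), and the `run` parameter exists only so the sa-learn call can be swapped out for testing.

```python
import shutil
import subprocess
from pathlib import Path

# Assumed Maildir++ names for the IMAP folders Ham, Spam, Archive/Ham, Archive/Spam.
FOLDERS = {
    "--ham": (".Ham", ".Archive.Ham"),
    "--spam": (".Spam", ".Archive.Spam"),
}


def pending(folder):
    """Message files waiting in a folder's cur/ and new/ subdirectories."""
    return sorted(
        p
        for sub in ("cur", "new")
        if (folder / sub).is_dir()
        for p in (folder / sub).iterdir()
        if p.is_file()
    )


def train(maildir_root, archive=True, run=subprocess.run):
    """Feed Ham and Spam to sa-learn; optionally move them to the archive folders."""
    root = Path(maildir_root)
    learned = {}
    for flag, (src, dst) in FOLDERS.items():
        files = pending(root / src)
        if not files:
            continue
        run(["sa-learn", flag, *map(str, files)], check=True)
        if archive:
            for f in files:
                # keep the cur/ or new/ placement so Maildir semantics are unchanged
                target = root / dst / f.parent.name
                target.mkdir(parents=True, exist_ok=True)
                shutil.move(str(f), str(target / f.name))
        learned[flag] = len(files)
    return learned
```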
We do not use per-user learning (yet), but all that would be needed is
a loop that repeats the above for every user, passing the -u <user>
option to sa-learn.
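A per-user variant, again only a sketch: it assumes each user's mail lives in a hypothetical `<base>/<user>/Maildir` tree (Maildir++ folders `.Ham` and `.Spam`) and that the directory name is also the SpamAssassin username passed to `-u`.

```python
import subprocess
from pathlib import Path

# Hypothetical layout: one Maildir++ tree per user at <base>/<user>/Maildir.
FLAGS = {".Ham": "--ham", ".Spam": "--spam"}


def user_commands(base):
    """Yield one sa-learn command per (user, folder) that currently holds mail."""
    for user_dir in sorted(p for p in Path(base).iterdir() if p.is_dir()):
        maildir = user_dir / "Maildir"
        for folder, flag in FLAGS.items():
            files = sorted(
                p
                for sub in ("cur", "new")
                if (maildir / folder / sub).is_dir()
                for p in (maildir / folder / sub).iterdir()
                if p.is_file()
            )
            if files:
                # -u makes sa-learn update this user's Bayes data only
                yield ["sa-learn", "-u", user_dir.name, flag, *map(str, files)]


def train_all(base, run=subprocess.run):
    """Run every per-user sa-learn command, stopping at the first failure."""
    for cmd in user_commands(base):
        run(cmd, check=True)
```

Keying the learning on `-u` keeps each user's tokens separate, so a user who mis-files mail only affects their own Bayes data.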
If you make sure the training is done per user, it is sufficiently
well thought out, IMHO. If a user files messages incorrectly, only
that user's Bayes data will be polluted.
Samy
> Thanks in advance,
> Peter
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
iEYEARECAAYFAkksMCIACgkQKIdvzp2UK/F45wCeK/5xQ2fmZf77DbwX4wMDrsYR
6bAAn2yQI7h8HC1biJPuZeRCYKufIoAP
=cTQw
-----END PGP SIGNATURE-----