Ben Poliakoff wrote:

Hi Ben

What sort of experiences have people had managing a sitewide bayes db
that is used by spamassassin (spamd|amavisd) instances on multiple
machines?  I've got an environment with spamassassin/amavisd-new running
in parallel on a pool of two (but possibly more in the future) equally
weighted machines.  How have you avoided the dreaded Single Point of
Failure?

We run two servers here with SA behind load balancing. Each machine has its own local Bayes and AWL DB (no SPoF). Given the amount of incoming traffic (about 100k msgs/server/workday), we are statistically confident that both servers see the same kinds of (spam) messages.
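A rough way to put a number on "statistically sure" is sketched below. It assumes (my assumption, not something measured here) that the balancer sends each message independently to one of the two servers with a fixed probability, 0.5 for equally weighted machines. The function name and parameters are made up for the illustration.

```python
def prob_both_servers_see(n_copies: int, p_first: float = 0.5) -> float:
    """Probability that BOTH servers receive at least one of n_copies
    messages from the same spam run.

    Assumption: each copy goes to the first server with probability
    p_first and to the second otherwise, independently of the others.
    """
    if n_copies < 1:
        return 0.0
    # Subtract the two ways one server can miss the run entirely:
    # every copy lands on the first server, or every copy on the second.
    return 1.0 - p_first ** n_copies - (1.0 - p_first) ** n_copies


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, prob_both_servers_see(n))
```

Under that assumption a spam run of only 10 copies already reaches both servers with probability above 0.99, so at 100k msgs/day the two Bayes DBs should be trained on very similar material.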


We have not noticed any difference in filtering efficiency between the two instances in over 12 months.

Having two DBs also has an advantage: if Bayes on one machine gets corrupted (wrong training, ...) you can restore it from the twin server with a simple FTP transfer. We have done this at least once.
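An alternative to copying the DB files directly is sa-learn's own dump format, using `sa-learn --backup` on the healthy server and `sa-learn --restore` on the broken one. This is not what we did (we used FTP); it is only a minimal sketch of that variant. The helper names and the dump path are hypothetical, and the commands should be run as whichever user owns the Bayes DB on your setup.

```python
import subprocess


def backup_bayes(dump_path, run=subprocess.run):
    """On the healthy server: write a Bayes dump to dump_path.

    `sa-learn --backup` prints the dump on stdout, so it is redirected
    into the file. `run` is injectable only to make the sketch testable.
    """
    with open(dump_path, "wb") as out:
        run(["sa-learn", "--backup"], stdout=out, check=True)


def restore_bayes(dump_path, run=subprocess.run):
    """On the corrupted server: load the dump copied over from the twin.

    `sa-learn --restore` replaces the existing Bayes data with the dump.
    """
    run(["sa-learn", "--restore", dump_path], check=True)


if __name__ == "__main__":
    # Hypothetical path; the copy between servers (FTP, scp, ...) happens
    # between the two calls and is not shown here.
    backup_bayes("/tmp/bayes-backup.txt")
```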

What does need to be done periodically is purging or resetting the AWL DB, since it keeps growing and growing...

We were considering a MySQL DB on a third machine (with failover on the other two), but the loss of Bayes history is not such a big issue IMHO. A nightly backup is probably enough, as long as you have another machine on which to restore the DB a few hours after a failure. In any case, a good ham/spam collection will re-train your Bayesian filter in a matter of minutes.
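For the re-training case, a minimal sketch is below. It assumes the ham and spam collections are kept as two directories of message files; the directory paths and function names are placeholders, not our actual layout. It uses only the standard `sa-learn --ham` and `sa-learn --spam` options.

```python
import subprocess


def retrain_commands(ham_dir, spam_dir):
    """Build the sa-learn invocations for a from-scratch retrain.

    Assumption: ham_dir and spam_dir each hold one message per file
    (or anything else sa-learn accepts as a message source).
    """
    return [
        ["sa-learn", "--ham", ham_dir],
        ["sa-learn", "--spam", spam_dir],
    ]


def retrain(ham_dir, spam_dir, run=subprocess.run):
    """Run the retrain commands in order, stopping at the first failure."""
    for cmd in retrain_commands(ham_dir, spam_dir):
        run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths; substitute your own corpus locations.
    retrain("/var/spool/corpus/ham", "/var/spool/corpus/spam")
```

Run it as the same user that spamd or amavisd uses, otherwise the tokens end up in a different Bayes DB.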

Our third machine will probably run a local mirror of SURBL, instead!

HTH,
Paolo
