Joshua Cornejo wrote:

We're currently evaluating how to cluster spamassassin without having
to have different heuristic databases (shared knowledge). I guess the
[...]
it's not worth and is rather better to have independent machines with
different databases and different knowledge caused by the differences
in email processed by each server. Any views/urls ?

We chose not to depend on a shared filesystem for our hardware load-balanced instances of postfix+spamassassin, in order to maintain high availability of the inbound SMTP service.


If any instance crashes for whatever reason, the others take over all of the traffic. If a disk runs out of space (because the AWL keeps growing and has to be cleaned up manually), only that one instance ends up with corrupted AWL and Bayes DB files.

Shared knowledge is not a big issue, since you can be statistically sure that in the long run all SA instances will see the same mix of traffic.
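
To make the statistical argument a bit more concrete, here is a rough Python sketch (not something we run in production). It assumes the load balancer hands each message to a randomly chosen instance, independently of whether the message is spam; the function name, the instance count and the 60% spam rate are made up for illustration.

```python
import random

def spam_fraction_per_instance(n_messages, n_instances, spam_rate, seed=0):
    """Simulate a load balancer and return the spam fraction each instance sees.

    Assumption: every message goes to a uniformly random instance,
    independent of its content.
    """
    rng = random.Random(seed)
    seen = [0] * n_instances
    spam = [0] * n_instances
    for _ in range(n_messages):
        i = rng.randrange(n_instances)   # balancer picks an instance
        seen[i] += 1
        if rng.random() < spam_rate:     # message happens to be spam
            spam[i] += 1
    # Guard against an instance that received no mail at all.
    return [s / n if n else 0.0 for s, n in zip(spam, seen)]

if __name__ == "__main__":
    for total in (100, 10_000, 200_000):
        fractions = spam_fraction_per_instance(total, 3, 0.6)
        print(total, [round(f, 3) for f in fractions])
```

With few messages the per-instance fractions differ noticeably; as the volume grows they converge, which is why independent Bayes databases end up learning from roughly the same mix.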

OTOH, the Bayes DB must be trained on every instance, and you have to search through the MTA logs on every instance if some mail "disappears"...
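
Training every instance can be scripted. The following is a minimal Python sketch, assuming each host is reachable over ssh and already has a copy of the training corpus at the same path; the hostnames (mx1, mx2) and the directory are hypothetical. It only relies on sa-learn's standard --spam and --ham options.

```python
import shlex
import subprocess

def sa_learn_commands(hosts, kind, corpus_dir):
    """Build one ssh command per host that runs sa-learn on corpus_dir.

    kind must be "spam" or "ham", matching sa-learn's --spam / --ham options.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    remote = f"sa-learn --{kind} {shlex.quote(corpus_dir)}"
    return [["ssh", host, remote] for host in hosts]

def train_all(hosts, kind, corpus_dir):
    """Run the training on every host, stopping at the first failure."""
    for cmd in sa_learn_commands(hosts, kind, corpus_dir):
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical hostnames and path; adjust to your own setup.
    train_all(["mx1", "mx2"], "spam", "/var/spool/sa-train/spam")
```

Stopping at the first failure is a deliberate choice here: it makes it obvious which instance was left untrained, at the cost of having to rerun the remaining hosts by hand.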

It is possible to store Bayes in a MySQL database, but that would introduce another point of failure in our architecture, and we can't replicate that one as well!

Hope my reply was not too far from what you expected,
Paolo


