hi,

i have a setup in which i have 2 MX hosts for incoming
internet email, each one with its own SpamAssassin (+ amavisd).

over time, those 2 systems have independently learnt
(Bayes autolearn) what is considered spam/not-spam.

Of course, those systems now sometimes have quite
a different bayesian opinion of the same mail. For
example, some Nigerian scam hits BAYES_90 on
one host, while on the other it only hits BAYES_50.

i know the question has already been asked before,
but it doesn't look as if there has been a real solution
proposed.

is it possible somehow (has anyone already done it)
to "merge" the 2 databases into 1 "master reference",
which could then be copied back over the 2 independent
databases, thus reverting to a common "opinion" on
what is bayesian spam and what is not ?
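
to make the "merge" idea concrete, here is a rough python sketch of
what i have in mind. it assumes each database has first been dumped
to a table of token -> (spam count, ham count, last access time);
that table layout and the name merge_tables are my own assumptions
for illustration, not an existing SA tool or format. counts are
summed and the newer access time is kept.

```python
from typing import Dict, Tuple

# Hypothetical in-memory form of a dumped Bayes DB:
# token -> (spam_count, ham_count, last_access_time)
TokenTable = Dict[str, Tuple[int, int, int]]


def merge_tables(a: TokenTable, b: TokenTable) -> TokenTable:
    """Return a new table combining the token counts of a and b."""
    merged = dict(a)  # shallow copy, so the inputs are left untouched
    for token, (spam, ham, atime) in b.items():
        if token in merged:
            spam0, ham0, atime0 = merged[token]
            # Sum the counts and keep the most recent access time.
            merged[token] = (spam0 + spam, ham0 + ham, max(atime0, atime))
        else:
            merged[token] = (spam, ham, atime)
    return merged
```

summing is only right if the two hosts learnt from different
messages (which should mostly be the case with 2 MX hosts); a mail
learnt on both nodes would be counted twice. the per-database
totals of learnt spam and ham messages would have to be summed in
the same way.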

alternatively, as SA is learning to a journal, i think
it could be possible to use the following procedure
(your opinions are welcome) :

a) just before SA syncs the journal, it renames it
to bayes_journal.old (this is already done, it seems)

b) make a copy of this old journal file in a spool
directory (and rename the file to a unique name, so
it won't clash with other old journals), before SA
syncs it to the db.

c) have a cron job which regularly copies these spooled
journals to the second node

d) on the second node, either have some tool, running via
cron, which can sync these journal files to the local db on its
own (does such a tool already exist, or do you have any hints
on how to write such a tool ?), or else patch the SA code to be able
to sync multiple journals (instead of only "bayes_journal.old").
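
steps b) and d) could be sketched roughly as below, in python. this
only covers the file handling: spool_journal, drain_spool and the
"journal.<timestamp>.<host>.<random>" naming are names i made up
for illustration, and the part that really syncs a journal into the
local db is left as a callback (apply_journal), since that is
exactly the tool i am asking about.

```python
import os
import shutil
import socket
import time
import uuid
from typing import Callable, List


def spool_journal(journal_path: str, spool_dir: str) -> str:
    """Step b): copy an old journal into spool_dir under a unique name."""
    os.makedirs(spool_dir, exist_ok=True)
    # Zero-padded timestamp first, so sorting by name is oldest-first;
    # host name and a random suffix keep names from clashing.
    name = "journal.%020d.%s.%s" % (
        time.time_ns(), socket.gethostname(), uuid.uuid4().hex)
    tmp_path = os.path.join(spool_dir, "." + name + ".tmp")
    final_path = os.path.join(spool_dir, name)
    shutil.copy2(journal_path, tmp_path)
    # Rename within the same directory, so a reader never sees a
    # half-copied file under its final name.
    os.rename(tmp_path, final_path)
    return final_path


def drain_spool(spool_dir: str,
                apply_journal: Callable[[str], None]) -> List[str]:
    """Step d): apply every spooled journal, oldest first, then delete it.

    apply_journal is a hypothetical callback that syncs one journal
    file into the local Bayes db.
    """
    names = sorted(n for n in os.listdir(spool_dir)
                   if n.startswith("journal."))
    applied = []
    for name in names:
        path = os.path.join(spool_dir, name)
        apply_journal(path)  # if this raises, the file stays in the spool
        os.remove(path)
        applied.append(path)
    return applied
```

step c) would then just be a cron job (rsync or scp) moving the
files from the spool on one node to the spool on the other.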

what do you think ?

maybe this whole question of "common" bayes learning will become
irrelevant in SA 3.x (or was it 2.70), when the Bayes DB can be
stored in a real SQL database: multiple hosts should then be able
to write to the same database ?
