Henrik K wrote:
> On Fri, Aug 14, 2009 at 07:43:37PM +0200, Jorn Argelo wrote:
>> Hi All,
>>
>> I'm running SpamAssassin 3.2.5 on RHEL 5.3 x86_64. We have three boxes,
>> and all three of them share the same Bayes DB using a MySQL
>> cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 data nodes
>> with a quad-core CPU and 4 GB of memory. Everything is working fine, even
>> the AWL in SQL, except for Bayes. The Bayes database currently holds a bit
>> less than 500k tokens, and the database is not very big either: the
>> data nodes have less than 1 GB of storage in use. I've followed the
>> instructions from the SpamAssassin wiki, and I also used the supplied
>> bayes_mysql.sql file to create my tables. In case anyone is interested,
>> you can find the cluster.ini and the my.cnf used on the SQL nodes here:
>> http://www.wcborstel.com/web/mysql/my.cnf
>
> skip-innodb
>
> That's pretty much the reason. You _need_ to use InnoDB, as it has
> row-level locking. MyISAM just kills Bayes.

Actually I'm using NDB, not MyISAM. I need a clustered storage engine;
otherwise the Bayes DB can't really be shared. If I create an InnoDB table
on one SQL node, it doesn't show up on the other SQL node, whereas it does
with the NDB storage engine.

What I can do, however, is point all mail servers to one SQL node. I guess
I'd then just need to synchronize the bayes_token table to the other SQL
node. Do you have an idea about this?

>> Now the problem, at first glance, seems to be, from my perspective
>> (please correct me if I'm wrong), the actual queries being run. For
>> every mail scanned by SpamAssassin, it seems to run the
>> "SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM
>> bayes_token" query every time. This effectively requests the entire
>> bayes_token table.
>
> What you are seeing are expiry runs.
> As you are currently using MyISAM, the whole table is locked for such
> operations, so you are pretty much hosed.
> In any case, you should set "bayes_auto_expire 0" and run expiry, for
> example, once every night when traffic is lower.

Thanks for this, I was not aware of it. Running expiry manually is
done with sa-learn --force-expiry, correct?
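
For reference, a minimal sketch of that setup. It assumes the RHEL default
config path /etc/mail/spamassassin/local.cf and a nightly cron job at 03:00;
both the path and the schedule are assumptions, not taken from this thread.
Since all three boxes share the same SQL Bayes DB, "bayes_auto_expire 0"
presumably needs to be set on every box, while the expiry job probably only
needs to run on one of them:

```
# /etc/mail/spamassassin/local.cf (path assumed: RHEL default)
# Don't let message scanning trigger opportunistic Bayes expiry runs
bayes_auto_expire 0

# Root crontab entry (hypothetical schedule: 03:00 every night)
# Expire old Bayes tokens once, during low traffic
0 3 * * * /usr/bin/sa-learn --force-expiry
```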

>> It seems that the query cache is either not suitable for this or I am
>> doing something majorly wrong :)
>
> You are right. Better to disable it completely if nothing else running
> on that server uses it, and save a little CPU.

Good to know. There will be other applications running on it as well, so
I'll reduce the size of the query cache considerably.

Thanks a lot for your feedback.

Jorn