Henrik K wrote:
> On Fri, Aug 14, 2009 at 07:43:37PM +0200, Jorn Argelo wrote:
>> Hi All,
>>
>> I'm running SpamAssassin 3.2.5 on RHEL 5.3 x86_64. We have three boxes, and all three share the same Bayes DB using a MySQL cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 data nodes with a quad-core CPU and 4 GB of memory. Everything is working fine, even the AWL in SQL, except for Bayes. The Bayes database currently holds just under 500k tokens, and it is not very big either: the data nodes have less than 1 GB of storage in use. I followed the instructions from the SpamAssassin wiki and used the supplied bayes_mysql.sql file to create my tables. In case anyone is interested, you can find the cluster.ini and the my.cnf used on the SQL nodes here:
>>
>> http://www.wcborstel.com/web/mysql/my.cnf

> skip-innodb
>
> That's pretty much the reason. You _need_ to use InnoDB, as it has row-level
> locking. MyISAM just kills Bayes.
Actually, I'm using NDB, not MyISAM. I need a clustered storage engine; otherwise the Bayes DB can't really be shared. If I create an InnoDB table on one SQL node, it doesn't show up on the other SQL node, whereas an NDB table does.

What I could do, however, is point all mail servers to a single SQL node. I guess I would then just need to synchronize the bayes_token table to the other SQL node. Do you have any ideas about this?
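If all mail servers are pointed at one SQL node, Henrik's InnoDB advice could be applied on that node alone. A minimal sketch, assuming the five table names created by bayes_mysql.sql in 3.2.x and that `skip-innodb` has first been removed from that node's my.cnf (it is a startup option, so mysqld needs a restart). Converting a table away from NDB makes its data local to that SQL node, so the other SQL node will no longer see it; keeping the two in sync is a separate problem this does not solve.

```sql
-- Assumes InnoDB is enabled on this SQL node (skip-innodb removed, mysqld restarted).
-- Check that InnoDB is listed as supported before converting anything.
SHOW ENGINES;

-- Move the Bayes tables from NDB to InnoDB to get row-level locking.
-- After this the data lives only on this SQL node, not in the cluster.
ALTER TABLE bayes_expire      ENGINE=InnoDB;
ALTER TABLE bayes_global_vars ENGINE=InnoDB;
ALTER TABLE bayes_seen        ENGINE=InnoDB;
ALTER TABLE bayes_token       ENGINE=InnoDB;
ALTER TABLE bayes_vars        ENGINE=InnoDB;

-- Verify which storage engine each Bayes table now uses.
SELECT TABLE_NAME, ENGINE
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = DATABASE()
   AND TABLE_NAME LIKE 'bayes%';
```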
>> Now, at first glance, the problem seems to be (please correct me if I'm wrong) the actual queries being run. For every mail scanned by SpamAssassin, it seems to run the query "SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM bayes_token", which effectively requests the entire bayes_token table.

> What you are seeing are expiry runs.

> As you are currently using MyISAM, the whole table is locked for such
> operations, so you are pretty much hosed.

> In any case, you should set "bayes_auto_expire 0" and run expiry, for
> example, once every night when traffic is lighter.
Thanks for this, I was not aware of it. Running expiry manually is done with sa-learn --force-expiry, correct?
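`sa-learn --force-expiry` does force an expiry run. With `bayes_auto_expire 0` set, one way to confirm that the nightly run is actually happening is to look at the `bayes_vars` table. A sketch, assuming the column names from the 3.2.x bayes_mysql.sql schema, where `last_expire` holds a Unix timestamp:

```sql
-- Check when Bayes expiry last ran for each Bayes user.
-- Assumes the bayes_vars columns from the 3.2.x bayes_mysql.sql schema;
-- last_expire is stored as a Unix timestamp, so convert it for readability.
SELECT username,
       token_count,
       FROM_UNIXTIME(last_expire) AS last_expire_at,
       last_expire_reduce
  FROM bayes_vars;
```

If `last_expire_at` stops advancing after the cron job is in place, the nightly expiry is not running.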
>> It seems that the query cache is either not suitable for this or I am
>> doing something majorly wrong :)

> You are right. Better to disable it completely if nothing else running
> uses it, and save a little CPU.
Good to know. Other applications will be running on it as well, so I'll reduce the query cache size considerably.
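Both options can be checked and applied from the mysql client. A sketch, where the 16 MB size is an arbitrary example value rather than a recommendation: the `Qcache%` status counters show whether the cache is earning its memory, and `SET GLOBAL` changes the size at runtime without a restart.

```sql
-- Inspect the current query cache settings and its hit/prune counters.
SHOW VARIABLES LIKE 'query_cache%';
SHOW GLOBAL STATUS LIKE 'Qcache%';

-- Shrink the cache at runtime; 16 MB (16777216 bytes) is only an example.
-- A value of 0 disables the cache entirely, as suggested for a dedicated box.
-- Also set query_cache_size in my.cnf, or the change is lost on restart.
SET GLOBAL query_cache_size = 16777216;
```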

Thanks a lot for your feedback.

Jorn