On Fri, Aug 14, 2009 at 07:43:37PM +0200, Jorn Argelo wrote:
> Hi All,
>
> I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three boxes,  
> and all three of them are sharing the same bayes DB using a MySQL  
> cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 datanodes  
> with a quadcore and 4 GB of memory. Everything is working fine, even the  
> AWL in SQL, except for Bayes. The bayes database currently houses a bit  
> less than 500k tokens and the database size is not very big either, as  
> the datanodes have less than 1 GB of storage in use. I've followed the  
> instructions from the Spamassassin wiki, and I also used the supplied  
> bayes_mysql.sql file to create my tables. In case anyone is interested,  
> you can find the cluster.ini and the my.cnf used on the SQL nodes here:
>
> http://www.wcborstel.com/web/mysql/my.cnf

skip-innodb

That's pretty much the reason. You _need_ to use InnoDB, as it has row-level
locking. MyISAM just kills Bayes.
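
A rough sketch of the switch, assuming the tables were created from the
stock bayes_mysql.sql. Only bayes_token is confirmed by your query; the other
table names are what I'd expect from that schema, so check yours first:

    # my.cnf: remove or comment out this line, then restart mysqld
    #skip-innodb

    -- check which engine the Bayes tables currently use
    SHOW TABLE STATUS LIKE 'bayes_%';

    -- then convert each table that is still MyISAM
    ALTER TABLE bayes_token ENGINE=InnoDB;
    ALTER TABLE bayes_seen ENGINE=InnoDB;
    ALTER TABLE bayes_vars ENGINE=InnoDB;
    ALTER TABLE bayes_expire ENGINE=InnoDB;
    ALTER TABLE bayes_global_vars ENGINE=InnoDB;

Take a backup first, as the conversion rewrites each table.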

> Now the problem at the first glance seems to be, from my perspective  
> (please correct me if I'm wrong), the actual queries being done. For  
> every mail being scanned by spamassassin, it seems to be doing the  
> "SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM  
> bayes_token" query every time. This effectively requesting the entire  
> bayes_token table

What you are seeing are expiry runs.

As you are currently using MyISAM, the whole table is locked for such
operations, so you are pretty much hosed.

In any case, you should set "bayes_auto_expire 0" and run the expiry
yourself, for example once every night when traffic is slower.
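
Something along these lines should do it. The 03:30 schedule is only an
example, and I'm assuming the job runs as a user whose SpamAssassin config
points at the SQL Bayes DB:

    # local.cf
    bayes_auto_expire 0

    # crontab entry; since all three boxes share one DB, running it
    # on just one of them should be enough
    30 3 * * * sa-learn --force-expire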

> It seems that the query cache is either not suitable for this or I am
> doing something majorly wrong :)

You are right. It's better to disable it completely if nothing else running
uses it, and save a little CPU.
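
For the MySQL 5.1 series your cluster is based on, that would look roughly
like this on the SQL nodes (a sketch; adjust it to your existing my.cnf):

    # my.cnf, [mysqld] section
    query_cache_type = 0
    query_cache_size = 0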
