* Alex Woick <[EMAIL PROTECTED]> [070927 02:14]:
> Micah Anderson schrieb am 27.09.2007 02:20:
>
>> processing has ground down to really slow. I'm seeing some incredibly
>> long queries now in my slow-query log, such as:
>
> Try an "optimize table <tabname>" for each of the sa tables. You just 
> filled the database from scratch, so perhaps the counters/statistics do not 
> reflect the actual value distribution yet.

Actually this bayes DB has been around for a few months, and has been
built up over time. 

This does make me wonder what regular DB maintenance tasks should be
performed on the bayes DB. It sounds like some people let the code
auto-expire, while some run cron jobs to expire data? What are the
benefits of each? Should I be running an optimize table every so often?

>> # Time: 070926 17:10:53
>> # [EMAIL PROTECTED]: spamass[spamass] @  [10.0.2.4]
>> # Query_time: 758  Lock_time: 0  Rows_sent: 1  Rows_examined: 2205327
>> SELECT count(*)
>>                FROM bayes_token
>>               WHERE id = '4'
>>                 AND ('1190846660' - atime) > '345600';
>
> More than 10 minutes for counting 2 mio rows is a bit long. You can try to 
> look what Mysql is doing all the time. Execute a "show full processlist" 
> from a mysql command line while the above query is running and look at the 
> "State" column. If a SA-initiated query is waiting for a lock and actually 
> doing nothing, you should see it there. You also see all the other queries 
> that are currently running at this point and may be hogging the database 
> server.

Since I've adjusted the SQL query to use the index, I haven't seen this
problem, so I can't look at the State column to see what is going on.
This DB server isn't doing anything else, for any other database, so
there was no possibility of other things hogging the resources on the
server. 

> The database design and query design of Spamassassin is ok, even the 
> appearently non-indexable term "('1190846660' - atime) > '345600'", since 
> Mysql would not use the index on an optimized term anyway. Try an EXPLAIN 
> of this statement - Mysql will always use only the first half for lookup (4 
> bytes) of the index, which covers only the id part.

That is if I am optimizing...

mysql> explain SELECT count(*) FROM bayes_token WHERE id = '4' AND
('1190846660' - atime) > '345600';
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
| id | select_type | table       | type | possible_keys            | key
| key_len | ref   | rows   | Extra                    |
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | bayes_token | ref  | PRIMARY,bayes_token_idx2 |
bayes_token_idx2 | 2       | const | 229946 | Using where; Using index | 
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+

>> innodb_flush_log_at_trx_commit=1
>
> Use value 0 for more performance and a small sacrifice of safety. See the 
> comment in the default *.ini file:

Mine doesn't have a comment... but looking at
http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html does lead me
to want to change this since I dont care about transaction-level ACID 
compliance with the bayes database, if I have issues with that DB, I
can always restore the backup from the day before.

Micah

Reply via email to