processing has slowed to a crawl. I'm seeing some incredibly
long queries in my slow-query log, such as:

Try an "OPTIMIZE TABLE <tabname>" for each of the SA tables. If you just filled the database from scratch, the counters/statistics may not reflect the actual value distribution yet.
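A minimal sketch of that suggestion. bayes_token and bayes_seen appear later in this thread; any other bayes table names come from SA's stock MySQL schema and may differ in your installation:

```sql
-- Rebuild each bayes table and refresh its index statistics.
OPTIMIZE TABLE bayes_token;
OPTIMIZE TABLE bayes_seen;
-- bayes_vars is assumed from SA's standard schema; adjust to
-- whatever "SHOW TABLES" lists for your bayes database.
OPTIMIZE TABLE bayes_vars;
```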

Actually this bayes DB has been around for a few months, and has been
built up over time.

I experienced a similar problem with a custom InnoDB database, where at one point a certain query suddenly took minutes instead of microseconds. Using EXPLAIN, I saw that the indexes were no longer being used as intended. I changed the query to include a FORCE INDEX clause, but later discovered that an OPTIMIZE TABLE also remedied the situation. I saw this on two different databases that were built up over time from completely empty to many millions of records. On InnoDB tables, OPTIMIZE TABLE only defragments the index and updates the index statistics.
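The two remedies described above might look like this. The index name is taken from the EXPLAIN output later in this thread; treat it as an assumption for your schema:

```sql
-- Workaround: pin the query to a specific index while the
-- optimizer's statistics are stale.
SELECT count(*)
FROM bayes_token FORCE INDEX (bayes_token_idx2)
WHERE id = 4;

-- Fix: refresh the index statistics so the optimizer can pick
-- the right index on its own again.
ANALYZE TABLE bayes_token;
OPTIMIZE TABLE bayes_token;
```

ANALYZE TABLE only recomputes key distribution statistics and is much cheaper than OPTIMIZE TABLE, so it is worth trying first.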

This does make me wonder what regular DB maintenance tasks should be
performed on the bayes DB.  It sounds like some people let the code
auto-expire, while some run cron jobs to expire data?

I added a TIMESTAMP column to the bayes_seen table, so I can expire that table by date. The other tables are maintained internally by SA, so I don't touch them.
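A sketch of that approach. The column name and the 30-day retention window are assumptions, not taken from the original post:

```sql
-- Add an auto-updating timestamp to track when a row was last touched.
ALTER TABLE bayes_seen
    ADD COLUMN last_seen TIMESTAMP NOT NULL
        DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

-- Run periodically from cron: drop rows not seen for 30 days.
DELETE FROM bayes_seen
WHERE last_seen < NOW() - INTERVAL 30 DAY;
```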

Should I be running an optimize table every so often?

No, I don't think so. In the databases mentioned above, the problem never came back, and I haven't run OPTIMIZE TABLE since (more than a year ago).

mysql> explain SELECT count(*) FROM bayes_token WHERE id = '4' AND
    -> ('1190846660' - atime) > '345600';
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
| id | select_type | table       | type | possible_keys            | key              | key_len | ref   | rows   | Extra                    |
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | bayes_token | ref  | PRIMARY,bayes_token_idx2 | bayes_token_idx2 | 2       | const | 229946 | Using where; Using index |
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+

The "2" in the key_len column makes me wonder. That means only 2 bytes of the index are used. The index on my system includes id (INTEGER = 4 bytes) and atime (INTEGER = 4 bytes). 2 bytes would be half of the id field, which is impossible. The key_len field always shows 4 on my system, so it uses the id part (the first half) of the index for lookups. Have you changed id to SMALLINT? Then you have altered the table, which has the same (side-)effect as OPTIMIZE TABLE. I don't think truncating the field to 2 bytes caused the speedup; I think the side-effect of recreating the index did.
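To check which reading is right, you can compare key_len against the actual column types. This is just a diagnostic sketch:

```sql
-- Show the column definitions and index layout; a SMALLINT id
-- would explain key_len = 2, an INT id would give key_len = 4.
SHOW CREATE TABLE bayes_token;

-- key_len counts the bytes of the index prefix the optimizer
-- actually uses, so it also reveals whether the atime part of
-- a (id, atime) compound index is being used at all.
```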
