I have a database that I've worked hard to set up, and while
many of my queries are working well, I have a few that are
proving to be extremely slow, despite my best efforts to
properly index them. I'd be grateful for any advice on how to
speed these up, or an explanation of how there's no way to
make it any better.

Basically this is a database of quotations, with tables "cg"
having 2.8M rows, "q" having 2.2M, "cit" 76,000, and "sref"
23,000. The larger tables are aspects of the quotations, the
smaller ones contain bibliographic info. Most queries are
getting quotations depending on some bibliographic factors.
All the relevant fields for joins are indexed.

The server is a 1.4Ghz PIII with 1G RAM and 15,000 RPM 
SCSI drives, and I'm running MySQL 4.0.9 on FreeBSD 4.7.
The server is very lightly loaded.

Two queries that are giving me trouble are this, which gives
a count of words added in a particular timespan ("sref.cd" is a
date field, indexed; "cg.cw" is an indexed VARCHAR):

SELECT count(cg.cw) AS count FROM cg,q,cit,sref
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
AND cg.cw LIKE 'm%'         
AND sref.cd > DATE_SUB(CURDATE(), INTERVAL 6 MONTH);

This can take anywhere from 10-30 seconds to execute, depending
on the letter, and longer with a longer date range. The EXPLAIN
looks like this:

mysql> EXPLAIN SELECT count(cg.cw) AS count FROM cg,q,cit,sref
    -> WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
    -> AND cg.cw LIKE 'm%'
    -> AND sref.cd > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)\G
*************************** 1. row ***************************
        table: cg
         type: range
possible_keys: q_id,cw
          key: cw
      key_len: 101
          ref: NULL
         rows: 153385
        Extra: Using where
*************************** 2. row ***************************
        table: q
         type: eq_ref
possible_keys: PRIMARY,cit_id
          key: PRIMARY
      key_len: 4
          ref: cg.q_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
        table: cit
         type: eq_ref
possible_keys: PRIMARY,sref_id
          key: PRIMARY
      key_len: 4
          ref: q.cit_id
         rows: 1
        Extra: 
*************************** 4. row ***************************
        table: sref
         type: eq_ref
possible_keys: PRIMARY,cd
          key: PRIMARY
      key_len: 4
          ref: cit.sref_id
         rows: 1
        Extra: Using where
4 rows in set (0.02 sec)

It is somewhat faster if I ask for the cg.cw itself, instead of the
count, but it's still in the many-seconds range, which is too slow.

An even worse query is this one, where I'm looking for the most
common words added in a particular timespan:

SELECT cg.cw, count(*) AS count FROM cg,cit,sref,q
WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
AND sref.cd > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
GROUP BY cw 
ORDER BY count DESC 
LIMIT 20;

This just took 5m 34s to execute, which is totally unbearable. I
understand that if it's trying to sort millions of rows it could
be difficult, but is there any way to speed this up? Here's the EXPLAIN:

mysql> EXPLAIN SELECT cg.cw, count(*) AS count FROM cg,cit,sref,q
    -> WHERE cg.q_id = q.id AND q.cit_id = cit.id AND cit.sref_id = sref.id
    -> AND sref.cd > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
    -> GROUP BY cw
    -> ORDER BY count DESC
    -> LIMIT 20\G
*************************** 1. row ***************************
        table: cg
         type: index
possible_keys: q_id
          key: cw
      key_len: 101
          ref: NULL
         rows: 2839036
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
        table: q
         type: eq_ref
possible_keys: PRIMARY,cit_id
          key: PRIMARY
      key_len: 4
          ref: cg.q_id
         rows: 1
        Extra: 
*************************** 3. row ***************************
        table: cit
         type: eq_ref
possible_keys: PRIMARY,sref_id
          key: PRIMARY
      key_len: 4
          ref: q.cit_id
         rows: 1
        Extra: 
*************************** 4. row ***************************
        table: sref
         type: eq_ref
possible_keys: PRIMARY,cd
          key: PRIMARY
      key_len: 4
          ref: cit.sref_id
         rows: 1
        Extra: Using where
4 rows in set (0.00 sec)

Thanks for any ideas. I'll probably have further problems when
I start trying to use FULLTEXT searches, but these are the
difficulties that sprung up right away.

Jesse Sheidlower

---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to