Hi all
 
I have this query:
SELECT n.ID, n.CatalogNumber, [...more...], d.ID, d.CatalogNumber, [...more...]
FROM newdarwincoredata n
INNER JOIN darwincoredata d ON n.CatalogNumber = d.CatalogNumber
ORDER BY n.CatalogNumber;
 
Both tables have exactly the same structure and indices:
mysql> SHOW CREATE TABLE darwincoredata;
CREATE TABLE `darwincoredata` (
  `ID` int(10) NOT NULL auto_increment,
  `CatalogNumber` varchar(20) NOT NULL,
  [...more...],
  PRIMARY KEY  (`ID`),
  UNIQUE KEY `CatalogNumber` (`CatalogNumber`),
  UNIQUE KEY `GlobalUniqueIdentifier` (`GlobalUniqueIdentifier`),
  KEY `DateLastModified` (`DateLastModified`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

There are 336983 rows in darwincoredata, and 337304 in newdarwincoredata.

The plan for the query varies between using the CatalogNumber index and using a 
temporary table (which fails, once it fills up the disk). I get either this:
mysql> EXPLAIN <that query>
+----+-------------+-------+------+---------------+---------------+---------+------------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key           | key_len | 
ref                          | rows   | Extra                           |
+----+-------------+-------+------+---------------+---------------+---------+------------------------------+--------+---------------------------------+
|  1 | SIMPLE      | d     | ALL  | CatalogNumber | NULL          | NULL    | 
NULL                         | 336238 | Using temporary; Using filesort |
|  1 | SIMPLE      | n     | ref  | CatalogNumber | CatalogNumber | 22      | 
GBIF_wrapper.d.CatalogNumber |      1 |                                 |
+----+-------------+-------+------+---------------+---------------+---------+------------------------------+--------+---------------------------------+
Or this:
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
| id | select_type | table | type  | possible_keys | key           | key_len | 
ref                          | rows   | Extra |
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
|  1 | SIMPLE      | d     | index | CatalogNumber | CatalogNumber | 22      | 
NULL                         | 326508 |       |
|  1 | SIMPLE      | n     | ref   | CatalogNumber | CatalogNumber | 22      | 
GBIF_wrapper.d.CatalogNumber |      1 |       |
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
Executing
    SHOW INDEX IN newdarwincoredata; SHOW INDEX IN darwincoredata;
And then doing the EXPLAIN again sometimes changes the plan. I assume this is 
because the cardinalities are recalculated.

I expect the CatalogNumber unique keys to always be used for this query. Even 
if I do:
SELECT [...]
FROM newdarwincoredata n FORCE INDEX (CatalogNumber)
INNER JOIN darwincoredata d FORCE INDEX (CatalogNumber) ON n.CatalogNumber = 
d.CatalogNumber
ORDER BY n.CatalogNumber
The query still sometimes uses a temporary table.

EXPLAIN SELECT STRAIGHT_JOIN [...] FROM newdarwincoredata n INNER JOIN 
darwincoredata d ON n.CatalogNumber = d.CatalogNumber ORDER BY n.CatalogNumber
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
| id | select_type | table | type  | possible_keys | key           | key_len | 
ref                          | rows   | Extra |
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
|  1 | SIMPLE      | n     | index | CatalogNumber | CatalogNumber | 22      | 
NULL                         | 336588 |       |
|  1 | SIMPLE      | d     | ref   | CatalogNumber | CatalogNumber | 22      | 
GBIF_wrapper.n.CatalogNumber |      1 |       |
+----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+
This seems to always use the index, but I don't want to rely on this without 
knowing why -- might it be because n has more rows than d?

MySQL version is 5.0.22-log, x86_64, redhat-linux-gnu.

Thanks for any insights

Matt Blissett

Reply via email to