Hello, I'm designing a tool to archive mailing list and I'm using a built-in search engine with a database structure similar to the structure use by udmsearch (crc32 multi). For a query where you want to match all keywords, I think that a single query joining all tables is faster than several queries using a temporary database. For udmsearch, this would mean replacing these queries : CREATE TABLE t9690554262079472 ( url_id INT DEFAULT '0' NOT NULL, word_id INT DEFAULT '0' NOT NULL, intag TINYINT DEFAULT '0' NOT NULL, KEY i1t9690554262079472(url_id), KEY i2t9690554262079472(word_id)) ---------------------------------------------------------------------------- ---- INSERT INTO t9690554262079472 SELECT url_id,word_id,intag FROM ndict6 WHERE word_id = 1739405965 ---------------------------------------------------------------------------- ---- INSERT INTO t9690554262079472 SELECT url_id,word_id,intag FROM ndict6 WHERE word_id = -1532716185 ---------------------------------------------------------------------------- ---- INSERT INTO t9690554262079472 SELECT url_id,word_id,intag FROM ndict10 WHERE word_id = 1201902955 ---------------------------------------------------------------------------- ---- SELECT SQL_SMALL_RESULT url_id, sum(intag) as r FROM t9690554262079472 GROUP BY url_id HAVING (( ( sum(word_id=1739405965)>0 ) AND sum(word_id=-1532716185)>0 ) AND ( sum(word_id=1201902955)>0 ) )) ORDER BY r DESC ---------------------------------------------------------------------------- ---- by this one : SELECT SQL_SMALL_RESULT d1.url_id, d1.intag+d2.intag+d3.intag as r FROM ndict6 d1,ndict6 d2,ndict10 d3 where d1.word_id = -1532716185 and d2.word_id=-1532716185 and d2.url_id=d1.url_id and d3.word_id=1201902955 and d3.url_id=d1.url_id ORDER BY r DESC Theorically, the idea it that it is not optimal to search all words independently, and then "join" the results. here is the optimizer output : +-------+------+--------------------+-----------+---------+-----------+----- -+-------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------+------+--------------------+-----------+---------+-----------+----- -+-------+ | d3 | ref | url_id10,word_id10 | word_id10 | 4 | ??? | 100 | | | d1 | ref | url_id6,word_id6 | url_id6 | 4 | d3.url_id | 10 | | | d2 | ref | url_id6,word_id6 | url_id6 | 4 | d1.url_id | 10 | | +-------+------+--------------------+-----------+---------+-----------+----- -+-------+ It search the urls matching the third keyword first, then for each url found checks if it matches the first keyword, then the same goes for the second keyword. Someone proposed recently to use subqueries for such a task, the query I propose eliminates this problem. I don't like subqueries and they aren't compatible with all dbms, and I prefer to let the system decide about which strategy to use. If your dbms optimizer is smart enough, it can decide like in the example above that it's better to start with the third keyword which is the most restrictive, then check for each url if it matches the second keyword (as key on line 2 is url_il6 and not word_id6, it would have searched the documents matching the first keyword and joined both if it was better) I don't have a big enough database for significative tests. If you have a big database with a slow query, please try to use the new version, or give me the debug output and I'll send you back a modified query to test, I'd like to see if it's really faster. Or provide me a shell access to your database where I can test it myself (unix only, I only know about mysql,postgresql, sybase !) I'm interest to know for various dbms if the optimizer is indeed clever enough, for example to start with the most significative keyword. If not, you could force the order of the query using stats for keywords (word_id,number of urls) Alain ______________ If you want to unsubscribe send "unsubscribe udmsearch" to [EMAIL PROTECTED]