I have built a custom inverted text index to search my data. I'm not using MySQL's fulltext index because A) I want more control over the index and results, and B) I want to learn how this all works, and if it's possible to do this quickly in MySQL.
I'm trying to formulate fulltext searches using operators like AND and OR, and also phrases in quotes where the order of the words is important. I have three tables: Table A (Content): content_id, UNSIGNED INT AUTO_INCREMENT PRIMARY content_text, TEXT Table B (Words): word_id, UNSIGNED INT AUTO_INCREMENT PRIMARY word_text, VARCHAR(30) Table C (Word Locations): wl_id, UNSIGNED INT AUTO_INCREMENT PRIMARY wl_wordid, UNSIGNED INT referring to B.word_id wl_locationid, UNSIGNED INT pointing to A.content_id wl_previd, UNSIGNED INT pointing to the wl_id of the previous word (NULL otherwise) I have indices on each column in table C as well as a UNIQUE on the (wl_wordid, wl_locationid, wl_previd). So far so good. Now I'm trying to intelligently build queries to search this data. For example, if I wanted to search for the exact phrase "mysql query" and assuming the word_id for mysql is 100 and query is 101: SELECT A.content_id, COUNT(*) AS relevance FROM A, C AS C0, C AS C1 WHERE A.content_id=C0.wl_locationid AND C0.wl_wordid=100 AND C1.wl_wordid=101 AND C1.wl_previd=C0.wl_id ORDER BY A.content_id GROUP BY A.content_id ORDER BY relevance DESC; That seems to work well enough and quickly. My simple relevance formula is counting the number of times that phrase occurs in the specific content. I want to add complexity to my queries, and I'm not sure if I'm forumulating them properly. If I want to search for ("mysql query" AND php), here's how I'm currently writing that (assume php's word_id is 102): SELECT A.content_id, COUNT(*) AS relevance FROM A, C AS C0, C AS C1, C AS C2 WHERE A.content_id=C0.wl_locationid AND C0.wl_wordid=100 AND C1.wl_wordid=101 AND C1.wl_previd=C0.wl_id AND content_id=C2.wl_locationid AND C2.wl_wordid=102 ORDER BY A.content_id GROUP BY A.content_id ORDER BY relevance DESC; This query works OK, but it's already starting to get slow. Also, the "relevance" number returns the product of the rows that match "php" and those that match "mysql query", which means the query is generating a lot of rows. Adding another search term to the mix will complicate that further. "OR" searches will pull a lot of rows too. Is there a better way to build such queries? Should I be using a different join? I did try to do a UNION of the various search components (one for "mysql query" and the next for php), but there doesn't seem to be a convenient way to pull out only the rows that match both criteria. I imagine stuffing the results of the UNION into a temp table and working with that would be slow. I admit I don't know a lot about the right way to do this. It's difficult to search the web on this topic, perhaps because I don't know the correct terminology. I usually end up with results regarding the built-in fulltext features of various databases. Any web links are appreciated. Surely there's a way to do this efficiently using SQL queries. Thanks! Mike Boone http://boonedocks.net/mike -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]