I have built a custom inverted text index to search my data. I'm not using
MySQL's fulltext index because A) I want more control over the index and
results, and B) I want to learn how this all works, and if it's possible to
do this quickly in MySQL.

I'm trying to formulate fulltext searches using operators like AND and OR,
and also phrases in quotes where the order of the words is important.

I have three tables:

Table A (Content):
content_id, UNSIGNED INT AUTO_INCREMENT PRIMARY
content_text, TEXT

Table B (Words):
word_id, UNSIGNED INT AUTO_INCREMENT PRIMARY
word_text, VARCHAR(30)

Table C (Word Locations):
wl_id, UNSIGNED INT AUTO_INCREMENT PRIMARY
wl_wordid, UNSIGNED INT referring to B.word_id
wl_locationid, UNSIGNED INT pointing to A.content_id
wl_previd, UNSIGNED INT pointing to the wl_id of the previous word (NULL
otherwise)
I have indices on each column in table C as well as a UNIQUE on the
(wl_wordid, wl_locationid, wl_previd).

So far so good. Now I'm trying to intelligently build queries to search this
data.

For example, if I wanted to search for the exact phrase "mysql query" and
assuming the word_id for mysql is 100 and query is 101:

SELECT A.content_id, COUNT(*) AS relevance FROM A, C AS C0, C AS C1 WHERE
A.content_id=C0.wl_locationid AND C0.wl_wordid=100 AND C1.wl_wordid=101 AND
C1.wl_previd=C0.wl_id ORDER BY A.content_id GROUP BY A.content_id ORDER BY
relevance DESC;

That seems to work well enough and quickly. My simple relevance formula is
counting the number of times that phrase occurs in the specific content.

I want to add complexity to my queries, and I'm not sure if I'm forumulating
them properly. If I want to search for ("mysql query" AND php), here's how
I'm currently writing that (assume php's word_id is 102):

SELECT A.content_id, COUNT(*) AS relevance FROM A, C AS C0, C AS C1, C AS C2
WHERE A.content_id=C0.wl_locationid AND C0.wl_wordid=100 AND
C1.wl_wordid=101 AND C1.wl_previd=C0.wl_id AND content_id=C2.wl_locationid
AND C2.wl_wordid=102 ORDER BY A.content_id GROUP BY A.content_id ORDER BY
relevance DESC;

This query works OK, but it's already starting to get slow. Also, the
"relevance" number returns the product of the rows that match "php" and
those that match "mysql query", which means the query is generating a lot of
rows. Adding another search term to the mix will complicate that further.
"OR" searches will pull a lot of rows too.

Is there a better way to build such queries? Should I be using a different
join? I did try to do a UNION of the various search components (one for
"mysql query" and the next for php), but there doesn't seem to be a
convenient way to pull out only the rows that match both criteria. I imagine
stuffing the results of the UNION into a temp table and working with that
would be slow.

I admit I don't know a lot about the right way to do this. It's difficult to
search the web on this topic, perhaps because I don't know the correct
terminology. I usually end up with results regarding the built-in fulltext
features of various databases. Any web links are appreciated. Surely there's
a way to do this efficiently using SQL queries.

Thanks!
Mike Boone
http://boonedocks.net/mike


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to