On 24/05/2009 4:30 AM, Lukas Haase wrote:
> Hi,
> 
> Sorry for the subject - I just do not know for what to search of what to 
> ask - I actually do not know where is exactly my problem :-(
> 
> I have a database containing thousands of HTML pages ("topics"). There 
> is a fulltext index for these topics. First there is a table containing 
> all single words. Each word is identified by its "fulltextID":
> 
> CREATE TABLE fulltext(
>       fulltextID INTEGER PRIMARY KEY,
>       word VARCHAR(100) COLLATE NOCASE
> );
> 
> Now there is a linking table between the words and the HTML pages (topics):
> 
> CREATE TABLE topic_fulltext(
>       topicID INTEGER,
>       fulltextID INTEGER,
>       PRIMARY KEY(topicID, fulltextID)
> );

Call me crazy, but shouldn't the two components of the PK be reversed?

> Finding a topic containing a specific word is not too hard:
> 

Just put EXPLAIN QUERY PLAN in here ...

> SELECT topic_fulltext.topicID
> FROM fulltext
> JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
> WHERE word LIKE 'Word%';

and you will see:
0|0|TABLE fulltext
1|1|TABLE topic_fulltext

but if you change the order of the columns in the PK you will see:
0|0|TABLE fulltext
1|1|TABLE topic_fulltext WITH INDEX sqlite_autoindex_topic_fulltext_1

which I'd interpret as a significant improvement.

Second suggestion: Although you may be a careful programmer and quite 
sure that you will never have duplicate or null words in your fulltext 
table, let's just go into nanny mode and tack UNIQUE NOT NULL onto the 
definition of the "word" column ... with this result:

0|0|TABLE fulltext WITH INDEX sqlite_autoindex_fulltext_1
1|1|TABLE topic_fulltext WITH INDEX sqlite_autoindex_topic_fulltext_1

Crazy ** 2 :-)

Third suggestion: read up carefully what has been written about when 
LIKE will use an index.

> 
> But now I want to be able to search with more complex queries. For example:
> 
> * List all topics containing (word1 AND word2)
> * List all topics containing (word1 OR word2)
> * List all topics containing (word1 AND word2 AND ... AND word10)
> * List all topics containing ((word1 OR word2) AND word3 OR word3)
> * ...
> 
> But now I have no clue how to accomplish this query. I just can't extend 
> my single-word query from above :-(
> 
> Can anyone give me a hint?

This problem has been addressed more than once over the last 30 or so 
years ... I'd be very surprised to find that any serious implementation 
used SQL queries for 100% of the work.

HTH,
John
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to