Igor Tandetnik schrieb:
> "Lukas Haase" <lukasha...@gmx.at> wrote in
> message news:gv9fcm$5r...@ger.gmane.org
>> I have a database containing thousands of HTML pages ("topics"). There
>> is a fulltext index for these topics. First there is a table
>> containing all single words. Each word is identified by its
>> "fulltextID":
>>
>> CREATE TABLE fulltext(
>> fulltextID INTEGER PRIMARY KEY,
>> word VARCHAR(100) COLLATE NOCASE
>> );
>>
>> Now there is a linking table between the words and the HTML pages
>> (topics):
>>
>> CREATE TABLE topic_fulltext(
>> topicID INTEGER,
>> fulltextID INTEGER,
>> PRIMARY KEY(topicID, fulltextID)
>> );
>>
>> Finding a topic containing a specific word is not too hard:
>>
>> SELECT topic_fulltext.topicID
>> FROM fulltext
>> JOIN topic_fulltext ON topic_fulltext.fulltextID = fulltext.fulltextID
>> WHERE word LIKE 'Word%';
>>
>> But now I want to be able to search with more complex queries. For
>> example:
>>
>> * List all topics containing (word1 AND word2)
> 
> You could do something like this:

Oh, thank you, this seems more like what I am looking for :-) :-)

> SELECT topic_fulltext.topicID FROM topic_fulltext
> where exists (select 1 from fulltext
>     WHERE topic_fulltext.fulltextID = fulltext.fulltextID and word LIKE 
> 'word1%')
> and exists (select 1 from fulltext
>     WHERE topic_fulltext.fulltextID = fulltext.fulltextID and word LIKE 
> 'word2%')

Unfortunately this does not work completely :(

In the first WHERE clause I restrict to entried containing only the 
"word1" (the resultset will contain only entries with "word1"). So the 
second WHERE clause will always fail as there are no rows with "word2" left.

This seems to work only for my OR-requirement (topics containg either 
word1 OR word2) when replacing AND with OR.

> SELECT topic_fulltext.topicID FROM topic_fulltext
> where fulltextID in (
>     select fulltextID from topic_fulltext where word LIKE 'word1%'
>     intersect
>     select fulltextID from topic_fulltext where word LIKE 'word2%');

Unfortunately this does not work either.

The set is taken from the fullwords, i.e. the result of the inner SELECT 
clause will contain fulltextIDs. And they will obviosly never intersect.

So I have the same problem as above: Replacing with UNION ALL yields my 
OR-requirement but I can't get working it with AND :-(

> Test it, see which one works faster.

The second one seems to be much faster. Though it's too slow (3s or so), 
but I hope I can tune up the query on the end...

> [...]
>> * List all topics containing (word1 AND word2 AND ... AND word10)
>> * List all topics containing ((word1 OR word2) AND word3 OR word3)
> 
> The approach above should work for any boolean combination.

I have forgotten one third type: the NOT. E.g.:

* List all topics containing (NOT(word1 OR word2) AND word3)

But this is no hard requirement, but boolean AND and OR combinations are.

Thank you for your approaches, I tried to get the AND and OR working 
with it but I still do not figure it out :-(

By the way: If there is a better way to organize the index in the 
database: This would be no problem if the queries will get simpler and 
faster. (As long as the memeory requirement stays approx. the same)

Thank you again Igor,
Luke

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to