I hadn't thought about it until now, but there's no reason you can't 
treat the table of substrings as substrings appearing anywhere in a 
word, and join each substring of a word to the substrings table.

For 26 letters, you have > 17k three-letter substrings and almost 500k 
four-letter substrings, so this technique probably is limited to 
three-letter substrings. But that's not bad--it means that if 
character sequences are evenly distributed in words (they're not, but 
assume they are for a moment), then the set of all possible words is 
cut into 1/>17,000 on the first search. There are only about 600,000 
words in the English language, so with even distribution your first 
search cuts the results to ~34 words. But uneven character-sequence 
distributions will skew that--some sequences will result in more words 
than others. Sill, even if you get back 500 words to search for the 
full match, that's better than searching all of your words. (You also 
have to store all one-letter and two-letter substrings, but those 
don't take up much additional room.)

You would have to have a separate table that has the multiple joins 
(one per substring) in your words, because now each word can 
potentially have M substrings in it, so M foreign keys.

For your example, Motor/motor would match mot, oto, and tor. You would 
use a rule to arbitrarily match the first N letters in a the search 
term, so if someone searched for "%motor%", that would be a search in 
the substrings table for "mot". This would return the primary key for 
"mot". Then you search the substring-matches table for all words that 
have that substring, and this gives you back a list of words 
containing "mot". Then you search that result set for "%motor%". 
Similarly, if someone searchs for "%oto%", you get the substring key 
for "oto" and find words containing that substring.

For all I know, full-text-search engines may do something like this.

----- Original Message ----- 
From: "Lukas Haase" <lukasha...@gmx.at>
To: <sqlite-users@sqlite.org>
Sent: Thursday, August 06, 2009 6:54 AM
Subject: Re: [sqlite] FTS and postfix search


> Hi Jim,
>
> and thank you for the great idea. But I thought it would be possible 
> to
> search '*word*' - but this is not possible with this method either.
>
> Is there any chance for searching '*word*' quickly?
>
> Regards,
> Luke
>
> Jim Showalter schrieb:
>> You could store the words reversed (in addition to storing them in
>> forward order). Then like 'xxx%' would be fast.
>>
>> This would double your disk footprint, but could give you the 
>> search
>> performance you're looking for.
>>
>> If that's too goofy, you could create a table of all one, two, and
>> three-character word endings, and join to it from all of your words
>> (stored in forward order). Then search first for the primary key of
>> the word ending you want to search for, then search your words for
>> that key.
>>
>> Index the join.
>>
>> ----- Original Message ----- 
>> From: "Lukas Haase" <lukasha...@gmx.at>
>> To: <sqlite-users@sqlite.org>
>> Sent: Wednesday, August 05, 2009 6:16 PM
>> Subject: Re: [sqlite] FTS and postfix search
>>
>>
>>> Wes Freeman schrieb:
>>>> I clearly am not in the right mindset to be answering list 
>>>> emails.
>>>> Please ignore my response (it's too late now)--back to my 
>>>> stressful
>>>> deadline.
>>> :-)
>>>
>>>> Strange that it's implemented for prefix and not postfix?
>>> Well, an explanation is easy: Same as with LIKE, LIKE 'xxx' or 
>>> LIKE
>>> 'xxx%' can be performed easy because only the beginning of words
>>> need to
>>> be compared.
>>>
>>> However, there /is/ a way to also do postfix searches. I have the
>>> *same*
>>> database in *.hlp format and with WinHelp it's possible to search
>>> '*otor' (and others) with almost zero CPU and time consumption. 
>>> I'd
>>> be
>>> curious how they did this.
>>>
>>> For a solution for SQLite I would accept a small performance 
>>> penalty
>>> in
>>> that case (but very few secs max); additionally I would also 
>>> accept
>>> the
>>> index being bigger.
>>>
>>> Regards,
>>> Luke
>>>
>>>> Wes
>>>>
>>>> On Wed, Aug 5, 2009 at 8:58 PM, Lukas Haase<lukasha...@gmx.at>
>>>> wrote:
>>>>> Wes Freeman schrieb:
>>>>>> Why not LIKE '%otor'?
>>>>> SELECT topic_title FROM topics
>>>>> WHERE topic LIKE '%otor%'
>>>>> ORDER BY topic_title ASC;
>>>>>
>>>>> This is very, very slow, especially on my > 100 MB database.
>>>>> "Realtime"
>>>>> search in the GUI is a requirement. This is exactly the reason 
>>>>> why
>>>>> I
>>>>> want to use FTS instead of LIKE...
>>>>>
>>>>> Regards,
>>>>> Luke
>>>>>
>>>>>> Wes
>>>>>>
>>>>>> On Wed, Aug 5, 2009 at 7:47 PM, Lukas Haase<lukasha...@gmx.at>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> It's me again, sorry. The next big problem concerning FTS. I
>>>>>>> have the
>>>>>>> requirement to do postfix searches, like:
>>>>>>>
>>>>>>> SELECT topic_title FROM topics
>>>>>>> WHERE topic MATCH '*otor'
>>>>>>> ORDER BY topic_title ASC;
>>>>>>>
>>>>>>> should find Motor, motor, Monotor etc. But this does not seem 
>>>>>>> to
>>>>>>> work.
>>>>>>> Is there any chance to get this working?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Luke
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> sqlite-users mailing list
>>>>>>> sqlite-users@sqlite.org
>>>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>>>>
>>>>>> _______________________________________________
>>>>>> sqlite-users mailing list
>>>>>> sqlite-users@sqlite.org
>>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>>>
>>>>> _______________________________________________
>>>>> sqlite-users mailing list
>>>>> sqlite-users@sqlite.org
>>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>>
>>>> _______________________________________________
>>>> sqlite-users mailing list
>>>> sqlite-users@sqlite.org
>>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users 

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to