On 26 Jun 2009, at 12:25pm, Alberto Simões wrote:

> one addition, one remotion or one substitution

I am always amazed at how well people use English.  For your word
'remotion' you probably mean 'removal' or 'omission'.  You have joined
the two possibilities together!

> Then, the script constructs an SQL query:
>
> SELECT DISTINCT(word) FROM dict WHERE word = 'ar' OR word = 'ca' OR
> word LIKE '_car' OR word LIKE 'c_r' OR word = 'cr' OR word LIKE '_ar'
> OR word LIKE 'ca_r' OR word LIKE 'c_ar' OR word LIKE 'ca_' OR word
> LIKE 'car_';
>
> And this SQL query works... but not as quickly as I need (especially
> because the time it takes grows with the word size).
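The script itself isn't shown in the post.  As a hedged sketch, assuming
Python's sqlite3 module and the dict(word) table from the query above,
the conditions can be generated rather than typed out: '=' terms for each
one-letter deletion, and LIKE terms with '_' for each substitution and
insertion.  The names edit_patterns and build_query are hypothetical, and
the values are passed as bound parameters instead of quoted literals.

```python
import sqlite3


def edit_patterns(word: str) -> tuple[list[str], list[str]]:
    """Return (exact, like) patterns for words one edit away from `word`.

    Hypothetical helper: deletions are compared with '=', while
    substitutions and insertions use LIKE with '_' as the wildcard.
    """
    deletions = [word[:i] + word[i + 1:] for i in range(len(word))]
    substitutions = [word[:i] + "_" + word[i + 1:] for i in range(len(word))]
    insertions = [word[:i] + "_" + word[i:] for i in range(len(word) + 1)]
    # dict.fromkeys drops duplicates (e.g. 'see' gives 'se' twice) but keeps order
    exact = list(dict.fromkeys(deletions))
    like = list(dict.fromkeys(substitutions + insertions))
    return exact, like


def build_query(word: str) -> tuple[str, list[str]]:
    """Build the OR-chained query, one placeholder per pattern."""
    exact, like = edit_patterns(word)
    conditions = ["word = ?"] * len(exact) + ["word LIKE ?"] * len(like)
    sql = "SELECT DISTINCT word FROM dict WHERE " + " OR ".join(conditions)
    return sql, exact + like


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dict (word TEXT)")
    conn.executemany("INSERT INTO dict VALUES (?)",
                     [("car",), ("cat",), ("cart",), ("dog",)])
    sql, params = build_query("car")
    print(sorted(row[0] for row in conn.execute(sql, params)))  # ['car', 'cart', 'cat']
```

For an n-letter word this yields n deletions plus 2n + 1 LIKE patterns,
which is why the query (and its running time) grows with the word size.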

You could write a program to prepare another table in the same
database holding your near-misses.  In other words, take each word
in the dictionary (like 'car') and put an entry in this other table
for each near miss you wish to accept:

nearMiss        realWord
--------        --------
car             car
ca              car
cr              car
ar              car
ca_             car
c_r             car
_ar             car
_car            car
c_ar            car
ca_r            car
car_            car
cat             cat
ca              cat
ct              cat
at              cat
ca_             cat
c_t             cat
_at             cat
_cat            cat
c_at            cat
ca_t            cat
cat_            cat

Then, in your search phase, you just consult the near-miss table:

SELECT DISTINCT realWord FROM nearMisses
WHERE [whatever] LIKE nearMisses.nearMiss;

and find all the applicable entries: a single lookup against one
table should be extremely fast.  Look up the word 'ca' and you get
both 'car' and 'cat' as realWords.  You could even include a JOIN to
find the entries in your dict table too.
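A minimal sketch of the search phase, assuming Python's sqlite3 module
and the nearMisses(nearMiss, realWord) layout from the table above; only
the 'car' and 'cat' rows are loaded, and ROWS, open_db and lookup are
hypothetical names.  It includes the JOIN against dict, and DISTINCT
matters because one typed word can hit several rows for the same real
word (typing 'car' matches 'car', 'ca_', 'c_r' and '_ar').  One caveat:
SQLite's LIKE optimization only uses an index when the indexed column is
on the left of LIKE, so with the typed word on the left, as here, the
query has to examine every nearMisses row rather than seek into an index.

```python
import sqlite3

# Rows copied from the 'car' and 'cat' example table above.
ROWS = [
    ("car", "car"), ("ca", "car"), ("cr", "car"), ("ar", "car"),
    ("ca_", "car"), ("c_r", "car"), ("_ar", "car"), ("_car", "car"),
    ("c_ar", "car"), ("ca_r", "car"), ("car_", "car"),
    ("cat", "cat"), ("ca", "cat"), ("ct", "cat"), ("at", "cat"),
    ("ca_", "cat"), ("c_t", "cat"), ("_at", "cat"), ("_cat", "cat"),
    ("c_at", "cat"), ("ca_t", "cat"), ("cat_", "cat"),
]


def open_db() -> sqlite3.Connection:
    """In-memory stand-ins for the dict and nearMisses tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dict (word TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE nearMisses (nearMiss TEXT, realWord TEXT)")
    conn.executemany("INSERT INTO dict VALUES (?)", [("car",), ("cat",)])
    conn.executemany("INSERT INTO nearMisses VALUES (?, ?)", ROWS)
    return conn


def lookup(conn: sqlite3.Connection, typed: str) -> list[str]:
    # The typed word is the string being tested and each stored nearMiss
    # is the LIKE pattern, so a stored '_' matches any single character.
    sql = """
        SELECT DISTINCT d.word
        FROM nearMisses AS n
        JOIN dict AS d ON d.word = n.realWord
        WHERE ? LIKE n.nearMiss
        ORDER BY d.word
    """
    return [row[0] for row in conn.execute(sql, (typed,))]


if __name__ == "__main__":
    conn = open_db()
    print(lookup(conn, "ca"))   # ['car', 'cat']
    print(lookup(conn, "cxr"))  # ['car']
```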

It should be easy to write software which goes through every
variation: missing letter, extra letter, substituted letter.  It will
lead to one very big table, but it will give you fast lookup.  You can
shrink the table by using the LIKE operator both ways around, at the
cost of doubling the time taken.  Whether to bother consulting the
nearMiss table if the user typed a real word to start with is up to
you.
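A sketch of that generating program, assuming Python's sqlite3 module
and the dict(word) table from the original query; near_misses and
build_near_miss_table are hypothetical names.  For each word it emits
the word itself, every one-letter deletion, and the '_' patterns for
every substitution and insertion, which reproduces the 'car' and 'cat'
rows in the table above.

```python
import sqlite3


def near_misses(word: str) -> set[str]:
    """All nearMiss keys for `word`: itself, deletions, and '_' patterns."""
    keys = {word}
    for i in range(len(word)):
        keys.add(word[:i] + word[i + 1:])        # missing letter
        keys.add(word[:i] + "_" + word[i + 1:])  # substituted letter
    for i in range(len(word) + 1):
        keys.add(word[:i] + "_" + word[i:])      # extra letter
    return keys


def build_near_miss_table(conn: sqlite3.Connection) -> None:
    """(Re)build nearMisses from every word in the dict table."""
    conn.execute("DROP TABLE IF EXISTS nearMisses")
    conn.execute("CREATE TABLE nearMisses (nearMiss TEXT, realWord TEXT)")
    # Materialize the word list before inserting into the other table.
    words = [row[0] for row in conn.execute("SELECT word FROM dict")]
    rows = ((key, word) for word in words for key in sorted(near_misses(word)))
    conn.executemany("INSERT INTO nearMisses VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dict (word TEXT)")
    conn.executemany("INSERT INTO dict VALUES (?)", [("car",), ("cat",)])
    build_near_miss_table(conn)
    print(conn.execute("SELECT COUNT(*) FROM nearMisses").fetchone()[0])  # 22
```

Each n-letter word contributes up to 1 + n + n + (n + 1) = 3n + 2 rows
(fewer when deletions coincide, as in 'see'), so 'car' gives the 11 rows
shown above; that is where the very big table comes from.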

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
