Hello

2009/6/26 Alberto Simões <hashas...@gmail.com>:
> I am trying to find words in a dictionary stored in sqlite, and trying
> a near miss approach.
> For that I tried an algorithm to create patterns corresponding to
> Levenshtein distance of 1 (edit distance of 1).
> That means, one adition, one remotion or one substitution.
>
> For that, my script receives a word (say, 'car') and generated all
> possible additions and remotions, and substitutions:
>
> Additions: _car c_ar ca_r car_
> Substitutions: _ar c_r ca_
> remotions: ar cr ca
>
> Then, the script constructs an SQL query:
>
> SELECT DISTINCT(word) FROM dict WHERE word = "ar" OR word = "ca" OR
> word LIKE "_car" OR word LIKE "c_r" OR word = "cr" OR word LIKE "_ar"
> OR word LIKE "ca_r" OR word LIKE "c_ar" OR word LIKE "ca_" OR word
> LIKE "car_";
>
> And this SQL quer works... but not as quickly as I need (specially
> because the speed is proportional to the word size).

My current solution is to make all combinations of words: having a
list of letters, cycle them and substitute the underscore by that
letter.
The resulting list is being searched with   SELECT word FROM dict
WHERE word IN ('worda','wordb',wordc') and so on.
While for big words this list can be big (it gets above 1000 easily)
the query executes very fast.

Hope this might be helpful for somebody.
Cheers
Alberto
-- 
Alberto Simões
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to