Hello 2009/6/26 Alberto Simões <hashas...@gmail.com>: > I am trying to find words in a dictionary stored in sqlite, and trying > a near miss approach. > For that I tried an algorithm to create patterns corresponding to > Levenshtein distance of 1 (edit distance of 1). > That means, one adition, one remotion or one substitution. > > For that, my script receives a word (say, 'car') and generated all > possible additions and remotions, and substitutions: > > Additions: _car c_ar ca_r car_ > Substitutions: _ar c_r ca_ > remotions: ar cr ca > > Then, the script constructs an SQL query: > > SELECT DISTINCT(word) FROM dict WHERE word = "ar" OR word = "ca" OR > word LIKE "_car" OR word LIKE "c_r" OR word = "cr" OR word LIKE "_ar" > OR word LIKE "ca_r" OR word LIKE "c_ar" OR word LIKE "ca_" OR word > LIKE "car_"; > > And this SQL quer works... but not as quickly as I need (specially > because the speed is proportional to the word size).
My current solution is to make all combinations of words: having a list of letters, cycle them and substitute the underscore by that letter. The resulting list is being searched with SELECT word FROM dict WHERE word IN ('worda','wordb',wordc') and so on. While for big words this list can be big (it gets above 1000 easily) the query executes very fast. Hope this might be helpful for somebody. Cheers Alberto -- Alberto Simões _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users