On Sun, Jun 25, 2006 at 07:54:13AM -0700, onemind wrote: > The thing is, i am going to need to use different letters each time to > search through over 200,000 words in a database and it needs to be fast.
The quick and dirty way to do this using a sqlite would be to keep a separate column from the work column with the letters in the word sorted. You would have to keep this column up to date yourself though. I do not believe there is an out-of-the-box way to populate this column with sqlite. To do so would involve a trigger and a user defined function. In other words you would have: sqlite> CREATE TABLE words (word text, sorted text); And inserting into the table : sqlite> insert into words(word,sorted) values ("quads","adqsu"); sqlite> insert into words(word,sorted) values ("quidas","adiqsu"); sqlite> insert into words(word,sorted) values ("queen","eenqu"); Then your searches would be in the vein of: sqlite> select word from words where sorted like "%a%d%q%s%"; quads quidas But, this would result in a full table scan for all searches and as a result be O(N) and would not be very efficient. Additionally it would probably be easier to just write a script that would do the same thing with a textfile of the words and search through it. Although... sounds like a fun little project... % ./init-db Creating words.db Inserting words from /usr/share/dict/words... Inserted 483523 words into db in 58.900854 seconds. % ./search-db aqds Searching for aqds -> SELECT word FROM words WHERE sorted LIKE '%a%d%q%s%' found 754 results in 1.521026 seconds I still wouldn't suggest this method, but it is a fun exercise. Each search in this manner is a full table scan. > What technology would be best suited for this task? I just assumed that a > databse would be ideal, why do you say sql isn't suited for this and what > is? Others have given good suggestions, Ulrick's bitset is particularly nice. Well, its a lazy sunday, lets see what happens. using the same approach as above, but using a 'bitset' column instead of 'sorted'. Also an index is created on the bitset column. % ./init-db-bitset Creating words-bitset.db Inserting words from /usr/share/dict/words... Inserted 483523 words into db in 75.368803 seconds. % ./search-db-bitset aqds Searching for words aqds -> SELECT word FROM words WHERE (bitset & 327689) = 327689 found 754 results in 0.902595 seconds Although I believe it should still have to do a table scan here. But bitwise and is faster than a string comparison in any case. Since this also does a full table scan, you'll probably want something more along the lines of an inverted index of the words by letter in some sort of dedicated data structure. I started playing with this using sqlite to try and do an inverted index by letter here, but it didn't get close to the performance of what the bitset was doing. Ahh, what a fun way to spend part a Sunday :-) enjoy, -jeremy -- ======================================================================== Jeremy Hinegardner [EMAIL PROTECTED]