I cant go into too much detail because of my current job, but for fuzzy matching levenstien isnt very good, you need to try looking into ngram matching techniques, it is absolutely awesome in reducing over/under matches. Woody
--- On Sat, 7/5/08, Stephen Woodbridge <[EMAIL PROTECTED]> wrote: From: Stephen Woodbridge <[EMAIL PROTECTED]> Subject: Re: [sqlite] Fuzzy Matching To: "General Discussion of SQLite Database" <sqlite-users@sqlite.org> Date: Saturday, July 5, 2008, 11:24 PM Stephen Woodbridge wrote: > I would be interested in having something like this also. > > What I don't understand in your approach is how you compute the > (Levenstein) distance during a search. It seems like you have a fixed > set of tokens from your document text and these are indexed. Then you > have a query token the you want to compare to the index based on some > fuzzy distance. Since every query can be different I think you have to > compute the distance for every key in the index? that would require > doing a full index scan. > > If there ware a function that you could run a token through that would > given you that tokens "location" in some space then you could generate a > similar "location" for the query token and then use the rtree and > distance. I'm not aware of any such functions, but my expertise is more > in GIS the search searching. Hmmm, that was supposed to say text searching. > Thoughts? > > Best, > -Steve > > Martin Pfeifle wrote: >> Hi, I think there is nothing available except FTS. Doing a full table >> scan and computing for each string the (Levenstein) distance to the >> query object is too time consuming. So what I would like to see is >> the implementation of a generic metric index which needs as one >> parameter a metric distance function. Based on such a distance >> function you could then do similarity search on any objects , e.g. >> images, strings, etc. One possible index would be the M-tree (which >> you can also organize relational as it was done with the R*-tree). >> The idea is that you have a hierarchical index and each node is >> represented by a database object o and a covering radius r >> reflecting the maximal distance of all objects in that subtree to the >> object o. If you do a range query now, you compute the distance of >> your query object to the object o. If this distance minus the >> coverage radius r is bigger than your query range you can prune that >> subtree. You can either implement such a similarity module as an own >> extension similar toFTS or the Spatial module, or integrate it into >> FTS and use it only for strings. Personally, I need the second >> solution because I'd like to do full and fuzzy text search. Are there >> any plans to implement something like this, if yes, I would like to >> take part in such a development. . Best Martin >> >> >> >> >> ----- Ursprüngliche Mail ---- Von: Alberto Simões >> <[EMAIL PROTECTED]> An: General Discussion of SQLite Database >> <sqlite-users@sqlite.org> Gesendet: Donnerstag, den 3. Juli 2008, >> 21:52:05 Uhr Betreff: [sqlite] Fuzzy Matching >> >> Hello >> >> Although I am quite certain that the answer is that SQLite does not >> provide any mechanism to help me on this, it doesn't hurt to ask. Who >> know if anybody have any suggestion. >> >> Basically, I am using SQLite for a dictionary, and I want to let the >> user do fuzzy searches. OK, some simple Levenshtein distance of one >> or two would do the trick, probably. >> >> I imagine that SQLite (given the lite), does not provide any kind of >> nearmisses search. But probably, somebody here did anything similar >> in any language? >> >> Cheers Alberto > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users