On 05/03/2011 10:25 AM, Dotan Cohen wrote:
> I have seen this issue brought up on all types of software, from Anki
> to Kontact to Yum:
The way this is generally solved is to use a full-text search engine. The
relevant techniques go by names such as ASCII folding, stemming, synonyms
(WordNet) and double metaphone. Such engines typically also support spelling
correction, faceting, key terms, "more like this" and weighting. You'll also
find lots of control over tokenization (e.g. if you see "d.r.", what do you
turn it into? What about "i-pad"?) and analysis.

Most of them work like this: you define a schema (a set of fields and their
types) and add "documents". One of the fields will be an id, which is the key
back into your real database.

What I suggest is that you use a full-blown full-text search engine and see
which features matter to you. You can use triggers and user-defined functions
to keep the FTS index in sync with your content. Then you can turn your
request into one of these:

- Adding support for FTS engine XYZ in SQLite (like how ICU is supported)
- Please can someone write these features into SQLite's FTS. In search
  engine XYZ they are 9999 lines of code.

Some FTS engines to get you started:

- Xapian - GPL, library, C++ core with lots of language bindings
- Lucene - Apache license, library, Java core with some language bindings
- Sphinx - GPL, server?, C++ core, pipes and db connectivity

If you are a Python developer, then my favourite is the pure-Python Whoosh.

Roger

_______________________________________________
sqlite-users mailing list
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
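A minimal sketch of the "triggers plus user-defined functions" idea, using only Python's standard library. Everything here is a hypothetical stand-in: the `notes` table, the `fts_sync` function and the in-memory dict index take the place of a real engine such as Xapian, Lucene or Whoosh, and the analysis step (ASCII folding, dropping periods and hyphens so "d.r." becomes "dr" and "i-pad" becomes "ipad") is just one possible tokenization choice, not what any particular engine does.

```python
import re
import sqlite3
import unicodedata
from collections import defaultdict

# Hypothetical in-memory inverted index standing in for an external FTS
# engine: term -> set of document ids (the ids are keys into the real table).
index = defaultdict(set)
doc_terms = {}  # document id -> terms currently indexed for it


def analyze(text):
    """Toy analysis chain: ASCII folding, lowercasing, tokenization."""
    # ASCII folding: decompose accented characters (NFKD) and drop the
    # combining marks, so "café" becomes "cafe".
    folded = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    # Tokenization choice (assumption): remove periods and hyphens so that
    # "d.r." -> "dr" and "i-pad" -> "ipad", then split on non-alphanumerics.
    folded = folded.lower().replace(".", "").replace("-", "")
    return set(re.findall(r"[a-z0-9]+", folded))


def fts_sync(doc_id, text):
    """User-defined function called from the triggers to keep the index in sync."""
    # Drop the document's old terms first; this handles UPDATE and DELETE.
    for term in doc_terms.pop(doc_id, set()):
        index[term].discard(doc_id)
    if text is not None:
        terms = analyze(text)
        doc_terms[doc_id] = terms
        for term in terms:
            index[term].add(doc_id)
    return 0  # SQL functions must return a value; the triggers ignore it.


def search(query):
    """Return the ids of documents containing every term of the query."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


conn = sqlite3.connect(":memory:")
# Allow the application-defined function inside triggers even on builds where
# trusted_schema defaults to OFF (older SQLite versions ignore this pragma).
conn.execute("PRAGMA trusted_schema = ON")
conn.create_function("fts_sync", 2, fts_sync)
conn.executescript("""
CREATE TABLE notes(id INTEGER PRIMARY KEY, body TEXT);
CREATE TRIGGER notes_ai AFTER INSERT ON notes BEGIN
    SELECT fts_sync(NEW.id, NEW.body);
END;
-- Assumes ids never change; only edits to body are tracked.
CREATE TRIGGER notes_au AFTER UPDATE OF body ON notes BEGIN
    SELECT fts_sync(NEW.id, NEW.body);
END;
CREATE TRIGGER notes_ad AFTER DELETE ON notes BEGIN
    SELECT fts_sync(OLD.id, NULL);
END;
""")

with conn:
    conn.execute("INSERT INTO notes(id, body) VALUES (1, 'Meeting with D.R. at the café')")
    conn.execute("INSERT INTO notes(id, body) VALUES (2, 'Bought a new i-pad')")

print(search("cafe"))  # {1}
print(search("iPad"))  # {2}

with conn:
    conn.execute("DELETE FROM notes WHERE id = 2")

print(search("ipad"))  # set()
```

One design caveat: the triggers call the function when each statement runs, not when the transaction commits, so a rolled-back transaction would leave an external index like this one out of sync. A production setup would need to account for that, for example by queueing changes and applying them after commit.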

