-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 05/03/2011 10:25 AM, Dotan Cohen wrote:
> I have seen this issue brought up on all types of software, from Anki
> to Kontact to Yum:

This is generally solved by using a full text search engine.  The relevant
techniques have names like ASCII folding, stemming, synonyms (WordNet) and
Double Metaphone.  Engines typically also support spelling correction,
faceting, key terms, "more like this" and weighting.  You'll also find lots
of control over tokenization (e.g. if you see "d.r." what do you turn it
into?  What about "i-pad"?) and analysis.
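
To make the folding and tokenization questions concrete, here is a minimal
sketch in Python using only the standard library.  The function names
(ascii_fold, tokenize) are made up for illustration, and the policy of
gluing "d.r." and "i-pad" into single tokens is just one possible choice,
not what any particular engine does.

```python
import re
import unicodedata


def ascii_fold(text):
    """Strip diacritics: decompose to NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Fold, lowercase, and split into runs of ASCII letters and digits.

    A period or hyphen that directly follows a word character is dropped
    rather than treated as a separator, so "d.r." becomes "dr" and
    "i-pad" becomes "ipad". This is an assumed policy for illustration.
    """
    folded = ascii_fold(text).lower()
    joined = re.sub(r"(?<=\w)[.\-](?=\w|\s|$)", "", folded)
    return re.findall(r"[a-z0-9]+", joined)


print(tokenize("d.r."))          # ['dr']
print(tokenize("i-pad"))         # ['ipad']
print(tokenize("Café résumé"))   # ['cafe', 'resume']
```

A real engine makes each of these steps configurable, which is exactly the
control over analysis mentioned above.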

Most of them work the same way: you define a schema (a set of fields and
their types) and add "documents".  One of the fields will be an id, which is
the key back into your real database.
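
As a rough illustration of the schema/document/id idea, here is a toy
inverted index in pure Python.  TinyIndex and its methods are hypothetical
names invented for this sketch; real engines have richer schemas (field
types, analyzers, stored values), but the shape is similar: searches return
ids, and you use those ids to fetch rows from your real database.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps (field, term) to ids of matching documents."""

    def __init__(self, fields):
        self.fields = set(fields)           # the "schema": names of text fields
        self.postings = defaultdict(set)    # (field, term) -> {doc_id, ...}

    def add_document(self, doc_id, **values):
        """Index each field's text under the given id."""
        for field, text in values.items():
            if field not in self.fields:
                raise KeyError(f"field not in schema: {field}")
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, field, term):
        """Return the sorted ids of documents whose field contains the term."""
        return sorted(self.postings.get((field, term.lower()), set()))


index = TinyIndex(fields=["title", "body"])
index.add_document(1, title="Anki decks", body="spaced repetition cards")
index.add_document(2, title="Yum notes", body="package search with yum")
print(index.search("body", "search"))   # [2]
```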

What I suggest you do is use a full-blown full text search engine and see
what features matter to you.  You can use triggers and user-defined
functions to keep the FTS in sync with your content.
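
Here is a sketch of the trigger plus user-defined function approach, using
Python's sqlite3 module.  The notes table, the trigger names, and the
fts_add/fts_remove functions are all invented for this example, and a plain
dict stands in for whichever external engine you pick.

```python
import sqlite3

# Stand-in for an external search engine: term -> set of row ids.
external_index = {}


def fts_remove(row_id):
    """Drop a row id from every posting set."""
    for ids in external_index.values():
        ids.discard(row_id)


def fts_add(row_id, body):
    """(Re)index a row: clear its old terms, then add the current ones."""
    fts_remove(row_id)
    for term in body.lower().split():
        external_index.setdefault(term, set()).add(row_id)


conn = sqlite3.connect(":memory:")
# Register the Python callables so SQL (including triggers) can call them.
conn.create_function("fts_add", 2, fts_add)
conn.create_function("fts_remove", 1, fts_remove)
conn.executescript("""
    CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    CREATE TRIGGER notes_ai AFTER INSERT ON notes
        BEGIN SELECT fts_add(new.id, new.body); END;
    CREATE TRIGGER notes_au AFTER UPDATE ON notes
        BEGIN SELECT fts_remove(old.id); SELECT fts_add(new.id, new.body); END;
    CREATE TRIGGER notes_ad AFTER DELETE ON notes
        BEGIN SELECT fts_remove(old.id); END;
""")

conn.execute("INSERT INTO notes (body) VALUES ('full text search')")
conn.execute("UPDATE notes SET body = 'fuzzy search' WHERE id = 1")
print(sorted(external_index.get("search", ())))  # [1]
print(sorted(external_index.get("full", ())))    # []
```

One caveat with this design: the external index does not take part in
SQLite transactions, so a rollback leaves it out of step with the table
unless you handle that separately.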

Then you can turn your request into one of these:

 - Adding support for FTS engine XYZ in SQLite (like how ICU is supported)

 - Please can someone write these features into SQLite's FTS.  In search
engine XYZ they are 9999 lines of code.

Some FTS engines to get you started:

Xapian - GPL, library, C++ core with lots of language bindings
Lucene - Apache license, library, Java core with some language bindings
Sphinx - GPL, server?, C++ core, pipes and db connectivity

If you are a Python developer then my favourite is the pure Python Whoosh.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk3AZTYACgkQmOOfHg372QQlrQCeL5KOjgpx7Cx9OIBhmgE4zZt6
8QgAn0m03YREbaZL9aVcPCqZf8effjVR
=L/So
-----END PGP SIGNATURE-----