On 25/01/13 12:59, Paul Vercellotti wrote:
> As I understand, it's tricky to get FTS to do substring matching, no?
> What's the best way to do that?

In what way is it tricky?  There are several examples of doing it in the
doc I pointed to.  Even when it has to do a full scan, scanning the list
of all words should be less work than visiting each source row.

I recommend you actually go ahead and use FTS before deciding it doesn't
work.  You'll be able to get accurate performance information for your
data set.

  http://c2.com/cgi/wiki?PrematureOptimization

If you want to do substring matching using an index then you need to use
n-grams.  This involves indexing every fragment of the text within a
range of lengths.  For example, if your source text is "hi there" and you
are doing n-grams between 2 and 4 characters (spaces count), then you
would index these:

  'hi' 'hi ' 'hi t' 'i ' 'i t' 'i th' ' t' ' th' ' the' 'th'
  'the' 'ther' 'he' 'her' 'here' 'er' 'ere' 're'
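
As a rough illustration, the fragments above can be generated with a
few lines of Python.  This is only a sketch; the function name ngrams
and its parameters are made up for this example and are not part of
SQLite or FTS.

```python
def ngrams(text, min_n=2, max_n=4):
    """Return every substring of text that is min_n to max_n characters long."""
    grams = []
    for start in range(len(text)):
        for n in range(min_n, max_n + 1):
            # Skip fragments that would run past the end of the text.
            if start + n <= len(text):
                grams.append(text[start:start + n])
    return grams


fragments = ngrams("hi there")
```

For "hi there" this yields the same 18 fragments listed above, in the
same order.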

You may also be able to use an FTS tokenizer that produces n-grams.
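
If a custom tokenizer is not an option, the same idea can be sketched
with an ordinary indexed table instead of FTS.  Everything below (the
table names docs and grams, the helpers add_doc and search, the sample
rows) is hypothetical and only meant to show the shape of the approach
using Python's sqlite3 module: look up one indexed fragment to get
candidate rows, then confirm the full match on those rows only.

```python
import sqlite3

MIN_N, MAX_N = 2, 4  # n-gram lengths, matching the 2 to 4 example


def ngrams(text, min_n=MIN_N, max_n=MAX_N):
    """Return the set of fragments of text between min_n and max_n long."""
    return {text[i:i + n]
            for i in range(len(text))
            for n in range(min_n, max_n + 1)
            if i + n <= len(text)}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT)")
db.execute("CREATE TABLE grams(gram TEXT, doc_id INTEGER)")
db.execute("CREATE INDEX grams_idx ON grams(gram, doc_id)")


def add_doc(body):
    """Store a row and index all of its n-grams."""
    cur = db.execute("INSERT INTO docs(body) VALUES (?)", (body,))
    db.executemany("INSERT INTO grams(gram, doc_id) VALUES (?, ?)",
                   [(g, cur.lastrowid) for g in ngrams(body)])


def search(needle):
    """Return ids of rows whose body contains needle (case-sensitive)."""
    if len(needle) < MIN_N:
        # Too short for the index, so fall back to visiting every row.
        rows = db.execute("SELECT id, body FROM docs").fetchall()
    else:
        # Probe the index with the longest indexed prefix of the needle.
        rows = db.execute(
            "SELECT d.id, d.body FROM grams g "
            "JOIN docs d ON d.id = g.doc_id WHERE g.gram = ?",
            (needle[:MAX_N],)).fetchall()
    # The probe may be shorter than the needle, so confirm each candidate.
    return sorted(doc_id for doc_id, body in rows if needle in body)


add_doc("hi there")
add_doc("hello world")
add_doc("other things")
```

With these three rows, search("ther") returns [1, 3] and
search("there") returns [1].  The trade-off is index size: every row
stores up to three fragments per character, so it is worth measuring
on your real data.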

Roger