Re: [sqlite] Why FTS3 has the limitations it does

Scott Hess Mon, 19 Oct 2009 09:51:25 -0700

On Sat, Oct 17, 2009 at 1:25 PM, John Crenshaw <johncrens...@priacta.com> wrote:
> Agreed, HUGE thanks for FTS. Hopefully my original post didn't
> come off ungrateful.  I was just confused by limitations that
> looked like they could have been removed during the initial
> design (at least more easily than they can now.) Scott's reply
> helps me understand this better, and perhaps gives some
> starting points for finding a solution.


One of the things I found challenging about fts development was that
being embedded within SQLite made some problems harder.  You can't just
spin up a bookkeeping thread to do stuff in the background, and you
can't easily expose a grungy API to let the client do it, either.
Plus you have the issues of shipping a framework (such as not being
able to arbitrarily change the file format on a whim, even if it's
WRONG).  This meant that in many cases I was a bit aggressive in
pruning features up front, to scope things appropriately, and once
committed to a file format some things just couldn't be added.

> The idea of using the tokenizer output and doing a direct match
> is intriguing. A full content scan is expensive (that is the
> point of indexing,) but guess this is usually less expensive
> than a full index scan for single rows (especially for large
> indexes), and would eliminate the current limitations.

Doing an fts index which can handle subset scans efficiently is going
to be hard.  Like a lot of systems, fts3 uses segments to keep index
updates manageable, but this means that you can't just do a single
b-tree intersection; you have to look at multiple b-trees, so you'll
end up hitting a greater fraction of the index footprint to do the
query.  You could get a CPU win by having the code at least not keep
more of the doclist data around than it needs.
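
To make the segment point concrete, here is a toy sketch in Python.  It
is not fts3's code or file format: the segments are plain dicts mapping
a term to a sorted list of docids (an assumption for illustration), and
deletions are ignored.  It shows that a lookup restricted to a few rows
still has to visit every segment holding the term, and that streaming
the merge avoids keeping more doclist data around than needed.

```python
import heapq

# Toy model (assumption): each segment maps a term to a sorted docid list.
# Real fts3 segments are on-disk b-trees holding encoded doclists.
segments = [
    {"sqlite": [1, 4, 9], "index": [4, 7]},
    {"sqlite": [12, 15]},
    {"sqlite": [20], "index": [20, 21]},
]


def lookup_in_subset(segments, term, wanted):
    """Yield docids for `term` that are also in `wanted`.

    Every segment containing the term is read, even when `wanted` is tiny.
    The merge is streamed, so the combined doclist is never materialized.
    """
    per_segment = [seg[term] for seg in segments if term in seg]
    previous = None
    for docid in heapq.merge(*per_segment):
        # Skip duplicates across segments and rows outside the subset.
        if docid != previous and docid in wanted:
            yield docid
        previous = docid


matches = list(lookup_in_subset(segments, "sqlite", {4, 12, 99}))
print(matches)  # [4, 12]
```

Even with only three rows of interest, all three segments get read for
"sqlite", which is the "greater fraction of the index footprint" cost.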

One thing I had been considering adding was some stats data so that
you could easily determine the magnitude of the doclist for a term.
In this case, if that info suggested that the doclist wasn't much
bigger than the subset of interest, use the index; otherwise, use a
content scan.
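
A rough sketch of that decision in Python, under stated assumptions:
choose_plan, content_scan and the max_ratio threshold are hypothetical
names and values, not anything in fts3, and the tokenizer is a stand-in
(lowercase, split on whitespace) rather than a real fts3 tokenizer.

```python
def choose_plan(doclist_size, subset_size, max_ratio=4.0):
    """Pick 'index' or 'content_scan' from hypothetical per-term stats.

    max_ratio is an arbitrary illustrative threshold: use the index only
    when the term's doclist is not much bigger than the subset of rows.
    """
    if doclist_size <= max_ratio * subset_size:
        return "index"
    return "content_scan"


def content_scan(rows, term):
    """Tokenize each candidate row's text and match the term directly.

    `rows` maps rowid -> text for the subset of interest only.
    """
    hits = []
    for rowid, text in rows.items():
        tokens = text.lower().split()  # stand-in tokenizer (assumption)
        if term in tokens:
            hits.append(rowid)
    return hits


subset = {1: "Hello SQLite world", 2: "nothing relevant here"}
plan = choose_plan(doclist_size=100000, subset_size=len(subset))
print(plan)  # content_scan
if plan == "content_scan":
    print(content_scan(subset, "sqlite"))  # [1]
```

With a huge doclist and two rows of interest, scanning the two rows
wins; with a doclist of similar size to the subset, the index would be
chosen instead.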

> Supposing someone wanted to update FTS3, how would they get
> write access to the main code repository?

That's for the SQLite team (I've been pretty quiet on that front,
lately, so will not speak for them).

-scott
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
