On Wed, Oct 14, 2009 at 11:35 PM, John Crenshaw
<johncrens...@priacta.com> wrote:
> The severe limitations on FTS3 seemed odd to me, but I figured I could
> live with them. Then I started finding that various queries were giving
> strange "out of context" errors with the MATCH operator, even though I
> was following all the documented rules. As a result I started looking
> deeply into what is going on with FTS3 and I found something that
> bothers me.
>
> These limitations are really completely arbitrary. They should be
> removable.

fts is mostly the way it is because that was the amount that got done
before I lost the motivation to carry it further.  The set of possible
improvements is vast, but they need a motivated party to carry them
forward.  Some of the integration with SQLite is the way it is mostly
because it was decided to keep fts outside of SQLite core.  Feel free
to dive in and improve it.

> You can only use a single index to query a table, after that everything
> else has to be done with a scan of the results, fair enough. But with
> FTS3, the match operator works ONLY when the match expression is
> selected for the index. This means that if a query could allow a row to
> be selected by either rowid, or a MATCH expression, you can have a
> problem. If the rowid is selected for use as the index, the MATCH won't
> be used as the index, and you get errors. Similarly, a query with two
> MATCH expressions will only be able to use one as the index, so you get
> errors from the second.

The MATCH code probes term->doclist; there is no facility for probing
by docid.  At minimum the document will need to be tokenized.
Worst-case, you could tokenize it to an in-memory segment and probe
that, which would make good re-use of existing code.  Most efficient
would be to somehow match directly against the tokenizer output (you
could look at the snippeting code for hints there).
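
To make the distinction concrete, here is a rough Python sketch of the two
lookup directions.  It is not the fts3 code: tokenize(), build_index(),
match_via_index() and match_one_doc() are made-up names, and the tokenizer
is a simplified stand-in.  It only shows that an index keyed by term
answers "which docids contain this term" cheaply, while "does this one
docid match" needs the document re-tokenized.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Simplified stand-in for a tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build term -> sorted doclist, the only direction the index supports."""
    index = defaultdict(set)
    for docid, text in docs.items():
        for term in tokenize(text):
            index[term].add(docid)
    return {term: sorted(ids) for term, ids in index.items()}

def match_via_index(index, terms):
    """Indexed MATCH: intersect the doclists of all query terms."""
    doclists = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*doclists)) if doclists else []

def match_one_doc(text, terms):
    """MATCH on a row already chosen by rowid: re-tokenize it and test."""
    present = set(tokenize(text))
    return all(t in present for t in terms)

# Hypothetical sample data, keyed by docid.
docs = {
    1: "SQLite full text search",
    2: "full table scan",
    3: "text search in SQLite",
}
index = build_index(docs)
```

With that in place, match_via_index(index, ["sqlite", "search"]) walks the
doclists, whereas match_one_doc(docs[2], ["full", "scan"]) has to tokenize
the document on every call.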

> My first question is, why was FTS designed like this in the first place?

Because running MATCH against a subset of the table was not considered
an important use case when designing it?

> Surely this was clear during the design stage, when the design could
> have been easily changed to accommodate the lookups required for a MATCH
> function. Is there some compelling performance benefit? Something I
> missed?

"Easily" is all relative.  There were plenty of hard problems to be
solved without looking around for a bunch of easy ones to tack on.

> My second question is, can we expect this to change at some point?

Probably not unless someone out there decides to.  I got kind of
burned out on fts about a year back.

> All that is needed is the ability to lookup by a combination
> of docid and term. Isn't a hash already built while creating a list of
> terms for storage? What if that hash were stored, indexed by docid?

In the database world, space==time.  Storing more data means the system
gets slower.
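
As a rough illustration of the space cost (a sketch only, with the
hypothetical names build_both() and postings(), not anything in fts3): a
per-docid term set holds one entry for every (docid, term) pair, which is
exactly as many entries as the term->doclist index already holds.

```python
from collections import defaultdict

def build_both(docs):
    """Build the term -> docids index plus a hypothetical docid -> terms map."""
    inverted = defaultdict(set)
    forward = {}
    for docid, terms in docs.items():
        forward[docid] = set(terms)
        for term in forward[docid]:
            inverted[term].add(docid)
    return inverted, forward

def postings(mapping):
    """Count stored (key, member) pairs as a crude proxy for space used."""
    return sum(len(members) for members in mapping.values())

# Hypothetical, already-tokenized documents keyed by docid.
docs = {
    1: ["sqlite", "full", "text", "search"],
    2: ["full", "table", "scan"],
    3: ["text", "search", "in", "sqlite"],
}
inverted, forward = build_both(docs)
```

Here postings(inverted) and postings(forward) come out equal, so keeping
both roughly doubles the number of stored entries, in exchange for being
able to ask "scan" in forward[2] directly.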

-scott