[sqlite] Why FTS3 has the limitations it does

John Crenshaw Wed, 14 Oct 2009 23:36:29 -0700

The severe limitations on FTS3 seemed odd to me, but I figured I could
live with them. Then I starting finding that various queries were giving
strange "out of context" errors with the MATCH operator, even though I
was following all the documented rules. As a result I started looking
deeply into what is going on with FTS3 and I found something that
bothers me.


 

These limitations are really completely arbitrary. They should be
removable.

 

You can only use a single index to query a table, after that everything
else has to be done with a scan of the results, fair enough. But with
FTS3, the match operator works ONLY when the match expression is
selected for the index. This means that if a query could allow a row to
be selected by either rowid, or a MATCH expression, you can have a
problem. If the rowid is selected for use as the index, the MATCH won't
be used as the index, and you get errors. Similarly, a query with two
MATCH expressions will only be able to use one as the index, so you get
errors from the second.

 

Now, the reason this is arbitrary is that the MATCH expressions not used
for the index are STILL given a chance to work at the function level. If
a MATCH function were implemented, these limitations would disappear.
Oddly enough, FTS3 doesn't expose a function for MATCH. Unfortunately,
there is a good reason. It turns out that, as currently designed,
testing an expression against a single known row requires a full table
scan for every test. Inside my match function I would know the rowid
(docid) for a record. While this can quickly look up the content, it is
impossible to look up segments by document id, so checking the match on
that row requires a lookup of all possible docids for the match
expression, and a full scan of those results. Clearly this makes a
function level match utter nonsense.

 

My first question is, why was FTS designed like this in the first place?
Surely this was clear during the design stage, when the design could
have been easily changed to accommodate the lookups required for a MATCH
function. Is there some compelling performance benefit? Something I
missed?

 

My second question is, can we expect this to change at some point? Just
adding the MATCH function would eliminate virtually every remaining FTS
limitation. All that is needed is the ability to lookup by a combination
of docid and term. Isn't a hash already built while creating a list of
terms for storage? What if that hash were stored, indexed by docid?

 

For now I've modified my code to always index on the MATCH expression,
if there is any. This at least eliminates the random errors, but does
nothing wonderful for performance.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] Why FTS3 has the limitations it does

Reply via email to