The severe limitations on FTS3 seemed odd to me, but I figured I could live with them. Then I starting finding that various queries were giving strange "out of context" errors with the MATCH operator, even though I was following all the documented rules. As a result I started looking deeply into what is going on with FTS3 and I found something that bothers me.
These limitations are really completely arbitrary. They should be removable. You can only use a single index to query a table, after that everything else has to be done with a scan of the results, fair enough. But with FTS3, the match operator works ONLY when the match expression is selected for the index. This means that if a query could allow a row to be selected by either rowid, or a MATCH expression, you can have a problem. If the rowid is selected for use as the index, the MATCH won't be used as the index, and you get errors. Similarly, a query with two MATCH expressions will only be able to use one as the index, so you get errors from the second. Now, the reason this is arbitrary is that the MATCH expressions not used for the index are STILL given a chance to work at the function level. If a MATCH function were implemented, these limitations would disappear. Oddly enough, FTS3 doesn't expose a function for MATCH. Unfortunately, there is a good reason. It turns out that, as currently designed, testing an expression against a single known row requires a full table scan for every test. Inside my match function I would know the rowid (docid) for a record. While this can quickly look up the content, it is impossible to look up segments by document id, so checking the match on that row requires a lookup of all possible docids for the match expression, and a full scan of those results. Clearly this makes a function level match utter nonsense. My first question is, why was FTS designed like this in the first place? Surely this was clear during the design stage, when the design could have been easily changed to accommodate the lookups required for a MATCH function. Is there some compelling performance benefit? Something I missed? My second question is, can we expect this to change at some point? Just adding the MATCH function would eliminate virtually every remaining FTS limitation. All that is needed is the ability to lookup by a combination of docid and term. Isn't a hash already built while creating a list of terms for storage? What if that hash were stored, indexed by docid? For now I've modified my code to always index on the MATCH expression, if there is any. This at least eliminates the random errors, but does nothing wonderful for performance. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users