On Wed, Oct 14, 2009 at 11:35 PM, John Crenshaw <johncrens...@priacta.com> wrote:
> The severe limitations on FTS3 seemed odd to me, but I figured I could
> live with them. Then I started finding that various queries were giving
> strange "out of context" errors with the MATCH operator, even though I
> was following all the documented rules. As a result I started looking
> deeply into what is going on with FTS3 and I found something that
> bothers me.
>
> These limitations are really completely arbitrary. They should be
> removable.

fts is mostly the way it is because that was the amount that got done
before I lost the motivation to carry it further. The set of possible
improvements is vast, but they need a motivated party to carry them
forward. Some of the integration with SQLite is the way it is mostly
because it was decided to keep fts outside of SQLite core. Feel free
to dive in and improve it.

> You can only use a single index to query a table, after that everything
> else has to be done with a scan of the results, fair enough. But with
> FTS3, the match operator works ONLY when the match expression is
> selected for the index. This means that if a query could allow a row to
> be selected by either rowid, or a MATCH expression, you can have a
> problem. If the rowid is selected for use as the index, the MATCH won't
> be used as the index, and you get errors. Similarly, a query with two
> MATCH expressions will only be able to use one as the index, so you get
> errors from the second.

The MATCH code probes term->doclist, there is no facility for probing
by docid. At minimum the document will need to be tokenized.
Worst-case, you could tokenize it to an in-memory segment and probe
that, which would make good re-use of existing code. Most efficient
would be to somehow match directly against the tokenizer output (you
could look at the snippeting code for hints there).

> My first question is, why was FTS designed like this in the first place?

Because running MATCH against a subset of the table was not considered
an important use case when designing it?

> Surely this was clear during the design stage, when the design could
> have been easily changed to accommodate the lookups required for a MATCH
> function. Is there some compelling performance benefit? Something I
> missed?

"Easily" is all relative. There were plenty of hard problems to be
solved without looking around for a bunch of easy ones to tack on.

> My second question is, can we expect this to change at some point?

Probably not unless someone out there decides to. I got kind of burned
out on fts about a year back.

> All that is needed is the ability to lookup by a combination
> of docid and term. Isn't a hash already built while creating a list of
> terms for storage? What if that hash were stored, indexed by docid?

In database world, space==time. Storing more data means the system
gets slower.

-scott
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
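
Editorial illustration (not part of the original message): the trade-off
discussed above can be sketched with a toy inverted index. This is not
fts3 code. The tokenizer, the function names, and the in-memory dicts are
all assumptions made up for the example. It only shows why a
term->doclist structure answers "which rows match this term" in one
lookup, why asking "does this one row match" means re-tokenizing the row,
and why a docid-keyed term table would store every posting a second time.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Toy tokenizer (hypothetical stand-in): lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build term -> doclist: each term maps to a sorted list of docids."""
    index = defaultdict(list)
    for docid in sorted(docs):
        for term in sorted(set(tokenize(docs[docid]))):
            index[term].append(docid)
    return index


def match_via_index(index, term):
    """Index-driven probe: one lookup yields every matching docid."""
    return index.get(term, [])


def match_one_doc(docs, docid, term):
    """Row-driven probe: nothing is keyed by docid, so tokenize the row."""
    return term in set(tokenize(docs[docid]))


def build_forward_index(docs):
    """The proposed extra structure: docid -> set of terms.

    It answers row-driven probes without re-tokenizing, but it holds one
    entry per (docid, term) pair, the same count the inverted index holds.
    """
    return {docid: set(tokenize(text)) for docid, text in docs.items()}


docs = {
    1: "SQLite full text search",
    2: "Full moon tonight",
    3: "text only",
}
index = build_index(docs)
forward = build_forward_index(docs)

print(match_via_index(index, "full"))
print(match_one_doc(docs, 3, "text"))
print(sum(len(v) for v in index.values()),
      sum(len(s) for s in forward.values()))
```

In this toy model the forward index holds exactly as many (docid, term)
pairs as the inverted index, which is one way to read the "space==time"
remark: the extra lookup path is paid for in stored data.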