On Sat, Oct 17, 2009 at 1:25 PM, John Crenshaw <johncrens...@priacta.com> wrote:
> Agreed, HUGE thanks for FTS. Hopefully my original post didn't
> come off ungrateful. I was just confused by limitations that
> looked like they could have been removed during the initial
> design (at least more easily than they can now.) Scott's reply
> helps me understand this better, and perhaps gives some
> starting points for finding a solution.

One of the things I found challenging about fts development was that being embedded w/in SQLite made some problems harder. You can't just spin up a book-keeping thread to do stuff in the background, and you can't easily expose a grungy API to let the client do it, either. Plus you have the issues of shipping a framework (such as not being able to arbitrarily change the file format on a whim, even if it's WRONG). This meant that in many cases I was a bit aggressive in pruning features up front, to scope things appropriately, and once committed to a file format some things just couldn't be added.

> The idea of using the tokenizer output and doing a direct match
> is intriguing. A full content scan is expensive (that is the
> point of indexing,) but guess this is usually less expensive
> than a full index scan for single rows (especially for large
> indexes), and would eliminate the current limitations.

Doing an fts index which can handle subset scans efficiently is going to be hard. Like a lot of systems fts3 uses segments to keep index updates manageable, but this means that you can't just do a single b-tree intersection, you have to look at multiple b-trees, so you'll end up hitting a greater fraction of the index footprint to do the query. You could get a CPU win by having the code at least not keep more of the doclist data than needed around.

One thing I had been considering adding was some stats data so that you could easily determine the magnitude of the doclist for a term. In this case, if that info suggested that the index wasn't much bigger than the subset of interest, use the index, otherwise use a content scan.

> Supposing someone wanted to update FTS3, how would they get
> write access to the main code repository?

That's for the SQLite team (I've been pretty quiet on that front, lately, so will not speak for them).
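
To make the doclist-stats idea above a bit more concrete, here is a rough sketch in Python. It is not fts3 code, and fts3 stores no such stats today. The `Segment` class, the per-term byte counts, the `slack` factor, and the cost estimate (rows times average row size) are all assumptions made up for illustration. The point is only that the doclist for a term is spread across several segments, so the estimate has to be summed over all of them before it is compared against the size of the subset of interest.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical per-segment stats: term -> size of that term's doclist in bytes.
    doclist_bytes: dict = field(default_factory=dict)


def estimated_doclist_bytes(segments, term):
    """Sum the term's doclist size over every segment that contains it."""
    return sum(seg.doclist_bytes.get(term, 0) for seg in segments)


def choose_plan(segments, term, subset_rows, avg_row_bytes, slack=2.0):
    """Pick "index" or "content_scan" for matching `term` within a row subset.

    `slack` is an assumed tuning knob for "not much bigger": the index is
    used as long as the doclist is at most `slack` times the bytes a content
    scan of the subset would read.
    """
    index_cost = estimated_doclist_bytes(segments, term)
    scan_cost = subset_rows * avg_row_bytes
    return "index" if index_cost <= slack * scan_cost else "content_scan"


if __name__ == "__main__":
    segments = [
        Segment({"sqlite": 400}),
        Segment({"sqlite": 600, "rare": 10}),
    ]
    # Single-row subset, common term: the doclist dwarfs the row.
    print(choose_plan(segments, "sqlite", subset_rows=1, avg_row_bytes=200))
    # Single-row subset, rare term: the doclist is tiny.
    print(choose_plan(segments, "rare", subset_rows=1, avg_row_bytes=200))
```

With these made-up numbers the common term falls back to a content scan for a single row, while the rare term still goes through the index. A real version would need the stats to be maintained as segments are written and merged, which is where the file-format constraint comes back in.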

-scott
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users