On Nov 9, 2011, at 10:24 PM, Fabian wrote:

> It seems that FTS doesn't need to read the whole index from
> disk, so I'm trying to pinpoint the difference. My best guess is that it
> creates a fresh b-tree for the additional inserts, causing the boost in
> performance.

Indeed.

Quoting the fine manual:

"Multiple b-tree structures are used instead of a single b-tree to reduce the 
cost of inserting records into FTS tables. When a new record is inserted into 
an FTS table that already contains a lot of data, it is likely that many of the 
terms in the new record are already present in a large number of existing 
records. If a single b-tree were used, then large doclist structures would have 
to be loaded from the database, amended to include the new docid and 
term-offset list, then written back to the database. Using multiple b-tree 
tables allows this to be avoided by creating a new b-tree which can be merged 
with the existing b-tree (or b-trees) later on. Merging of b-tree structures 
can be performed as a background task, or once a certain number of separate 
b-tree structures have been accumulated. Of course, this scheme makes queries 
more expensive (as the FTS code may have to look up individual terms in more 
than one b-tree and merge the results), but it has been found that in 
practice this overhead is often negligible."

http://www.sqlite.org/fts3.html#section_8
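
To make the trade-off in that passage concrete, here is a rough toy model in Python. It is only a sketch of the idea and not how FTS stores or merges its data on disk: the class SegmentedIndex, its methods, and the max_segments threshold are all made up for illustration. Each insert builds a small new term-to-docid map instead of rewriting the existing (possibly large) doclists, lookups consult every map and merge the results, and the maps are combined once too many have accumulated.

```python
from collections import defaultdict


class SegmentedIndex:
    """Toy inverted index kept as several small term -> docids maps.

    Hypothetical illustration only; not the FTS on-disk format.
    """

    def __init__(self, max_segments=4):
        self.segments = []              # each segment: dict of term -> sorted docids
        self.max_segments = max_segments

    def insert(self, docid, text):
        # Build a fresh segment for this document alone; the existing
        # segments and their doclists are not read or rewritten.
        segment = {term: [docid] for term in set(text.lower().split())}
        self.segments.append(segment)
        # Merge once more than max_segments segments have accumulated.
        if len(self.segments) > self.max_segments:
            self.merge()

    def merge(self):
        # Combine every segment into one, unioning the doclists per term.
        merged = defaultdict(set)
        for segment in self.segments:
            for term, docids in segment.items():
                merged[term].update(docids)
        self.segments = [{term: sorted(ids) for term, ids in merged.items()}]

    def lookup(self, term):
        # The query cost mentioned in the manual: every segment is
        # consulted and the partial results are merged.
        hits = set()
        for segment in self.segments:
            hits.update(segment.get(term.lower(), []))
        return sorted(hits)
```

With max_segments=2, two inserts leave two separate segments, and a third insert triggers a merge back down to one. A lookup returns the same docids before and after the merge; only the number of maps it has to visit changes.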