Hi

I have a database using SQLite-3.7.14 with a FTS4 virtual table (Free
Text Search). The FTS table contains several millions of small documents.

The FTS DB is created on a server (where creating time does not matter)
and then used on an embedded device as a read-only database for FTS
queries (where speed of queries need to be as fast as possible and use
as little memory as possible).

Since the DB is used read-only on the device, I ran the the "optimize"
command (see http://www.sqlite.org/fts3.html#optimize) in the FTS index
after the last INSERT was done on the server.

Unfortunately, I found that the "optimize" command is creating very large
BLOBs in the database since it merges together all of inverted index b-trees
into very large complete b-tree. For frequent terms, merged BLOBs can be
up to ~50MB in my case in the ftsTable_segment table, as a result of
running "optimize":

  sqlite> SELECT *, length(block) AS len_blob
            FROM ftsTable_segments
            ORDER BY len_blob DESC LIMIT 3;

  seblockid|block|len_blob
  336808||51867353 <-- 51 MB BLOB!
  311724||19375541 <-- 19 MB BLOB
  334719||19223423 <-- 19 MB BLOB

Such large BLOBs (~50MB) are a problem in my case as they consume
a  large amount of memory on a embedded device, when doing FTS
queries with several frequent terms. SQLite memory high watermark
reported by sqlite3_memory_highwater() reaches ~200MB when query
contains several frequent terms, which is too much for an embedded
device, even though I set of soft memory limit of only 3MB using
sqlite3_soft_heap_limit64(...).

As a result, I have disabled running "optimize" on the FTS index
after creating the FTS DB on the server.

However, it got me thinking: it would be nice to be able to pass an
optional parameter to the FTS "optimize" command in order to
avoid merging b-trees when BLOBs reach a certain size?  In other
words, instead of doing...

INSERT INTO ftsTable(ftsTable) VALUES('optimize');

... I would like to be able to do something like this...

INSERT INTO ftsTable(ftsTable) VALUES('optimize=1048576');

... where optimize=1048576 indicates to *partially* optimize
reverse index b-trees in such a way that BLOBs do not exceed
1MB (1048577 bytes) in this example. It's OK if it's a fuzzy soft limit.


1/ Wouldn't such partial optimization of FTS index be useful?

2/ I also suggest that the documentation at
http://www.sqlite.org/fts3.html#optimize
    indicates that optimizing an FTS index can create very large BLOBs.
    This may be OK on a desktop or server, but it can be a problem
    on embedded devices with limited amount of memory.

Regards
Dominique
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to