Re: [sqlite] Optimizing list retrieval with a FTS3 table

Travis Orr Thu, 17 Mar 2011 08:50:55 -0700

Scott, 

Thank you for clarifying the inefficiency of FTS3 when not using a MATCH
criteria. Unfortunately there are other use cases that do require the
MATCH criteria so the FTS3 is required.

I believe reading everything into a temp table would consume too much
memory as in an ideal situation we would not place an upper bound on the
number of songs that can be indexed. My next thought is to go ahead and
keep two separate tables with the same information. One a FTS3 table for
when searching by a word is necessary and another that does not use FTS3
for all other cases. Using TRIGGERS and VIEWS it should be relatively
simple to keep both tables up to date. Also the FTS3 version of the
table will not need all of the information that is in the main table
since not all columns need to be searchable.

In case you still cared, here is the current schema.

query = "CREATE VIRTUAL TABLE " LDB_ST " USING fts3 (
tokenize='unaccent', "
                                        LDB_ST_ID " INTEGER PRIMARY KEY,
"
                                        LDB_ST_FPATH " TEXT NOT NULL, "
                                        LDB_ST_TITLE " TEXT NOT NULL, "
                                        LDB_ST_ARTIST " TEXT NOT NULL, "
                                        LDB_ST_ALBUM " TEXT NOT NULL, "
                                        LDB_ST_TRACKNUM " INTEGER, "
                                        LDB_ST_GENRE " TEXT NOT NULL, "
                                        LDB_ST_BPM " REAL NOT NULL, "
                                        LDB_ST_TAPPED_BPM " REAL NOT
NULL, "
                                        LDB_ST_PLAYTIME " INTEGER NOT
NULL, "
                                        LDB_ST_COMMENTS " TEXT NOT NULL,
"
                                        LDB_ST_ISANALYSED " INTEGER NOT
NULL, "
                                        LDB_ST_SESSION " INTEGER, "
                                        LDB_ST_BPM_TYPE " INTEGER); ";

Thanks again,

Travis

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Scott Hess
Sent: March-16-11 4:20 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Optimizing list retrieval with a FTS3 table

On Wed, Mar 16, 2011 at 12:15 PM, Travis Orr <[email protected]> wrote:
> Can someone enlighten me about this. To me a lot of the details appear

> to be hidden since my main SongTable is a FTS3 virtual table.

You don't provide your schema, but based on your queries, I'll make
unwarranted assumptions :-).

In fts3, there is a rowid column (standard SQLite meaning), a docid
column which is an alias of rowid, and all the columns you define are
TEXT columns.  If you say:

CREATE VIRTUAL TABLE MyTable USING FTS3(
  songid INTEGER PRIMARY KEY AUTO_INCREMENT MAGIC KEYWORDS,
  title VARCHAR(23),
  recorded DATETIME
);

All three of those columns are TEXT..  Based on your queries, I'm
betting that you're assuming that the various typing keywords for a
CREATE TABLE statement apply, but they don't.  If you want to know why,
you can scan the archives or read the source code, but suffice to say
that this is the truth at this time.

Anyhow, the gist of it is that the FTS3 table has a full-text index on
the TEXT of the columns, and that any other queries will be full table
scans, as if there were no optimizations at all.  So complicated queries
with ORDER BY, LIMIT, and OFFSET can absolutely destroy performance if
your result sets are all all big (or can be big, watch for the query of
death!).  If you will not be using MATCH, then there is no gain at all
from FTS3, and you should consider just using a regular table.

As I understand your problem, the solution I'd probably use would be to
create a new temporary table to hold the data while scanning it.
So something like:

CREATE TEMPORARY TABLE MyResults AS
  SELECT docid, title, artist FROM songtable WHERE ... ORDER BY ...;

I _think_ the resulting table will effectively capture the ORDER BY
results, so you can then scan it using OFFSET and LIMIT (or rowid)
efficiently.  If this is too big, you could experiment with capturing
only the docid values in order, and then joining MyResults back against
songtable to get the original values.  That won't be particularly
efficient with OFFSET and LIMIT, but it should be able to join directly
with songtable.docid, so it shouldn't be particularly inefficient,
either.

Of course, you could also just read the entire docid set into memory and
manage it that way.  It's a little cumbersome because then you have to
keep re-binding the query to walk through things, but it probably won't
perform any worse.

-scott
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Optimizing list retrieval with a FTS3 table

Reply via email to