In our application, we have an FTS4 table 'tmexpandedtext' that contains all our text strings. Other tables have columns like fullname_id, path_id and filename_id that point to strings in this table by their docid. The issue we're seeing is that joining on the docid column seems really slow compared to doing the join on an otherwise equivalent non-FTS4 table.
My question is twofold: Am I wrong to think that joining on docid should be as fast as joining on indexed integer columns in other tables? And if I'm indeed wrong about that, what's the recommended way to speed things up? Here's a transcript to show what's going on: sqlite> .timer on sqlite> .output foo.txt sqlite> SELECT msg.tsc, msg.flags FROM tmmessages msg LEFT JOIN tmexpandedtext etext ON etext.docid=msg.fullname_id LEFT JOIN tmexpandedtext ptext ON ptext.docid=msg.path_id LEFT JOIN tmexpandedtext ftext ON ftext.docid=msg.filename_id; CPU Time: user 31.106599 sys 0.062400 sqlite> .output stdout sqlite> explain query plan SELECT msg.tsc, msg.flags FROM tmmessages msg LEFT JOIN tmexpandedtext etext ON etext.docid=msg.fullname_id LEFT JOIN tmexpandedtext ptext ON ptext.docid=msg.path_id LEFT JOIN tmexpandedtext ftext ON ftext.docid=msg.filename_id; 0|0|0|SCAN TABLE tmmessages AS msg (~1000000 rows) 0|1|1|SCAN TABLE tmexpandedtext AS etext VIRTUAL TABLE INDEX 1: (~0 rows) 0|2|2|SCAN TABLE tmexpandedtext AS ptext VIRTUAL TABLE INDEX 1: (~0 rows) 0|3|3|SCAN TABLE tmexpandedtext AS ftext VIRTUAL TABLE INDEX 1: (~0 rows) CPU Time: user 0.000000 sys 0.000000 The first scan is okay since in this example we're grabbing everything in the tmmessages and resolving the three id columns to the corresponding strings in tmexpandedtext. Normally I would say the three other scans are suspicious, but from my reading I've been let to believe that a SCAN TABLE with VIRTUAL TABLE INDEX is the moral equivalent of a SEARCH TABLE on a non-FTS4 table. However, the slowness of the query seems to say otherwise. For reference, here are the sizes of the two relevant tables: sqlite> select count() from tmexpandedtext; 70154 CPU Time: user 0.031200 sys 0.000000 sqlite> select count() from tmmessages; 1100006 CPU Time: user 0.000000 sys 0.031200 Now let's create a non-FTS4 equivalent of the tmmessages table and re-do the query with it: sqlite> create table strings (content VARCHAR, docid INTEGER); CPU Time: user 0.000000 sys 0.000000 sqlite> create index strings_docid ON strings(docid); CPU Time: user 0.000000 sys 0.000000 sqlite> INSERT INTO strings SELECT content, docid FROM tmexpandedtext; CPU Time: user 0.187201 sys 0.000000 sqlite> SELECT msg.tsc, msg.flags FROM tmmessages msg LEFT JOIN strings etext ON etext.docid=msg.fullname_id LEFT JOIN strings ptext ON ptext.docid=msg.path_id LEFT JOIN strings ftext ON ftext.docid=msg.filename_id; CPU Time: user 3.572423 sys 0.031200 As you can see, it is much faster and more in line with what you'd expect. Here is the query plan: sqlite> explain query plan SELECT msg.tsc, msg.flags FROM tmmessages msg LEFT JOIN strings etext ON etext.docid=msg.fullname_id LEFT JOIN strings ptext ON ptext.docid=msg.path_id LEFT JOIN strings ftext ON ftext.docid=msg.filename_id; 0|0|0|SCAN TABLE tmmessages AS msg (~1000000 rows) 0|1|1|SEARCH TABLE strings AS etext USING COVERING INDEX strings_docid (docid=?) (~10 rows) 0|2|2|SEARCH TABLE strings AS ptext USING COVERING INDEX strings_docid (docid=?) (~10 rows) 0|3|3|SEARCH TABLE strings AS ftext USING COVERING INDEX strings_docid (docid=?) (~10 rows) CPU Time: user 0.000000 sys 0.000000 Any advice or insight would be greatly appreciated! Sincerely, Per Vognsen _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users