Hi there,

I've got a very slow query that seems to be oh-so-close to running quickly, and 
it seems to come down to handling of OR operators.  When the query has one OR 
criteria, it uses the correct indexes, but when there are two, it does a table 
scan, which in this case brings the query time from milliseconds up to minutes 
in length when the database is large (~3GB).   I don't understand why it thinks 
it needs a table scan, and would love some suggestions.  This is with SQLite 
3.7.16.2.

It's a hierarchical data layout (as for a filesystem tree), and uses FTS4 
tables.

Here's a simplified schema (sorry I couldn't simplify it further):

CREATE TABLE files( key INTEGER DEFAULT 0 primary key, name TEXT COLLATE 
NOCASE, type INTEGER, parent INTEGER );
CREATE INDEX files_parent_idx_idx ON files (parent);

CREATE TABLE search_hierarchy( parent INTEGER REFERENCES files(key), child 
INTEGER REFERENCES files(key) );
CREATE INDEX search_hierarchy_parent_idx ON search_hierarchy (parent);
CREATE INDEX search_hierarchy_child_idx ON search_hierarchy (child);
CREATE VIRTUAL TABLE filename_fts USING fts4( file_name_fts );
CREATE VIRTUAL TABLE file_comment_metadata USING fts4( fileComment );

And the query in question:

SELECT files.key FROM files, search_hierarchy, filename_fts, 
file_comment_metadata
WHERE search_hierarchy.child = files.parent AND filename_fts.rowid = files.key 
AND file_comment_metadata.rowid = files.key AND
( search_hierarchy.parent = 12 OR files.parent = 12 ) AND 

( (filename_fts.file_name_fts MATCH '"X*"') OR 
(file_comment_metadata.fileComment MATCH '"X*"') ) LIMIT 0, 1000;


This gets run as 

0|0|1|SCAN TABLE search_hierarchy (~1000000 rows) <<<<<
0|1|0|SEARCH TABLE files USING COVERING INDEX files_parent_idx_idx (parent=?) 
(~5 rows)
0|2|2|SCAN TABLE filename_fts VIRTUAL TABLE INDEX 1: (~0 rows)
0|3|3|SCAN TABLE file_comment_metadata VIRTUAL TABLE INDEX 1: (~0 rows)

It's that first SCAN TABLE search_hierarchy that is killing it (~3 minutes for 
this query).  It seems to be the OR's on the last two lines that seem to throw 
it for a loop.

If I change it to the following, by removing the first OR:
SELECT files.key FROM files, search_hierarchy, filename_fts, 
file_comment_metadata
WHERE search_hierarchy.child = files.parent AND filename_fts.rowid = files.key 
AND file_comment_metadata.rowid = files.key AND
( search_hierarchy.parent = 12  ) AND 

( (filename_fts.file_name_fts MATCH '"X*"') OR 
(file_comment_metadata.fileComment MATCH '"X*"') ) LIMIT 0, 1000;


It runs very fast (75ms) as
0|0|1|SEARCH TABLE search_hierarchy USING INDEX search_hierarchy_parent_idx 
(parent=?) (~10 rows)
0|1|0|SEARCH TABLE files USING COVERING INDEX files_parent_idx_idx (parent=?) 
(~10 rows)
0|2|2|SCAN TABLE filename_fts VIRTUAL TABLE INDEX 1: (~0 rows)
0|3|3|SCAN TABLE file_comment_metadata VIRTUAL TABLE INDEX 1: (~0 rows)

Or if change it to:

SELECT files.key FROM files, search_hierarchy, filename_fts, 
file_comment_metadata
WHERE search_hierarchy.child = files.parent AND filename_fts.rowid = files.key 
AND file_comment_metadata.rowid = files.key AND
( search_hierarchy.parent = 12 OR files.parent = 12 ) AND 

( (filename_fts.file_name_fts MATCH '"X*"') ) LIMIT 0, 1000;


it runs very quickly (~70ms) as well, explained as 
0|0|2|SCAN TABLE filename_fts VIRTUAL TABLE INDEX 2: (~0 rows)
0|1|0|SEARCH TABLE files USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|2|3|SCAN TABLE file_comment_metadata VIRTUAL TABLE INDEX 1: (~0 rows)
0|3|1|SEARCH TABLE search_hierarchy USING INDEX search_hierarchy_child_idx 
(child=?) (~5 rows)

Are there any suggestions to eliminate the table scan in the original query, 
since it seems quite capable of efficiently handling each of the OR expressions 
on their own.  (I could do a UNION of multiple versions of the whole query with 
each OR section separated out, but that's not terribly satisfying.)   


Many thanks!
-Paul
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to