I want to build an application that can search for specific documents based on a lot of criteria from a user (e.g. matching keywords, text, etc...). And then to combine these results using boolean logic ... For example, keyword 'animal' AND rating > 3
Each document has an integer id. My strategy so far is to gather the id for each subquery into a temporary table. Then I can combine the tables using UNION or INTERSECT keywords. However, when I try a sample that does this on 250,000 rows, I find the performance is rather slow: 2 seconds. (I've built sqlite3 myself on MacOS) sqlite3 --version > 3.8.10.2 2015-05-20 18:17:19 2ef4f3a5b1d1d0c4338f8243d40a2452cc1f7fe4 > cat config.sql intersection.sql | sqlite3 > 250001 > Run Time: real 2.000 user 1.956734 sys 0.044040 > (samples files included below.) I wonder if anyone has suggestions for how to improve this approach. For example, I've considered trying to build a monolithic query which does the subqueries and UNION/INTERSECTION logic all at once, but my previous experience has shown that the resulting queries are very complex and hard for me to reason about if/when they get slow. This approach lets me profile each subquery easily. It also lets me tackle sorting the results as a separate step. Another idea I've toyed with is building a custom implementation of UNION / INTERSECTION for result sets that are just sets of integers. I could do this as a virtual table in sqlite. Thanks for your thoughts on this problem. Andy config.sql contains: CREATE TABLE config( numAssets ); INSERT INTO config VALUES( 250000 ); intersection.sql contains: CREATE TEMPORARY TABLE idset1 ( id PRIMARY KEY ); CREATE TEMPORARY TABLE idset2 ( id PRIMARY KEY ); CREATE INDEX idset1_id ON idset1 ( id ); CREATE INDEX idset2_id ON idset2 ( id ); BEGIN TRANSACTION; INSERT INTO idset1 SELECT id FROM ( WITH RECURSIVE cnt( id ) AS ( VALUES( 0 ) UNION ALL SELECT id+1 FROM cnt WHERE id < ( SELECT MAX( numAssets ) FROM config )) select * from cnt ), config; COMMIT TRANSACTION; BEGIN TRANSACTION; INSERT INTO idset2 SELECT id + 1000000 FROM ( WITH RECURSIVE cnt( id ) AS ( VALUES( 0 ) UNION ALL SELECT id+1 FROM cnt WHERE id < ( SELECT MAX( numAssets ) FROM config )) select * from cnt ), config; COMMIT TRANSACTION; .timer ON SELECT count(id) FROM idset1 WHERE id IN ( SELECT id FROM idset1 UNION SELECT id FROM idset2 ); .timer OFF