On Oct 12, 2011, at 5:16 PM, Fabian wrote: > Why is this very fast (20 ms): > 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows) > And this very slow (3500ms): > 0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows) > 0 1 1 SCAN TABLE table2 VIRTUAL TABLE INDEX 1: (~0 rows)
The issue here is that offset works in term of your entire query. So for each row in table1 matching your where close, it's first going to do a join to table2, order the entire result set, skip the first half-a-million rows in the result set and then return the remaining 250 rows. A rather expensive proposition. Here is an example using two table: mail_header [1] and mail_header_text, a FTS table [2]. It's a one-to-one relationship. (0) Querying the count select count( * ) from mail_header where mail_header.header_id = 2 order by mail_header.id 0|0|0|SCAN TABLE mail_header (~219250 rows) CPU Time: user 0.690721 sys 0.064676 Ok, 83,391 rows at play. (1) Querying mail_header, with an offset explain query plan select mail_header.id from mail_header where mail_header.header_id = 2 order by mail_header.id limit 250 offset 50000; 0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows) CPU Time: user 0.390615 sys 0.037031 Ok, we get 250 rows, after sorting 83,391 rows and skipping 50,000 of them. (2) Same, but with join to mail_header_text explain query plan select mail_header.id, mail_header_text.value from mail_header join mail_header_text on mail_header_text.docid = mail_header.id where mail_header.header_id = 2 order by mail_header.id limit 250 offset 50000; 0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows) 0|1|1|SCAN TABLE mail_header_text VIRTUAL TABLE INDEX 1: (~0 rows) CPU Time: user 2.153607 sys 0.265462 Note how it's an order of magnitude slower. This is because all these one-to-one joins on these 83K mail_header. They do have a cost. (3) Same, with a join, but with the offset factored out explain query plan select mail_header.id, mail_header_text.value from ( select mail_header.id from mail_header where mail_header.header_id = 2 order by mail_header.id limit 250 offset 50000 ) as mail_header join mail_header_text on mail_header_text.docid = mail_header.id; 1|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~219250 rows) 0|0|0|SCAN SUBQUERY 1 AS mail_header (~250 rows) 0|1|1|SCAN TABLE mail_header_text VIRTUAL TABLE INDEX 1: (~0 rows) CPU Time: user 0.402250 sys 0.039327 Now the join is performed only 250 times, adding just a small overhead compare the the bare bone query without the join. The short of it: minimize the amount of work upfront :) [1] http://dev.alt.textdrive.com/browser/Mail/Mail.ddl#L271 [2] http://dev.alt.textdrive.com/browser/Mail/Mail.ddl#L260 _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users