Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

RP McMurphy Tue, 11 Nov 2014 05:06:58 -0800

> <http://www.sqlite.org/lang_analyze.html>


Okay, for my simplified example analyze does improve the times. But for our 
application this does not help, and it also harmed a couple of other queries. 
I'll have to see if I can make a better example schema showing the problem 
because I can't upload the entire 900MB database.

RP

--------------------------------------------
On Sun, 11/9/14, RP McMurphy <rpm0...@yahoo.com> wrote:

 Subject: Discrepancy with indexing and WHERE clause with AND/OR
 To: sqlite-users@sqlite.org
 Date: Sunday, November 9, 2014, 2:35 AM
 
 If there is a large table and we need
 to select a subset of values using a WHERE clause with an
 AND/OR construct sqlite has trouble finding the answer in a
 reasonable time. Breaking the queries down into separate
 SELECT statements speeds up the process exponentially.
 
 For example the following takes a few seconds to return the
 answer 1334. Note that the index (w) is a "low quality"
 index with the arguments in the wrong order. The reason for
 this is explained further down:
 
     with recursive cnt(x) as (select 1 union
 all select x+1 from cnt limit 10000000)
     insert into v select x % 3,x from cnt;
 
     create index w on v(z,y);
     select count(*) from v
 where    z = 0 and
            
     (    y between 1000000 and
 1001000 
            
     or    y between 2000000
 and 2001000
            
     or    y between 3000000
 and 3001000
            
     or    y between 4000000
 and 4001000);
 
 The reason the the "low quality" index is because this data
 is also accessed in a different manner. Namely like this:
 
     select count(*) from v group by z;
 
 Both of the above queries each take about 3 seconds to run.
 I don't think the second query can be made faster, but the
 first query can certainly be much faster even with the "low
 quality" index. Thus:
 
     select 
     (select count(*) from v where z = 0 and y
 between 1000000 and 1001000) +
     (select count(*) from v where z = 0 and y
 between 2000000 and 2001000) +
     (select count(*) from v where z = 0 and y
 between 3000000 and 3001000) +
     (select count(*) from v where z = 0 and y
 between 4000000 and 4001000);
 
 Now the query returns the result 1334 almost immediately.
 The only difference is that the WHERE clause has been
 manually flattened and broken into separate SELECT
 portions.
 
 When we change the index to "high quality" (u) and put the
 arguments in the other order.
 
     drop index w;
     create index u on v(y,z);
 
 And rerun the query:
 
     select count(*) from v
 where    z = 0 and
            
     (    y between 1000000 and
 1001000 
            
     or    y between 2000000
 and 2001000
            
     or    y between 3000000
 and 3001000
            
     or    y between 4000000
 and 4001000);
 
 The answer 1334 is returned almost immediately. And
 flattening this query gives no advantage. But now the second
 type of query runs terribly slow because the index is very
 poor for this type of query:
 
     select count(*) from v group by z;
 
 And takes more then 30 seconds to finish.
 
 Trying to make two indices in the hope that sqlite will find
 the optimal one by itself gives back result times the same
 as if only the w index is present:
 
     create index w on v(z,y);
     create index u on v(y,z);
 
 Both queries now take a few seconds to run. It is as if the
 u index does not exist.
 
 In our application we have only created the w index (since
 the u index is awful for the grouping query) and manually
 generate the WHERE/AND/OR flattened queries in a loop. This
 solution is unsatisfying to us and I think we must be doing
 something wrong. Is there a way we can make the w index work
 with both queries and not have to run external loops to
 flatten all the WHERE clauses?
 
 RP
 
 PS: Below is the text in one unit that can be copied and
 pasted into a shell session running sqlite3.exe:
 
 create table times(idx,j);
 create table v(z,y);
 
 with recursive cnt(x) as (select 1 union all select x+1 from
 cnt limit 10000000)
 insert into v select x % 3,x from cnt;
 
 create index w on v(z,y);
 select 'With index (z,y)';
 insert into times select 0,julianday('now');
 select 
 (select count(*) from v where z = 0 and y between 1000000
 and 1001000) +
 (select count(*) from v where z = 0 and y between 2000000
 and 2001000) +
 (select count(*) from v where z = 0 and y between 3000000
 and 3001000) +
 (select count(*) from v where z = 0 and y between 4000000
 and 4001000);
 insert into times select 1,julianday('now');
 select strftime('%f',(select j from times where
 idx=1)-(select j from times where idx=0));
 select count(*) from v where    z = 0 and
            
 (    y between 1000000 and 1001000 
            
 or    y between 2000000 and 2001000
            
 or    y between 3000000 and 3001000
            
 or    y between 4000000 and 4001000);
 insert into times select 2,julianday('now');
 select strftime('%f',(select j from times where
 idx=2)-(select j from times where idx=1));
 select count(*) from v group by z;
 insert into times select 3,julianday('now');
 select strftime('%f',(select j from times where
 idx=3)-(select j from times where idx=2));
 
 drop index w;
 delete from times;
 
 create index u on v(y,z);
 select 'With index (y,z)';
 insert into times select 0,julianday('now');
 select 
 (select count(*) from v where z = 0 and y between 1000000
 and 1001000) +
 (select count(*) from v where z = 0 and y between 2000000
 and 2001000) +
 (select count(*) from v where z = 0 and y between 3000000
 and 3001000) +
 (select count(*) from v where z = 0 and y between 4000000
 and 4001000);
 insert into times select 1,julianday('now');
 select strftime('%f',(select j from times where
 idx=1)-(select j from times where idx=0));
 select count(*) from v where    z = 0 and
            
 (    y between 1000000 and 1001000 
            
 or    y between 2000000 and 2001000
            
 or    y between 3000000 and 3001000
            
 or    y between 4000000 and 4001000);
 insert into times select 2,julianday('now');
 select strftime('%f',(select j from times where
 idx=2)-(select j from times where idx=1));
 select count(*) from v group by z;
 insert into times select 3,julianday('now');
 select strftime('%f',(select j from times where
 idx=3)-(select j from times where idx=2));
 
 delete from times;
 
 create index w on v(z,y);
 select 'With both indices (y,z) and (z,y)';
 insert into times select 0,julianday('now');
 select 
 (select count(*) from v where z = 0 and y between 1000000
 and 1001000) +
 (select count(*) from v where z = 0 and y between 2000000
 and 2001000) +
 (select count(*) from v where z = 0 and y between 3000000
 and 3001000) +
 (select count(*) from v where z = 0 and y between 4000000
 and 4001000);
 insert into times select 1,julianday('now');
 select strftime('%f',(select j from times where
 idx=1)-(select j from times where idx=0));
 select count(*) from v where    z = 0 and
            
 (    y between 1000000 and 1001000 
            
 or    y between 2000000 and 2001000
            
 or    y between 3000000 and 3001000
            
 or    y between 4000000 and 4001000);
 insert into times select 2,julianday('now');
 select strftime('%f',(select j from times where
 idx=2)-(select j from times where idx=1));
 select count(*) from v group by z;
 insert into times select 3,julianday('now');
 select strftime('%f',(select j from times where
 idx=3)-(select j from times where idx=2));
 
 .quit
 
 
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Discrepancy with indexing and WHERE clause with AND/OR

Reply via email to