Re: [sqlite] Optmize queries on ranges

Keith Medcalf Thu, 25 Oct 2018 10:46:10 -0700

On Thursday, 25 October, 2018 10:48, Dan Kennedy <danielk1...@gmail.com> wrote:

>On 10/25/2018 11:13 PM, siscia wrote:

>> Hi all,

>> CREATE TABLE ranges (
>>     start int,
>>     end int,
>>     value int,
>> );

>> The query that I am interested in optimizing is "select value from
>> ranges where (? between start and end)"

>Your query is the same as "start <= ? AND end >= ?". The trouble is
>that SQlite can only use the index to optimize one of "start <= ?" 
>or "end >= ?". 

select value from ranges where ? between start and end;
is *almost* the same as
select value from ranges where start <= ?1 and ?1 <= end;

There is an extra column load and compare when using the between version of the 
query (this is because although the optimization of the index use is the same, 
the use of x BETWEEN y AND z adds both the y <= x and x <= z checks as where 
clause tests that are executed within the loop, whereas when using the devolved 
query (the later form) one of the constraints is used against the index and 
only the other one is tested.  

CREATE TABLE ranges 
(
    start integer,
    end integer,
    value integer
);
create index ranges_index on ranges (start, end, value);
insert into ranges (start, end, value) 
 select x.value - y.value, x.value + y.value, x.value 
   from generate_series as x 
   join generate_series as y 
  where x.start = 1 and x.stop = 10000 
    and y.start = 1 and y.stop = 10000;
-- count of ranges records is  10000000
select count(*) from ranges where 25 between start and end;
-- 50254399
-- Run Time: real 4.860 user 4.859375 sys 0.000000
select count(*) from ranges where start <= 25 and 25 <= end;
-- 50254399
-- Run Time: real 3.982 user 3.984375 sys 0.000000

Thus the extra column load and compare comprises 18% of the time used to 
execute the query.

It is slightly faster in if the table is a without rowid table, but not 
significantly (though the space used is halved).  Note however the overhead of 
using x BETWEEN y AND z -vs- y <= x AND x <= z is now 21% of the query 
execution time (which is probably not a significant difference) ...

create table ranges 
(
     start integer, 
     end integer, 
     value integer, 
     primary key (start, end)
) without rowid;
insert into ranges (start, end, value) 
     select x.value - y.value, x.value + y.value, x.value 
       from generate_series as x 
       join generate_series as y 
      where x.start = 1 and x.stop = 10000 
        and y.start = 1 and y.stop = 10000;
select count(*) from ranges where 25 between start and end;
-- 50254399
-- Run Time: real 4.854 user 4.843750 sys 0.000000
select count(*) from ranges where start <= 25 and 25 <= end;
-- 50254399
-- Run Time: real 3.814 user 3.812500 sys 0.000000

>And so you might be iterating through a very large set of records
>to extract the ones you want.

>R-tree might work for you:
>
>   https://sqlite.org/rtree.html

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Optmize queries on ranges

Reply via email to