You also need to make sure the "no hit" does not degenerate into a table scan.
RTree works well for this but is overall significantly slower than not using
RTree since the purpose of RTree is to find the "small number of candidate
records" that could possibly satisfy the query out of a haystack of records
(that is, find the candidate needles in the haystack, so that you only need to
closely examine that small number of candidates to find the needle rather than
test the whole haystack).
However, if you know that there can only be one possible record which can
satisfy the query (ie, there is only one possible needle in the haystack, and
only one possible candidate, and you can find this candidate directly for
testing), then the overhead of using RTree where it is not needed exceeds the
benefits of using it.
I see that the performance of the RTree is significantly slower than the
equivalent "direct" method. Am I doing something wrong here or is that
overhead simply because of the data structures that the RTRee implementation
must maintain (which are not required in this case).
Without RTree:
>py -3 test.py
Created 100000 random ranges in 00:00:00.681118 Creation Rate = 146817
Ranges/Second
Looked up 1102019 random range values in 00:00:04.598245 Lookup Rate = 239660
Values/Second
Failure Rate = 257270 Values/Second
Success Rate = 228828 Values/Second
With RTree:
>py -3 test.py --rtree
Created 100000 random ranges in 00:00:02.139742 Creation Rate = 46734
Ranges/Second
Looked up 1100681 random range values in 00:00:13.662556 Lookup Rate = 80561
Values/Second
Failure Rate = 119874 Values/Second
Success Rate = 65627 Values/Second
And that came from the following test program (in python) where the only
difference is the SQL statements being used. Because the ranges are random and
the lookups are random, the timings given are subject to differences on every
run, however, the averaged rates are relatively stable given a large number of
random ranges and query values.
--- test.py ---
from __future__ import absolute_import, division, print_function,
unicode_literals
import datetime
import random
import sys
import time
import sqlite3
# Convert a value in seconds to HMS format
HMS = lambda t: (datetime.datetime.min +
datetime.timedelta(seconds=t)).time().isoformat()
# Create constants for the SQL statements we will use
if '--rtree' in sys.argv:
create_sql = 'create virtual table ranges using rtree(id, start, stop,
+value);'
query_sql = 'select value from ranges where ? between start and stop;'
else:
create_sql = 'create table ranges (start integer primary key, stop integer
not null, value integer not null);'
query_sql = 'select value from (select stop, value from ranges where start
<= ?1 order by start desc limit 1) where ?1 <= stop;'
insert_sql = 'insert into ranges (start, stop, value) values (?, ?, ?);'
# Open our database and do not use automagical transactions
db = sqlite3.connect(':memory:', isolation_level=None)
# Create our table
db.execute(create_sql)
# Create our random range data
recs = 100000
start = 0
st = time.time()
for cnt in range(recs):
start += random.randint(1, 10)
stop = start + random.randint(1, 10)
value = int((start + stop) / 2)
db.execute(insert_sql, (start, stop, value))
start = stop
stop = stop + random.randint(1, 10)
et = time.time() - st
print('Created', recs, 'random ranges in', HMS(et), 'Creation Rate =', int(recs
/ et), 'Ranges/Second')
db.execute('analyze;')
# Generate a bunch of random values and perform the range query
eta = 0.0
ets = 0.0
etf = 0.0
fcnt = 0
scnt = 0
tcnt = 0
for i in range(stop):
x = random.randint(0, stop)
lst = time.time()
row = db.execute(query_sql, (x, )).fetchone()
let = time.time() - lst
if row:
value = row[0]
ets += let
scnt += 1
else:
value = None
etf += let
fcnt += 1
eta += let
tcnt += 1
print('Looked up', stop, 'random range values in', HMS(eta), 'Lookup Rate =',
int(tcnt / eta), 'Values/Second')
print('Failure Rate =', int(fcnt / etf), 'Values/Second')
print('Success Rate =', int(scnt / ets), 'Values/Second')
---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.
>-----Original Message-----
>From: sqlite-users [mailto:sqlite-users-
>[email protected]] On Behalf Of E.Pasma
>Sent: Friday, 26 October, 2018 16:28
>To: SQLite mailing list
>Subject: Re: [sqlite] Optmize queries on ranges
>
>About the rtree extension, which was the first idea.
>
>The extension appears available without any special installation
>option. This is easier than what is mentioned in
>https://sqlite.org/rtree.html <https://sqlite.org/rtree.html> chapter
>2: "Compiling The R*Tree Module".
>This chapter may as well be left out?
>
>With test data where the ranges are mostly non-overlapping, the query
>now runs faster than without rtree. Even though both run within a
>millisecond rtree is ten times faster.
>With order by and limit the timing remains superior. But this relies
>on strictly non-overlapping ranges.
>Below my test script
>
>
>/* query 1: using rtree built-in extension */
>;
>create virtual table ranges using rtree(id, minX, maxX, +value);
>with r as (select 0 as r union all select r+1 from r where r<1000000)
>insert into ranges (minX, maxX, value)
>select r*10+1,r*10+10,r*10+5 from r
>;
>select value from ranges where 123456 between minx and maxx
>;
>123455
>Run Time: real 0.000 user 0.000135 sys 0.000018
>
>/* query 2: using index on minx+maxx */
>drop table ranges
>;
>create table ranges (minx int, maxx int, value int)
>;
>with r as (select 0 as r union all select r+1 from r where r<1000000)
>insert into ranges (minX, maxX, value)
>select r*10+1,r*10+10,r*10+5 from r
>;
>create unique index ranges_minx_maxx on ranges(minx,maxx)
>;
>select value from ranges where 123456 between minx and maxx
>;
>123455
>Run Time: real 0.002 user 0.001415 sys 0.000016
>
>/* query 3: same, assuming non-overlapping ranges */
>select value from ranges where 123456 between minx and maxx
>order by minx desc limit 1
>;
>123455
>Run Time: real 0.000 user 0.000057 sys 0.000000
>
>
>
>_______________________________________________
>sqlite-users mailing list
>[email protected]
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users