Hello, we are facing a performance regression in SQLite 3.8 and later
on queries over R-tree tables and have been unable to solve it, so we are
asking the experts to consider the severity of the problem described below.
The description is a little long, but it contains everything relevant.
We have one table containing a set of nodes with their X,Y coordinates:
Base_node_set (ID,X,Y)
Then we have an R-tree table with bounding boxes:
BBox_set_r_tree (ID,MAX_X,MIN_X,MAX_Y,MIN_Y)
For every node we want to select all the bounding boxes that the node lies in.
We do it by executing this statement:

select *
from base_node_set as base
join BBox_set_r_tree as r_tree on
    r_tree.MIN_X <= base.X and
    r_tree.MAX_X >= base.X and
    r_tree.MIN_Y <= base.Y and
    r_tree.MAX_Y >= base.Y
order by base.ID
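
For anyone who wants to reproduce the comparison, here is a minimal sketch, assuming Python's built-in sqlite3 module linked against an SQLite build that has the R-tree module enabled. The sample rows, the helper names build_demo and query_plan, and the MIN-before-MAX column order in the CREATE VIRTUAL TABLE statement are assumptions made for illustration, not the real schema or data. The plan it prints depends on the SQLite version the interpreter is linked against.

```python
import sqlite3

def build_demo(conn):
    """Create tiny stand-ins for the two tables (hypothetical sample data)."""
    conn.executescript("""
        CREATE TABLE base_node_set (ID INTEGER PRIMARY KEY, X REAL, Y REAL);
        -- Assumption: each dimension's minimum column precedes its maximum.
        CREATE VIRTUAL TABLE BBox_set_r_tree
            USING rtree(ID, MIN_X, MAX_X, MIN_Y, MAX_Y);
    """)
    conn.executemany("INSERT INTO base_node_set VALUES (?, ?, ?)",
                     [(1, 1.0, 1.0), (2, 5.0, 5.0), (3, 20.0, 20.0)])
    conn.executemany("INSERT INTO BBox_set_r_tree VALUES (?, ?, ?, ?, ?)",
                     [(10, 0.0, 2.0, 0.0, 2.0), (11, 0.0, 6.0, 0.0, 6.0)])

# Same join as the statement above, with explicit columns instead of "*".
QUERY = """
    SELECT base.ID, r_tree.ID
    FROM base_node_set AS base
    JOIN BBox_set_r_tree AS r_tree ON
        r_tree.MIN_X <= base.X AND r_tree.MAX_X >= base.X AND
        r_tree.MIN_Y <= base.Y AND r_tree.MAX_Y >= base.Y
    ORDER BY base.ID
"""

def query_plan(conn, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row (last column)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
build_demo(conn)
rows = conn.execute(QUERY).fetchall()
plan = query_plan(conn, QUERY)

print("SQLite version:", sqlite3.sqlite_version)
for step in plan:
    print(step)
print(rows)
```

Running the same script against interpreters linked to different SQLite versions is one way to see which table ends up in the outer loop.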

In prior versions of SQLite the query plan was:
ORDER  DETAIL
1.     SCAN TABLE base_node_set AS base USING INTEGER PRIMARY KEY (~1000000 rows)
2.     SCAN TABLE BBox_set_r_tree AS r_tree VIRTUAL TABLE INDEX 2:BaDbBc (~0 rows)

On SQLite 3.8.0 and higher the query plan of the same statement is a little
different:
ORDER  DETAIL
1.     SCAN TABLE osm_road_nodes_r_tree AS r_tree VIRTUAL TABLE INDEX 2:
2.     SCAN TABLE sp_house_numbers AS base
0.     0  USE TEMP B-TREE FOR ORDER BY

Does the first plan say the following?
Scan the nodes and, for every node, find the bboxes in which the node lies (no
separate step for ORDER BY is required because scan 1 uses the integer primary
key).

Does the second one say the following?
Scan the bboxes and, for every bbox, find the nodes in it, then order the
result by node ID.

Am I interpreting these query plans correctly?

If so, I think that this is a bug in query planning with R-trees.
I don't know how to force the new version of the SQLite query planner to use
the old plan.
I have tried CROSS JOIN instead of JOIN and the resulting plan was the same.
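
A minimal sketch of that JOIN versus CROSS JOIN comparison, assuming Python's sqlite3 module, an R-tree-enabled build, and two tiny stand-in tables (the rows and the MIN-before-MAX column order are hypothetical). The SQLite documentation describes CROSS JOIN as preventing the planner from reordering the two tables, so printing both plans side by side shows whether a given build behaves that way for a virtual table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_node_set (ID INTEGER PRIMARY KEY, X REAL, Y REAL);
    -- Assumption: minimum column before maximum column in each dimension.
    CREATE VIRTUAL TABLE BBox_set_r_tree
        USING rtree(ID, MIN_X, MAX_X, MIN_Y, MAX_Y);
    INSERT INTO base_node_set VALUES (1, 1.0, 1.0), (2, 5.0, 5.0);
    INSERT INTO BBox_set_r_tree VALUES (10, 0.0, 2.0, 0.0, 2.0);
""")

# The statement is identical except for the join keyword.
TEMPLATE = """
    SELECT base.ID, r_tree.ID
    FROM base_node_set AS base
    {join} BBox_set_r_tree AS r_tree ON
        r_tree.MIN_X <= base.X AND r_tree.MAX_X >= base.X AND
        r_tree.MIN_Y <= base.Y AND r_tree.MAX_Y >= base.Y
    ORDER BY base.ID
"""

plans = {}
results = {}
for join in ("JOIN", "CROSS JOIN"):
    sql = TEMPLATE.format(join=join)
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    plans[join] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    results[join] = conn.execute(sql).fetchall()
    print(join)
    for step in plans[join]:
        print("   ", step)
```

Both variants must return the same rows; only the loop order reported in the plan may differ.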
We now use the newest 3.8.0.2, and our compile-time options are:
                -DSQLITE_THREADSAFE=1
                -DSQLITE_ENABLE_MEMORY_MANAGEMENT=1
                -DSQLITE_ENABLE_RTREE=1
                -DSQLITE_ENABLE_STAT3=1
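
To confirm which of these options a given library was actually built with, PRAGMA compile_options can be queried at run time. A minimal sketch, assuming Python's sqlite3 module; note that the library Python links against may well differ from the one compiled into the application, so this only illustrates the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# PRAGMA compile_options returns one row per option, without the SQLITE_ prefix.
options = [row[0] for row in conn.execute("PRAGMA compile_options")]

# The four options listed above, written the way the pragma reports them.
wanted = ("THREADSAFE", "ENABLE_MEMORY_MANAGEMENT", "ENABLE_RTREE", "ENABLE_STAT3")
for name in wanted:
    matches = [opt for opt in options if opt.startswith(name)]
    print(name, "->", matches if matches else "not reported")

print("library version:", sqlite3.sqlite_version)
```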
ANALYZE has been run with no change to the query plan.
Both tables contain millions of entries, but more often the R-tree table is
much larger than the node set table, so our intention is to keep searches in
the R-tree table to a minimum because of its size. We also ran such queries on
less populated tables, but the difference in query plans has a huge impact on
performance there too.
I have noticed the unlikely() core function in 3.8.1, but for such a simple
query I cannot work out how to use it. Selecting from an empty table and then
cross joining the tables in the right order did not change the query plan
either. What other options are there to control query plans over R-tree
queries?
I am asking because we were forced to upgrade the SQLite version in our
application from 3.7.6 because of disk I/O error codes occurring during
execution of such statements on larger datasets (we run our application on
Windows). After the upgrade to 3.8.0 the disk I/O errors disappeared, but now
with the latest 3.8.0.2 we have completely different query plans over R-tree
tables and are unable to rewrite the queries so that they execute as in former
versions of SQLite. This is a case where the NGQP finds a worse plan and
execution time jumps from minutes to years.

Can anyone help solve this problem?
Thank you for any reply.
Best regards

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
