Hi,

This post concerns a strange (imho) behaviour of recursive WITH update query as far as execution time is concerned.

I have created (in an "In memory" DB) a table that stores a set of nodes arranged in a tree structure (actually the skeleton of the contents of a Windows folder):

   create table GRAPH(Parent int, Child int, Level int);

Column "Level" indicates the depth of the node in the tree. The root node has Level 0, its children Level 1, and so on.

2690 nodes have been inserted, with "Level" initialized to null:

insert into GRAPH(Parent,Child) values (null,1); -- root node (no parent) insert into GRAPH(Parent,Child) values (1,9); -- child of the root node insert into GRAPH(Parent,Child) values (9,10); -- grand-child of the root node
   insert into GRAPH(Parent,Child) values (9,11);
   insert into GRAPH(Parent,Child) values (9,12);
   insert into GRAPH(Parent,Child) values (9,13);
   etc.

Considering the size of the table, no indexes have been created.
The distribution of nodes among the levels is fairly "normal":

   +-------+--------+
   | Level | Number |
   +-------+--------+
   | 0     | 1      |
   | 1     | 47     |
   | 2     | 215    |
   | 3     | 638    |
   | 4     | 1010   |
   | 5     | 729    |
   | 6     | 50     |
   +-------+--------+

Now, I would like to compute the "Level" value of all nodes from their position in the tree. This task immediately suggests a recursive WITH update:

   with recursive
   HIERARCHY(FromID,ToID,Level) as
      (
      select Parent, Child, '0'
      from   GRAPH where Parent is null
         union all
      select H.ToID, G.Child, H.Level + 1
      from   HIERARCHY H, GRAPH G
      where  H.ToID = G.Parent
      )
   update GRAPH
   set    Level = (select Level from HIERARCHYwhere ToID = GRAPH.Child);

When this query is executed by SQLite 3.16.2, the timer reports an execution time of 5.522 s. Adding an index on "Parent" and one on "Child" just makes things worse (6.381 s.)

I find this figures quite high, so that I try an iterative technique, which is likely to be close to the execution strategy of the WITH statement (https://www.sqlite.org/lang_with.html). I translate it for the CLI shell as follows:

   update GRAPH set Level = 0 where Parent is null;

   update GRAPH set Level = 1
   where  Parent in (select Child from GRAPH where Level = 0);

   update GRAPH set Level = 2
   where  Parent in (select Child from GRAPH where Level = 1);

   update GRAPH set Level = 3
   where  Parent in (select Child from GRAPH where Level = 2);

   update GRAPH set Level = 4
   where  Parent in (select Child from GRAPH where Level = 3);

   update GRAPH set Level = 5
   where  Parent in (select Child from GRAPH where Level = 4);

   update GRAPH set Level = 6
   where  Parent in (select Child from GRAPH where Level = 5);

For this script, I get an execution time of 0.015 s., i.e., nearly 370 times less!

Is there something wrong in my queries? Or is there an optimization trick for WITH queries by which one could approach the performance of the iterative version?

The scripts are available here: https://www.dropbox.com/s/23t4ycftlk0doy1/GRAPH-performance.zip?dl=0

Thanks for any advice

Jean-Luc Hainaut

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to