To explain that with an example: tableA (1,2) (2,3) (2,4) (3,5)
tableB (1,2) (2,3) (4,6) Get a pointer on tableA and one on tableB. (1,2) and (1,2) form a group, run count(*) and output 2. Advance both pointers. (2,3) and (2,3) form a group, run count(*) and output 2. Advance both pointers. (2,4) is less than (4,6) (because 2 < 4). (2,4) forms a group, run count(*) and output 1. Advance pointer in tableA. (3,5) is less than (4,6) (because 3 < 5). (3,5) forms a group, run count(*) and output 1. pointer on table A is depleted. Continue with tableB only. (4,6) forms a group. run count(*) and output 1. The sqlite plan calls for merging both tables in a B-tree and scanning it, which incurs unneeded time and space complecity compared to the above. Manos On 23 January 2015 at 18:59, Emmanouil Karvounis <man...@di.uoa.gr> wrote: > Dear Simon, > > Thank you for your answer and I'm sorry if I have used inappropriate > wording and confused you. > > The issue is actually very simple: > > tableA and tableB have both primary key on (c1, c2) > > explain query plan > select c1, c2, count(*) from ( > select c1, c2 from tableA > union all > select c1, c2 from tableB > ) > group by c1,c2 > > 2|0|0|SCAN TABLE tableA USING COVERING INDEX sqlite_autoindex_tableA_1 > 3|0|0|SCAN TABLE tableB USING COVERING INDEX sqlite_autoindex_tableB_1 > 1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL) > 0|0|0|SCAN SUBQUERY 1 > 0|0|0|USE TEMP B-TREE FOR GROUP BY > > There is no reason to create a new temp B-tree when you can sequentially > and in-synch scan the B-tree of tableA and of tableB and get the groups in > one pass. Think of how you would execute the sort step of mergesort on two > already sorted subarrays. > > > Manos > > On 23 January 2015 at 18:42, Simon Slavin <slav...@bigfraud.org> wrote: > >> >> On 23 Jan 2015, at 4:15pm, Emmanouil Karvounis <man...@di.uoa.gr> wrote: >> >> > In short, we have two tables that are already sorted on a combination of >> > two fields >> >> There is no such thing as a 'sorted table' in SQL. Each table is a set >> of rows and the rows have no order. >> >> If you want to make it easy for SQL to access a table's rows in a >> particular order, create an index or make that order the table's primary >> key (which is another way of making an index). >> >> > select c1, c2, myagg(*) from ( >> > select * from tableA >> > union all >> > select * from tableB >> > ) >> > group by c1, c2; >> >> This command tells SQL that you want to construct a list of every row of >> tableA and every row of tableB. In other words, if you have 300 rows in >> tableA and 500 rows in tableB, you are telling SQL to construct a new table >> of 800 rows. And because this table doesn't yet exist, it doesn't have any >> indexes so it can't be searched quickly. Is that what you wanted ? >> >> Is there a good reason for needing this data in two separate tables >> rather than one for which you can create an index on (c1, c2) ? >> >> Do the groups occur entirely within one table or do you have to add the >> tables together before SQL can figure out the groups. >> >> > where tableA, tableB have primary key (c1, c2) and their schema >> comprises 3 >> > integers: c1, c2, and prop. >> >> It might be worth testing with something like 'total(*)' just to make >> sure it isn't your own function which is causing the problems. >> >> Simon. >> _______________________________________________ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users