Re: [sqlite] Streaming group by in union of already sorted tables

Emmanouil Karvounis Fri, 23 Jan 2015 09:12:10 -0800

To explain that with an example:

tableA
(1,2) (2,3) (2,4) (3,5)


tableB
(1,2) (2,3) (4,6)


Get a pointer on tableA and one on tableB.

(1,2) and (1,2) form a group, run count(*) and output 2.

Advance both pointers.

(2,3) and (2,3) form a group, run count(*) and output 2.

Advance both pointers.

(2,4) is less than (4,6) (because 2 < 4). (2,4) forms a group, run count(*)
and output 1.

Advance pointer in tableA.

(3,5) is less than (4,6) (because 3 < 5). (3,5) forms a group, run count(*)
and output 1.

pointer on table A is depleted. Continue with tableB only.

(4,6) forms a group. run count(*) and output 1.


The sqlite plan calls for merging both tables in a B-tree and scanning it,
which incurs unneeded time and space complecity compared to the above.


Manos

On 23 January 2015 at 18:59, Emmanouil Karvounis <man...@di.uoa.gr> wrote:

> Dear Simon,
>
> Thank you for your answer and I'm sorry if I have used inappropriate
> wording and confused you.
>
> The issue is actually very simple:
>
> tableA and tableB have both primary key on (c1, c2)
>
> explain query plan
> select c1, c2, count(*) from (
> select c1, c2 from tableA
> union all
> select c1, c2 from tableB
> )
> group by c1,c2
>
> 2|0|0|SCAN TABLE tableA USING COVERING INDEX sqlite_autoindex_tableA_1
> 3|0|0|SCAN TABLE tableB USING COVERING INDEX sqlite_autoindex_tableB_1
> 1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
> 0|0|0|SCAN SUBQUERY 1
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>
> There is no reason to create a new temp B-tree when you can sequentially
> and in-synch scan the B-tree of tableA and of tableB and get the groups in
> one pass. Think of how you would execute the sort step of mergesort on two
> already sorted subarrays.
>
>
> Manos
>
> On 23 January 2015 at 18:42, Simon Slavin <slav...@bigfraud.org> wrote:
>
>>
>> On 23 Jan 2015, at 4:15pm, Emmanouil Karvounis <man...@di.uoa.gr> wrote:
>>
>> > In short, we have two tables that are already sorted on a combination of
>> > two fields
>>
>> There is no such thing as a 'sorted table' in SQL.  Each table is a set
>> of rows and the rows have no order.
>>
>> If you want to make it easy for SQL to access a table's rows in a
>> particular order, create an index or make that order the table's primary
>> key (which is another way of making an index).
>>
>> > select c1, c2, myagg(*) from (
>> > select * from tableA
>> > union all
>> > select * from tableB
>> > )
>> > group by c1, c2;
>>
>> This command tells SQL that you want to construct a list of every row of
>> tableA and every row of tableB.  In other words, if you have 300 rows in
>> tableA and 500 rows in tableB, you are telling SQL to construct a new table
>> of 800 rows.  And because this table doesn't yet exist, it doesn't have any
>> indexes so it can't be searched quickly.  Is that what you wanted ?
>>
>> Is there a good reason for needing this data in two separate tables
>> rather than one for which you can create an index on (c1, c2) ?
>>
>> Do the groups occur entirely within one table or do you have to add the
>> tables together before SQL can figure out the groups.
>>
>> > where tableA, tableB have primary key (c1, c2) and their schema
>> comprises 3
>> > integers: c1, c2, and prop.
>>
>> It might be worth testing with something like 'total(*)' just to make
>> sure it isn't your own function which is causing the problems.
>>
>> Simon.
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Streaming group by in union of already sorted tables

Reply via email to