Re: [sqlite] Is there a way to inner join on named intervals?

Pavel Ivanov Fri, 13 Aug 2010 11:00:27 -0700

> My next question is what index I should create on table
> A and B to speed up such an inner join? Are indexes on each of the
> first three column of each table enough? What is the best comparison
> (along with the appropriate indexes) in term of the performance?


"Indexes on each field" never help. SQLite uses one index per table
per query only.
For your query you should have index on name field in bigger table
(either A or B). If you write query as I did (A.left < B.right and
A.right > B.left) then depending on data distribution either (name,
left) or (name, right) index could help better than just (name). But
cases when those indices would help are very specific, so in general
your only option is index on name only.


Pavel

On Fri, Aug 13, 2010 at 12:48 PM, Peng Yu <pengyu...@gmail.com> wrote:
> On Fri, Aug 13, 2010 at 11:32 AM, Pavel Ivanov <paiva...@gmail.com> wrote:
>> I don't understand where do you see a problem but it looks like this
>> join will do what you want:
>>
>> select * from A, B
>> where A.name = B.name
>> and A.left < B.right
>> and A.right > B.left
>>
>>> I could use an external program (such as python
>>> sqlite package) to enumerate all the named interval from table A and
>>> search for overlapping named intervals in table B, but this operation
>>> has a complexity of M log (N), where M is the length of table A and N
>>> is the length of table B. If some sort of "inner join" could be used,
>>> the complexity should be reduced to log(M+N).
>>
>> How did you come to this conclusion? Any inner join will execute with
>> complexity either M log(N) or N log(M) anyway. The only benefit is
>> that SQLite can decide which way to join depending on relative size of
>> tables A and B. And constant in operation complexity is a bit smaller
>> with C code that in python code...
>
> Thank all the people that replied my email. I don't really understand
> how inner join work. Sorry for the confusion.
>
> The comparison that I actually need is the following one (logically
> correct, but I'm not sure if it is the fastest comparison in term of
> performance). My next question is what index I should create on table
> A and B to speed up such an inner join? Are indexes on each of the
> first three column of each table enough? What is the best comparison
> (along with the appropriate indexes) in term of the performance?
>
> create table A (name text, left integer, right integer, tag text);
> create table B (name text, left integer, right integer, attr text);
> insert into A values('a', 1, 10, 'tag1');
> insert into A values('a', 5, 15, 'tag2');
> insert into A values('a', 21, 30, 'tag3');
> insert into A values('b', 3, 12, 'tag4');
> insert into A values('b', 15, 25, 'tag5');
> insert into A values('b', 19, 30, 'tag6');
>
> insert into B values('a', 3, 7, 'attr1');
> insert into B values('a', 8, 12, 'attr2');
> insert into B values('a', 16, 18, 'attr3');
> insert into B values('a', 25, 35, 'attr4');
> insert into B values('b', 31, 32, 'attr5');
>
> select * from A inner join B on A.name=B.name AND max(A.left, B.left)
> < min(A.right, B.right);
>
> name        left        right       tag         name        left
>  right       attr
> ----------  ----------  ----------  ----------  ----------  ----------
>  ----------  ----------
> a           1           10          tag1        a           3
>  7           attr1
> a           1           10          tag1        a           8
>  12          attr2
> a           5           15          tag2        a           3
>  7           attr1
> a           5           15          tag2        a           8
>  12          attr2
> a           21          30          tag3        a           25
>  35          attr4
>
>
>> On Fri, Aug 13, 2010 at 11:07 AM, Peng Yu <pengyu...@gmail.com> wrote:
>>> Hi,
>>>
>>> Suppose that I have a table "A", each row represents a interval. For
>>> example, the first row represents an interval [1,10) with a name "a".
>>> The first and second rows are considered overlapping because the
>>> interval [1,10) and interval [5,15) intersect and both rows have the
>>> same name "a".
>>>
>>> name left right   tag
>>> -------------------------------------
>>> a          1     10   tag1
>>> a          5     15   tag2
>>> a        21     30   tag3
>>> b          3     12   tag4
>>> b        15     25   tag5
>>> b        19     30   tag6
>>>
>>> I want to "inner join" the above table and the following table "B"
>>> based on the named interval overlapping.
>>>
>>> name left right   attr
>>> -------------------------------------
>>> a          3       7   attr1
>>> a          8     12   attr2
>>> a        16     18   attr3
>>> a        25     35   attr4
>>> b        31     32   attr5
>>>
>>> The result is the following. In each row, the named interval from A
>>> overlaps the named interval from B. I don't see there is an easy way
>>> to do this in sqlite3. I could use an external program (such as python
>>> sqlite package) to enumerate all the named interval from table A and
>>> search for overlapping named intervals in table B, but this operation
>>> has a complexity of M log (N), where M is the length of table A and N
>>> is the length of table B. If some sort of "inner join" could be used,
>>> the complexity should be reduced to log(M+N). I'm wondering if there
>>> something that can help do this kind of named interval inner join
>>> easily.
>>>
>>> A.name A.left A.right A.tag B.name B.left B.right B.attr
>>> ------------------------------------------------------------------------
>>> a                  1     10     tag1        a          3       7    attr1
>>> a                  1     10     tag1        a          8     12    attr2
>>> a                  5     15     tag2        a          3       7    attr1
>>> a                  5     15     tag2        a          8     12    attr2
>>> a                21     30     tag3        a        16     18    attr3
>>>
>>> --
>>> Regards,
>>> Peng
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
>
> --
> Regards,
> Peng
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Is there a way to inner join on named intervals?

Reply via email to