On Fri, Oct 16, 2009 at 2:08 PM, Brad Phelan <[email protected]> wrote:
> On Fri, Oct 16, 2009 at 1:39 PM, Pavel Ivanov <[email protected]> wrote:
>>> So if
>>> x has a very large range and a small probability of a match then
>>> we still have to do a full scan of 10,000 rows of A.
>>>
>>> Is there a better way to construct the query and or indexes so
>>> the result is faster.
>>
>> If your x has a small selectivity in B disregarding of A, i.e. for
>> each x you have pretty small amount of rows in B, then I'd suggest
>> instead of your index create these two:
>>
>> CREATE INDEX index_B on B (x, A_id);
>> CREATE INDEX index_A on A (id);
>>
>> And write your select in this way:
>>
>> select distinct *
>> from A join
>> (select B0.A_id as A_id
>>  from B B0, B B1
>>  where B0.x = 10
>>  and B1.x = 20
>>  and B0.A_id = B1.A_id) B2 on B2.A_id = A.id
>>
>>
>
> I've tried your pattern on my production code with some interesting
> results. The original pattern is below followed by your suggestion.
> However in this case I have used three terms. There is a LIKE "%FOO%"
> term in there which I really should replace with FTS3.
>
> select count(*) from epgdata
>     JOIN tit AS tit0
>         ON tit0.epgdata_id = epgdata.id
>         AND ( (tit0.lang = "deu") AND ((tit0.tittext) LIKE ("%die%")) )
>     JOIN tit AS tit1
>         ON tit1.epgdata_id = tit0.epgdata_id
>         AND ( (tit1.lang = "deu") AND ((tit1.tittext) LIKE ("%der%")) )
>     JOIN tit AS tit2
>         ON tit2.epgdata_id = tit1.epgdata_id
>         AND ( (tit2.lang = "deu") AND ((tit2.tittext) LIKE ("%zu%")) )
>
> -----------------------
>
> select count(*) from epgdata join
>    ( select tit0.epgdata_id as epgdata_id
>      from tit as tit0, tit as tit1, tit as tit2
>      where tit0.lang="deu" and tit0.tittext LIKE "%die%"
>        and tit1.lang="deu" and tit1.tittext LIKE "%der%"
>        and tit2.lang="deu" and tit2.tittext LIKE "%zu%"
>        and tit0.epgdata_id = tit1.epgdata_id
>        and tit0.epgdata_id = tit2.epgdata_id
>    ) as foo on foo.epgdata_id = epgdata.id
>
>
> generates almost identical sqlite bytecode using the EXPLAIN keyword. Some
> of the register numbers are different but the code structure is word for
> word the same. Unfortunately I can't make head or tail of the codes.
>
> The query plan for both of them is
>
>    0|1|TABLE tit AS tit0 WITH INDEX tit__lang__tittext__epgdata_id__
>    1|0|TABLE epgdata USING PRIMARY KEY
>    2|2|TABLE tit AS tit1 WITH INDEX tit__epgdata_id__
>    3|3|TABLE tit AS tit2 WITH INDEX tit__epgdata_id__
>
> I have indices
>
>    (epgdata_id) -> tit__epgdata_id__
>
> and
>
>    (lang, tittext, epgdata_id) -> tit__lang__tittext__epgdata_id__
>
> It seems that SQLite maps both queries to the same internal
> representation. Curious!!
>
> B
>

Actually as I look into it I am not surprised that both queries
translate to the same code. First the ON clause and the WHERE clause
in SQLite are equivalent for inner joins. As well

select count(*) from A, B
where A.id = B.A_id

is identical to

select count(*) from A
join B on A.id = B.A_id

The question seems to be what is the best order to run the join loop
in. From A to B or B to A. SQlite seems to have it's own idea on how
to do this. As can be seen from my production query the loop was
reordered. Perhaps I don't need to worry. All I need to make sure is
that I have the correct indices available to let SQLite run the loop
in whichever order it sees fit. It is all a bit black magic to me :)

Brad
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to