On Fri, Oct 16, 2009 at 2:08 PM, Brad Phelan <[email protected]> wrote: > On Fri, Oct 16, 2009 at 1:39 PM, Pavel Ivanov <[email protected]> wrote: >>> So if >>> x has a very large range and a small probability of a match then >>> we still have to do a full scan of 10,000 rows of A. >>> >>> Is there a better way to construct the query and or indexes so >>> the result is faster. >> >> If your x has a small selectivity in B disregarding of A, i.e. for >> each x you have pretty small amount of rows in B, then I'd suggest >> instead of your index create these two: >> >> CREATE INDEX index_B on B (x, A_id); >> CREATE INDEX index_A on A (id); >> >> And write your select in this way: >> >> select distinct * >> from A join >> (select B0.A_id as A_id >> from B B0, B B1 >> where B0.x = 10 >> and B1.x = 20 >> and B0.A_id = B1.A_id) B2 on B2.A_id = A.id >> >> > > I've tried your pattern on my production code with some interesting > results. The original pattern is below followed by your suggestion. > However in this case I have used three terms. There is a LIKE "%FOO%" > term in there which I really should replace with FTS3. > > select count(*) from epgdata > JOIN tit AS tit0 > ON tit0.epgdata_id = epgdata.id > AND ( (tit0.lang = "deu") AND ((tit0.tittext) LIKE ("%die%")) ) > JOIN tit AS tit1 > ON tit1.epgdata_id = tit0.epgdata_id > AND ( (tit1.lang = "deu") AND ((tit1.tittext) LIKE ("%der%")) ) > JOIN tit AS tit2 > ON tit2.epgdata_id = tit1.epgdata_id > AND ( (tit2.lang = "deu") AND ((tit2.tittext) LIKE ("%zu%")) ) > > ----------------------- > > select count(*) from epgdata join > ( select tit0.epgdata_id as epgdata_id > from tit as tit0, tit as tit1, tit as tit2 > where tit0.lang="deu" and tit0.tittext LIKE "%die%" > and tit1.lang="deu" and tit1.tittext LIKE "%der%" > and tit2.lang="deu" and tit2.tittext LIKE "%zu%" > and tit0.epgdata_id = tit1.epgdata_id > and tit0.epgdata_id = tit2.epgdata_id > ) as foo on foo.epgdata_id = epgdata.id > > > generates almost identical sqlite bytecode using the EXPLAIN keyword. Some > of the register numbers are different but the code structure is word for > word the same. Unfortunately I can't make head or tail of the codes. > > The query plan for both of them is > > 0|1|TABLE tit AS tit0 WITH INDEX tit__lang__tittext__epgdata_id__ > 1|0|TABLE epgdata USING PRIMARY KEY > 2|2|TABLE tit AS tit1 WITH INDEX tit__epgdata_id__ > 3|3|TABLE tit AS tit2 WITH INDEX tit__epgdata_id__ > > I have indices > > (epgdata_id) -> tit__epgdata_id__ > > and > > (lang, tittext, epgdata_id) -> tit__lang__tittext__epgdata_id__ > > It seems that SQLite maps both queries to the same internal > representation. Curious!! > > B >
Actually as I look into it I am not surprised that both queries translate to the same code. First the ON clause and the WHERE clause in SQLite are equivalent for inner joins. As well select count(*) from A, B where A.id = B.A_id is identical to select count(*) from A join B on A.id = B.A_id The question seems to be what is the best order to run the join loop in. From A to B or B to A. SQlite seems to have it's own idea on how to do this. As can be seen from my production query the loop was reordered. Perhaps I don't need to worry. All I need to make sure is that I have the correct indices available to let SQLite run the loop in whichever order it sees fit. It is all a bit black magic to me :) Brad _______________________________________________ sqlite-users mailing list [email protected] http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

