On Tuesday, 5 March, 2019 04:09, Simon Slavin <slav...@bigfraud.org> wrote:
>On 5 Mar 2019, at 2:06am, kk <kmccord0...@gmail.com> wrote: >>>> select * from t1 >>>> where c=1 and d in (select d from t2 where c=1); >>>> select * from t1 >>>> where c=1 and d in (select d from t2 where t2.c=t1.c); >> DRH, many thanks for your reply, I was expecting same output >because I believe stmts to be equivalent, so was not sure why query >plan was different. I see the explain plans are very similar. >> But I believe original stmts mentioned are still equivalent? >How do you expect a SQL engine to approach the above statements ? >Should it process the inner SELECT first, or the outer SELECT first ? >If it processes the inner SELECT first, where does it get the value >it needs for t1.c ? >If it processes the outer SELECT first, what strategy does it use for >selecting on t1.d when it doesn't yet know whether there's going to >be no, a single, or multiple values ? >> Do you agree? And in SQLite what is best way to write such stmt (or >in other terms, what is difference)? > >Using a JOIN. > >SELECT t1.* FROM t1 > INNER JOIN t2 ON t2.c=1 AND t2.d = t1.d > WHERE t1.c=1; Technically this is invalid. t2.c == 1 is NOT a equijoin condition and should NOT appear in the ON clause (though it would be valid in the case of an outer join). However, since [INNER] JOIN is merely syntactic sugar for a , and the ON is merely syntactic sugar for a WHERE clause, this does not really matter much. Moreover, the query plans will be different because in this case you are joining T1 against T2 using only the common column d, and then filtering the interim results based on t1.c and t2.c. (The actual plan will use the c==1 condition to constrain the outer loop, then loop through the inner table based on the join column d, then filter the result of that with the "other" c==1 condition -- which table is chosen as the outer table is up to the query planner (and it will choose whichever one it things the constraint c==1 will produce the least rows)). This is entirely different from the below query where you are joining T1 and T2 on the common columns c and d, then filtering for a specific value of c in t1 (which will constrain the outer loop). The total number of candidate solutions (the number of intersects in the nested loops) can be quite different. >SELECT t1.* FROM t1 > INNER JOIN t2 ON t2.c=t1.c AND t2.d = t1.d > WHERE t1.c=1; >The INNER JOIN (as opposed to OUTER JOIN) means that a row must exist >in t2 for the equivalent row in t1 to be returned. INNER is the >default kind of JOIN. Of the two statements, it seems that the fist >one requires less processing. Actually, the latter (the properly phrased equijoin) will require the least processing since it gives the query planner the greatest latitude to generate an optimal solution. In one case (the former) you have the condition "t1.c == 1 AND t2.c == 1". The query planner cannot possibly derive from this the fact that "t1.c == t2.c". However, in the latter case, you have the condition "t1.c == t2.c AND t1.c == 1" from which the query planner can determine the fact that "t2.c == 1". Depending on the shape of the data and the available statistics this may have a great effect on the performance of the query because the query planner has more information and may choose a more optimal solution (that is, it may now choose whether t1 or t2 is the outer loop, and the descent into the inner table uses both columns). This is, of course, also fully equivalent to the properly phrased JOIN but it does constrain the solution (it is equivalent to specifying "t1 CROSS JOIN t2" in the above properly phrased equijoin in that it constrains the order of traversal of the tables): SELECT * FROM t1 WHERE c == 1 AND EXISTS (SELECT * FROM t2 WHERE c == t1.c AND d == t1.d) ; since no actual data from t2 needs to be returned. Which "phrasing" of the myriad of perhaps different queries producing (quite possibly) the same (or perhaps different) results probably depends on how one "translates" the original problem statement from English into SQL (and the adequacy of that problem statement) together with the maintainability of the application and the actual schema including the indexes ... >Internally, SQLite does comparison and conversion when faced with >different ways of phrasing your query. But that's not your problem. >Phrase what you want in as specific terms as possible, and let SQLite >pick its preferred way of solving the problem. Of course, which one chooses depends entirely on the original problem statement. If one "assumes" that the t1.c=1 and t2.c=1 in the original statement: select * from t1 where c=1 and d in (select d from t2 where c=1); is merely a lazy programmer expressing that t1.c == t2.c without actually saying so, then the properly phrased equijoin is the most efficient because it gives the query planner the most latitude to generate an optimal solution. This can only be determined by reference to the original statement of the problem that the SQL statement was designed to satisfy (since they are perhaps all the same yet different at the same time, and there is really insufficient information from which to make a decision). -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users