>To reiterate, Keith: to get the query to execute properly, I didn't change
>the LEFT JOIN to an INNER JOIN!
>Nope,
>I rewrote
>
>SELECT
>FROM
>  a
>  LEFT JOIN b ON <key>
>WHERE b.c=5

This means: take all rows of table a, match them with rows of table b on the
key, and where there is no match "fill" the columns that would have come from
b with a NULL. AND THEN, AFTER THAT, return all rows where b.c = 5.

>to
>
>SELECT
>FROM
>  a
>  LEFT JOIN b ON <key> AND b.c=5
>WHERE b.c IS NOT NULL

This translates to: take all rows of table a, match them with rows of table b
on the key AND where b.c = 5, and where there is no match "fill" the columns
that would have come from b with a NULL. AND THEN, AFTER THAT, return only the
rows where b.c is not NULL.

The primary difference is likely the test on b.c. You see, checking whether
b.c is a particular value requires unpacking the row and retrieving the value.
Checking whether b.c IS NULL does not require unpacking the data but merely
reading the 'datatype', which should be a much faster operation. If you wrote
the query as an inner join it would be much faster, since after projection it
would not be possible for b.c to be anything other than not null.
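To make the comparison concrete, here is a sketch of the three forms. The
thread does not give the real schema or the <key> join condition, so the
table definitions and the a_id/c columns below are assumptions for
illustration only:

-- Hypothetical schema standing in for the real (unstated) one.
CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id), c INTEGER);

-- 1. Original form: the outer join runs first, producing NULL-filled rows
--    for every row of a with no matching b, and the WHERE clause then
--    throws all of those rows away.
SELECT a.*, b.*
  FROM a
  LEFT JOIN b ON b.a_id = a.id
 WHERE b.c = 5;

-- 2. Rewritten form from above: the b.c test moves into the ON clause, so
--    the WHERE clause only has to reject the NULL-filled rows.
SELECT a.*, b.*
  FROM a
  LEFT JOIN b ON b.a_id = a.id AND b.c = 5
 WHERE b.c IS NOT NULL;

-- 3. Inner-join equivalent: no NULL-filled rows are generated at all, so
--    there is nothing to cull afterwards.
SELECT a.*, b.*
  FROM a
  INNER JOIN b ON b.a_id = a.id
 WHERE b.c = 5;

All three return the same rows; the difference is how much intermediate work
is done before the filter applies.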
>So I just added a redundant predicate and it runs perfectly, on SQLite!
>That's why I said this simple improvement can surely be taken care of on
>the optimizer, while the larger discussion of actually changing the outer
>join to an inner join or even tackling the -OR- case is for sure something
>nice to think of, but increasingly more complicated.

I think the query is ill-conceived when written as an outer join. You are
asking the query to "project" a result containing a crap-load of rows which
you will subsequently remove from the result set. For bitty-bases (a couple
of hundred rows) this is irrelevant. However, for tables containing millions
of rows, creating a googillion of rows and then culling all of them except 4
is a very bad use of computer resources. Why generate them in the first place
if you are just going to discard them?

I have seen many a mis-designed program which works blazingly quickly on a
"test" database containing a dozen customers, half-a-dozen products, and two
or three components each, but which, when populated with the production data
of 1,000,000 customers, 500,000 products, and between 100 and 10,000
components per product, frankly takes "years" to run any step and is
completely unusable.

One should always design as if each table contained a googillion rows. If it
is fast enough then, it will certainly be efficient enough to handle your ten
row per table database.
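One way to act on that advice, sketched here with the same hypothetical a and
b as above: fill the tables with a large number of synthetic rows using a
recursive CTE and ask SQLite how it plans to execute each form. The exact
plan text depends on the SQLite version, the schema, and the indexes present,
so treat this as a testing recipe rather than a statement of what the
optimizer will do:

-- Populate the hypothetical tables with a million synthetic rows each.
WITH RECURSIVE n(i) AS (
  SELECT 1
  UNION ALL
  SELECT i + 1 FROM n WHERE i < 1000000
)
INSERT INTO a (id, name)
SELECT i, 'customer' || i FROM n;

WITH RECURSIVE n(i) AS (
  SELECT 1
  UNION ALL
  SELECT i + 1 FROM n WHERE i < 1000000
)
INSERT INTO b (id, a_id, c)
SELECT i, i, i % 100 FROM n;

-- Ask SQLite how it intends to execute the outer-join and inner-join forms;
-- compare the plans (and the timings, with ".timer on" in the sqlite3 shell)
-- at production-like volumes rather than on a ten-row test database.
EXPLAIN QUERY PLAN
SELECT a.*, b.*
  FROM a
  LEFT JOIN b ON b.a_id = a.id
 WHERE b.c = 5;

EXPLAIN QUERY PLAN
SELECT a.*, b.*
  FROM a
  INNER JOIN b ON b.a_id = a.id
 WHERE b.c = 5;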

