On Tue, Dec 09, 2008 at 04:58:38PM +0100, jm cuaz scratched on the wall: > In a SELECT statement with multiple tables, is it possible to replace > WHERE clauses used to avoid cross joins with GROUP BY + HAVING clauses > (with the same criteria) for the same purpose (no cross join) ?
No. That's what WHERE is for. In fact, if you put all the limits in HAVING, you give the database no choice but to calculate a full cross-join on all the tables. HAVING cannot be used to limit the number of rows in a join. > Are the two methods roughly equivalent in performance ? No. For a given query, replacing WHERE with HAVING will usually result in a **significantly** longer run-time, especially if any joins are involved. It will also require a great deal more resources. > We ask this because we wonder why using the GROUP BY + HAVING solution > with 7 tables instead of WHERE clauses we see big degradation of > perforamnce (0,5 seconds against 30 minutes) + break with "disk full" > error message . The FROM and WHERE clauses are tightly bound. The query optimizer understands how to fold WHERE limits into the FROM statement, rejecting many of the rows before they're joined. HAVING is used to post-process the GROUP BY result. If all the limits are in a HAVING clause (and you have no explicit JOIN...ON limits), the system has no choice but to realize a full cross-join of all 7 tables, resulting in an absolutely enormous meta-table. For example, if the row count on your seven tables is 34, 104, 200, 1742, 45, 2, and 15, the database will have to create a temporary table with 1,663,122,240,000 rows (1.66 tera-rows). Cross-joins add up VERY quickly. This is then processed by the GROUP BY (which is going to take a bit of time to chew through all that) and then (and only then) are rows rejected by the HAVING clause. HAVING exists to put limits on the output of GROUP BY. For example, if you group sales by cities and only want to see cities with more than 100 sales, or some such thing. In general, HAVING limits should only be acting on aggregate columns that are a result of the GROUP BY. Any other limits or restrictions should go in the WHERE clause. You should almost never have a HAVING without a GROUP BY. It is completely acceptable (and appropriate) to have a WHERE, GROUP BY, and HAVING all in the same query. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Our opponent is an alien starship packed with atomic bombs. We have a protractor." "I'll go home and see if I can scrounge up a ruler and a piece of string." --from Anathem by Neal Stephenson _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users