On Tue, Dec 09, 2008 at 04:58:38PM +0100, jm cuaz scratched on the wall:

> In a SELECT statement with multiple tables, is it possible to replace 
> WHERE clauses used to avoid cross joins with GROUP BY + HAVING clauses 
> (with the same criteria) for the same purpose (no cross join) ?

  No.  That's what WHERE is for.

  In fact, if you put all the limits in HAVING, you give the database
  no choice but to calculate a full cross-join on all the tables.
  HAVING cannot be used to limit the number of rows in a join.

> Are the two methods roughly equivalent in performance ?

  No.  For a given query, replacing WHERE with HAVING will usually
  result in a **significantly** longer run-time, especially if any
  joins are involved.  It will also require a great deal more
  resources.

> We ask this because we wonder why  using the GROUP BY + HAVING solution 
> with 7 tables instead of WHERE clauses we see big degradation of 
> perforamnce (0,5 seconds against 30 minutes)  + break with "disk full" 
> error message .

  The FROM and WHERE clauses are tightly bound.  The query optimizer
  understands how to fold WHERE limits into the FROM statement,
  rejecting many of the rows before they're joined.

  HAVING is used to post-process the GROUP BY result.  If all the
  limits are in a HAVING clause (and you have no explicit JOIN...ON
  limits), the system has no choice but to realize a full cross-join
  of all 7 tables, resulting in an absolutely enormous meta-table.

  For example, if the row count on your seven tables is 34, 104, 200,
  1742, 45, 2, and 15, the database will have to create a temporary 
  table with 1,663,122,240,000 rows (1.66 tera-rows).  Cross-joins add
  up VERY quickly.

  This is then processed by the GROUP BY (which is going to take a bit
  of time to chew through all that) and then (and only then) are rows
  rejected by the HAVING clause.


  HAVING exists to put limits on the output of GROUP BY.  For example,
  if you group sales by cities and only want to see cities with more
  than 100 sales, or some such thing.  In general, HAVING limits should
  only be acting on aggregate columns that are a result of the GROUP
  BY.  Any other limits or restrictions should go in the WHERE clause.
  You should almost never have a HAVING without a GROUP BY.

  It is completely acceptable (and appropriate) to have a WHERE,
  GROUP BY, and HAVING all in the same query.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to