(been trying to answer this question for a long time; a recent post on the
topic has convinced me to admit defeat and ask the list...)

I have a large database and would like to output 'results' page by page.
'Results' refers to fairly general database computations, including groups,
joins, logical subqueries (implemented with temporary tables), as well as
raw row-by-row data. The database has appropriate indices to make all of
these operations efficient, so the mammoth database size (tens of millions
of rows at the least) does not slow anything down.

Of course, the simplest strategy is just to use LIMIT on every query and
provide 'previous' and 'next' buttons on every page. Clearly, you know that
the first page needs no 'previous' button, but figuring out if there should
be a 'next' is a problem. My simple hack is just to ask for one more row
than I plan to display, and if I get it, ignore it but put in a 'next'
button. Ugly...

The concerns have already begun, since as our database grows the total
number of rows returned by these queries will grow to the millions, while
people will generally only care about the first few pages of results. Does
MySQL guarantee that it will always try to use indices to implement
ORDER...LIMIT clauses? I never want to ask my database to sort several
million records on every query; if MySQL isn't smart enough to optimize this
I need to start putting together an automated merge table architecture...

Also, the simple LIMIT clause isn't a general solution in the case of
sub-queries (using temporary tables) and GROUP clauses, because there is not
necessarily a correlation between number of rows in intermediate results and
output results. The specific queries I use have been structured so as to
compute intermediate results in an order which eliminates the unnecessary
work, but I won't always have the luxury of such easily-optimized queries.

Further, an interface which provides only 'previous' and 'next' buttons is
clearly mediocre at best; what I'd really like is to spit out page number
buttons (probably with exponential backoff: 1, 2, 3, 4, 5 ...10, 20, 30, 40,
50 ...100, 200, 300, 400, 500...), and this requires knowing the total
number of rows the queries return.

For trivial raw-data queries, this is simple; run the query twice: once with
COUNT(*) and once with the actual field list. It does worry me a bit that
this may require MySQL to actually analyze millions and millions of rows for
every page spit out; can this COUNT be computed purely from index ranges?
And I would dearly love an equivalent to num_affected_rows which told you
how many rows matched a query in spite of your LIMIT clause so that the
database didn't have to run the query twice...

For more complex queries, things get more complicated. Simply adding a
single GROUP to a query means that COUNT(*) loses its
'how-many-rows-in-result' meaning, so I've got to store the result in a
temporary table before running a count on it, and even if MySQL were smart
enough to optimize normal COUNT and ORDER...LIMIT, unless temporary tables
are *really* smart (some kind of spiffy lazy evaluation) this would
completely defeat those optimizations. Is there an efficient way to count
the number of rows in the total result of a grouped query?

When sub-selects and joins enter the picture, things get even worse, as
large and complex queries must be run against the entire database to compute
these giant temporary tables where I could previously hand-optimize the
execution sequences to operate on only some multiple (or small exponent) of
the number of records per page. Again, I can simply work to hand-optimize
independent queries to return the counts directly, but it's even less
general, still not provably efficient, and provides a substantial
opportunity for bugs relating to inconsistency between the number of results
expected and the number that could actually be found. Sigh.

I'd love for someone to provide a magic bullet that I can add to all my SQL
to just make it work, but at least in the more complex cases I don't think
efficent solutions exist. What I'm really hoping is that a few people with
intimate knowledge of the MySQL optimizer could answer the specific
questions I've posed here, as well as offer a few tutorials on exactly how
the optimizer and execution engine interact with order, limit, and group
clauses as well as temporary tables so that I can at least design my queries
with my eyes open. Maybe we can even work to add such information to the
large body of documentation on where clause optimization.

Many many thanks in advance for your help,
Rob


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to