On Mon, 16 May 2016 16:17:35 +1000 "dandl" <david at andl.org> wrote:
> > > All true. But it brings up a question. Suppose the following: > > > > > > first second > > > ----- ------ > > > Mark Spark > > > Emily Spark > > > Mary Soper > > > Brian Soper > > > > > > SELECT first,second FROM members ORDER BY second LIMIT 3 First, hat tip to Simon for providing a motivating example. :-) The question illustrates what I mean when I say Limit is not "rooted in the data": in this case, "3" is not in the data, and is not a function of the data. Having introduced an extraneous arbitrary element, ambituity and contradiction are inevitable. It's practically the definition of a hack, right? Does the job, albeit incorrectly. > I would say that this is an invalid query. As already applies for > DISTINCT and GROUP BY, the query parser should require that every > column in the column list should appear in the ORDER BY list. If it > does not, then the result is indeterminate. Order By does not requre Group By, and the Select list is a *superset* of the Order By list. I'm not sure where you got the notion that the the Select and Order By sets are equal. "Order by 1" is always valid. David, permit me to elaborate on my indictment of LIMIT. You said earlier: > You can't sort the relation, but you can certainly apply an order when > performing a query. How else would MIN() work? I'm not disputing that. Window functions even require multiple sorts in the same query. Whether or not "LIMIT is perfectly relational", we do know relational algebra has no Sort operator, and that Order By is never part of an input to a relational operation (because of course relatational operands have no order). Order By just produces a cursor for convenient traversal of the results. I'd be perfectly fine with a function I'll invent here and now to replace LIMIT: nth(). It's a generalization of min(); the construction nth(C, 1) is equivalent to min(C). You use it this way: SELECT first,second FROM members where second < nth(second, 2) That query is based in the data. It's unambiguous. Given Simon's input, it produces 2 rows; with "< 3" it produces 4 rows. It can be used without Order By (for the same reason min() can). While it *implies* a sort, it doesn't require one (because indexes), as LIMIT does. And, like min() and unlike Order By, it can be used in a subquery. LIMIT is a hack. It's an "obvious" extension to SQL, so simple it needn't even be part of it, because the program reading the rows from the DBMS can always stop wherever it wants. Simple things are always implemented freely -- even if unnecessary or misbegotten, simply because they're easy to do and understand -- and LIMIT was no exception. Ironically, though, seemingly simple things are very hard, sometimes impossible, to explain mathematically. In that way, LIMIT shelters under the same roof as NULL and SQL's use of bags instead of sets. While that's an abstract argument, it's at the root of very practical problems. LIMIT is a FAQ on this mailing list. Given the number of SQLite programmers, we can bet every day someone uses limit in a subquery, getting or concealing a nondeterministic result. Which is to say: LIMIT causes harm. Bad results come of bad math. --jkl