On Mon, 16 May 2016 16:17:35 +1000
"dandl" <david at andl.org> wrote:

> > > All true.  But it brings up a question.  Suppose the following:
> > >
> > > first     second
> > > -----     ------
> > > Mark      Spark
> > > Emily     Spark
> > > Mary      Soper
> > > Brian     Soper
> > >
> > > SELECT first,second FROM members ORDER BY second LIMIT 3

First, hat tip to Simon for providing a motivating example.  :-)  

The question illustrates what I mean when I say Limit is not "rooted in
the data": in this case, "3" is not in the data, and is not a function
of the data.  Having introduced an extraneous arbitrary element,
ambituity and contradiction are inevitable.  It's practically the
definition of a hack, right?  Does the job, albeit incorrectly.  

> I would say that this is an invalid query. As already applies for
> DISTINCT and GROUP BY, the query parser should require that every
> column in the column list should appear in the ORDER BY list. If it
> does not, then the result is indeterminate.

Order By does not requre Group By, and the Select list is a *superset*
of the Order By list.  I'm not sure where you got the notion that the
the Select and Order By sets are equal.  "Order by 1" is always valid.  

David, permit me to elaborate on my indictment of LIMIT.  You said
earlier: 

> You can't sort the relation, but you can certainly apply an order when
> performing a query. How else would MIN() work?

I'm not disputing that.  Window functions even require multiple sorts
in the same query.  

Whether or not "LIMIT is perfectly relational", we do know relational
algebra has no Sort operator, and that Order By is never part of an
input to a relational operation (because of course relatational operands
have no order). Order By just produces a cursor for convenient traversal
of the results.  

I'd be perfectly fine with a function I'll invent here and now to
replace LIMIT:  nth().  It's a generalization of min(); the
construction nth(C, 1) is equivalent to min(C).   You use it this way: 

        SELECT first,second 
        FROM members
        where second < nth(second, 2)

That query is based in the data.  It's unambiguous.  Given Simon's
input, it produces 2 rows; with "< 3" it produces 4 rows.  It can be
used without Order By (for the same reason min() can).  While it
*implies* a sort, it doesn't require one (because indexes), as LIMIT
does. And, like min() and unlike Order By, it can be used in a
subquery.  

LIMIT is a hack.  It's an "obvious" extension to SQL, so simple it
needn't even be part of it, because the program reading the rows from
the DBMS can always stop wherever it wants.  Simple things are always
implemented freely -- even if unnecessary or misbegotten, simply
because they're easy to do and understand -- and LIMIT was
no exception.   Ironically, though, seemingly simple things are very
hard, sometimes impossible, to explain mathematically.  In that way,
LIMIT shelters under the same roof as NULL and SQL's use of bags
instead of sets.  

While that's an abstract argument, it's at the root of very practical
problems.  LIMIT is a FAQ on this mailing list.  Given the number of
SQLite programmers, we can bet every day someone uses limit in a
subquery, getting or concealing a nondeterministic result.  Which is to
say: LIMIT causes harm.  Bad results come of bad math.  

--jkl

Reply via email to