> > > > first   second
> > > > -----   ------
> > > > Mark    Spark
> > > > Emily   Spark
> > > > Mary    Soper
> > > > Brian   Soper
> > > >
> > > > SELECT first,second FROM members ORDER BY second LIMIT 3
> 
> First, hat tip to Simon for providing a motivating example.  :-)
> 
> The question illustrates what I mean when I say Limit is not "rooted in
the
> data": in this case, "3" is not in the data, and is not a function of the
> data.  Having introduced an extraneous arbitrary element, ambituity and
> contradiction are inevitable.  It's practically the definition of a hack,
> right?  Does the job, albeit incorrectly.

Not so. First: a couple of facts to avoid misunderstanding.

1. Relational theory is a theory of set operations on tuples. Any query that
can be expressed as a set operation is valid.
2. In order to perform the familiar operations of restriction (WHERE) and
join, scalar operations are allowed on values of attributes (columns). Those
operations include:
a) compare equal (all types)
b) compare greater/less than, if the value is of any ordered type
c) expression evaluation, to construct new values of any type.

Any attribute that can be compared greater/less for the purpose of
restriction can also be used in a query that finds the largest (1 or N) or
smallest (1 or N) of that attribute. This is pure relational theory, most of
it already known to Codd back in 1972. Any disagreement so far?

So the "3" is a perfectly valid argument for a set-oriented theory: find a
subset S of N tuples with the following test for set membership: that each
member of S is greater than each member not in S when compared by certain
attributes, for N = 3. Pure set logic with a membership function.

> > I would say that this is an invalid query. As already applies for
> > DISTINCT and GROUP BY, the query parser should require that every
> > column in the column list should appear in the ORDER BY list. If it
> > does not, then the result is indeterminate.
> 
> Order By does not requre Group By, and the Select list is a *superset* of
the
> Order By list.  I'm not sure where you got the notion that the the Select
and
> Order By sets are equal.  "Order by 1" is always valid.

By analogy, not because they're the same. In order to apply LIMIT 3 the
query parser should require a test of set membership that is fully
determined for every member. It can do that by either requiring all select
list columns to appear in the ORDER BY, or by applying other constraints
such as a unique key. If it does not, then the results of the query depend
on information that is not part of the query (ie not deterministic).

> David, permit me to elaborate on my indictment of LIMIT.  You said
> earlier:
> 
> > You can't sort the relation, but you can certainly apply an order when
> > performing a query. How else would MIN() work?
> 
> I'm not disputing that.  Window functions even require multiple sorts in
the
> same query.
> 
> Whether or not "LIMIT is perfectly relational", we do know relational
algebra
> has no Sort operator, and that Order By is never part of an input to a
> relational operation (because of course relatational operands have no
order).
> Order By just produces a cursor for convenient traversal of the results.

Not so. In standard SQL ORDER BY establishes a comparison function between
tuples and is part of the DECLARE CURSOR syntax, but the cursor exists
regardless.

In a query retrieved by an external API there is no requirement for a cursor
to ever exist (it's undefined, and not required by relational theory).

> I'd be perfectly fine with a function I'll invent here and now to replace
> LIMIT:  nth().  It's a generalization of min(); the
> construction nth(C, 1) is equivalent to min(C).   You use it this way:
> 
>       SELECT first,second
>       FROM members
>       where second < nth(second, 2)
> 
> That query is based in the data.  It's unambiguous.  Given Simon's input,
it
> produces 2 rows; with "< 3" it produces 4 rows.  It can be used without
Order
> By (for the same reason min() can).  While it
> *implies* a sort, it doesn't require one (because indexes), as LIMIT does.
> And, like min() and unlike Order By, it can be used in a subquery.

The issue is find the "top N". This does not solve the problem.

> LIMIT is a hack.  It's an "obvious" extension to SQL, so simple it needn't
> even be part of it, because the program reading the rows from the DBMS can
> always stop wherever it wants.  Simple things are always implemented
freely -
> - even if unnecessary or misbegotten, simply because they're easy to do
and
> understand -- and LIMIT was
> no exception.   

I disagree. The point of LIMIT is that it is a complete query; the rows can
be returned in a single network round trip; the result set can be discarded.

Ironically, though, seemingly simple things are very
> hard, sometimes impossible, to explain mathematically.  In that way, LIMIT
> shelters under the same roof as NULL and SQL's use of bags instead of
sets.

Disagree. LIMIT is fully relational.

> While that's an abstract argument, it's at the root of very practical
> problems.  LIMIT is a FAQ on this mailing list.  Given the number of
SQLite
> programmers, we can bet every day someone uses limit in a subquery,
getting
> or concealing a nondeterministic result.  Which is to
> say: LIMIT causes harm.  Bad results come of bad math.

Disagree. The problem (if there is one) is that it is not well-defined.

Regards
David M Bennett FACS

Andl - A New Database Language - andl.org





Reply via email to