On Tue, 17 May 2016 11:09:53 +1000
"dandl" <david at andl.org> wrote:

> Any disagreement so far?

Full agreement; your description is perfectly sound.  

I am quite certain nevertheless that LIMIT has no relational basis.
Nothing based on Order By could.  And I'll try to clear up what I meant
by a cursor.  

> So the "3" is a perfectly valid argument for a set-oriented theory:
> find a subset S of N tuples with the following test for set
> membership: that each member of S is greater than each member not in
> S when compared by certain attributes, for N = 3. Pure set logic with
> a membership function.

You lost me at "subset S of N tuples".  Which relational operator takes
N as an argument?  

You could be right vis a vis set theory.  But strictly within
relational theory, I'll cede your point when you demonstrate it
relationally.  The N in "N tuples" is not to be found in the relation's
extension. Even if we include aggregation, all relational functions
operate on the *values* of the set, or functions of the values of the
set (e.g. min() or avg()).  N is not among them.  

> > "Order by 1" is always valid.
> 
> By analogy, not because they're the same. In order to apply LIMIT 3
> the query parser should require a test of set membership that is fully
> determined for every member. It can do that by either requiring all
> select list columns to appear in the ORDER BY, or by applying other
> constraints such as a unique key. 

Unless your point is constrained to the LIMIT operator, you're making a
mistake here.  Normally the Select list is a superset of the Order By
list.  If they need to be equal for LIMIT to work, that's just one more
nail in LIMIT's coffin, a byproduct of ts nonrelationality.  

Given a table T {A,B}, what's wrong with "select A from T
order by B" (even though that's not valid SQL)? The system has access
to the full table, can sort by B and project A. Nothing to do with
indexes or constraints.  The query is invalid not because it couldn't
be executed, but because it's undefined: there's no "sort" operator to
apply before "project".  Order By, not being a relational function, is
a filter applied to the relational product.  

        ( Yes, "filter"; filters don't alway remove: 
                $ echo hello | rev
                olleh
        )

All members of the Order By list must appear in the Select list because
the Select list is input to Order By, see next.  

> > > Order By just produces a cursor for convenient traversal of the
> > > results.
> 
> Not so. In standard SQL ORDER BY establishes a comparison function
> between tuples and is part of the DECLARE CURSOR syntax, but the
> cursor exists regardless.

Regarding Order By and cursors, I'm referencing CJ Date.  The reason
Order By cannot appear in a subquery is that its output is not a table,
but a cursor.  Whether Order By "establishes a comparison function
between tuples" is irrelevant; so too does Where.  It's the product
that's different.  

If you accept that Order By can appear only in the outermost query,
it's clear that it has no relational role at all.  It affects only the
order in which the rows are returned to the caller.  Other than syntax,
how is that different from a cursor?  

> The point of LIMIT is that it is a complete query; the rows can
> be returned in a single network round trip; the result set can be
> discarded.

So it's just a tiny communication optimization?  After all, compared to
a network round trip, sorting the result (in order to apply LIMIT
rationally) is usually far more expensive.  I bet no study has ever
shown LIMIT to improve performance measurably, not that that would
justify its existence.  

> > LIMIT causes harm.  Bad results come of bad math.
> 
> Disagree. The problem (if there is one) is that it is not
> well-defined.

Then I think you mean you agree!  Because LIMIT is nonrelational, it's
*undefined*.  We have a long thread here that might be titled "what
should LIMIT do?"  There's no debate about the relational operators.
It's only the ad hoc add-ons that present problems, precisely because
they lie outside the theory and provide (unwittingly) idiosyncratic
behavior.  Like NULL, LIMIT is unspecified by the theory.  Like NULL,
LIMIT is hard to get "right" because each implementation has to make
its own decision about what it means, instead of relying on the math.  

--jkl

Reply via email to