On Fri, 20 May 2016 14:17:25 +1000
"dandl" <david at andl.org> wrote:

> Every aggregation function is at least second order: a function that
> applies a function to the set. So for MIN the function is 'less
> than', for SUM() the function is 'plus' and so on. In Andl
> aggregation functions are provided by fold(), which takes a function
> as an argument.

I want you to know that you hijacked my Saturday.  I was bothered about
what "first order" and "second order" mean, suspecting that we meant
different things.  After an afternoon with the Oracle of All Knowledge,
I think we were talking about different things, and you had a better
handle on your end than I did on mine.  

I was concerned that we were treading in territory outside first-order
predicate logic.  On review, as Wikipedia explains, HOL deals in
another beast, namely the quantification of sets of sets.  

You were talking about something much simpler, second-order *functions*.
The input is still a value -- an individual member of a set -- plus
another function.  As you say, there are many such in SQL.  In keeping
with the language's purpose, the primitive components are not exposed,
so it's not possible to reconstruct min as FOLD(MIN,X). We can do
similar things with subqueries, e.g.

        select sum(N) from (select count(*) as N from T group by a) as A

One can imagine that restated as 

        select F(sum, count, t) from T

where F is defined as taking two functions and a value.  I guess that
would make F a third-order function. 

APL is instructive in this regard.  What we usually call
operators --  + - x ?  -- are termed *functions* in APL, in keeping
with their mathematical definition.  A function that takes a function
is called an operator.  One such is "/", the reduction operator; SUM(t)
could be expressed as 

        +/t

> > 2.  Limit, as currently implemented, lies outside the theory
> > because it doesn't operate on sets.
> 
> I'll put that one on hold pending a suitable reference or detailed
> mathematical treatment.

I think I can accept "first(N)" could be a set function, and if SQL
dealt in sets, LIMIT would be a deterministic function.  But SQL deals
in bags, and with a couple of odd exceptions -- random(), now() -- all
its functions are determistic.  LIMIT is not a deterministic
function.  I'm not sure what happens to first order predicate logic in
the face of nondeterminism, but I'm sure it's not good.  

> Sorry. Your nth() is a second order function 

OK. 

> The (single-pass) implementation would maintain a temporary table of
> rows that are 'minimum so far seen', to a maximum of N. It would be an
> implementers decision what to do with a row equal to one in that
> table once N has been reached: add it or ignore it?

nth() acts on a single column; it keeps the set of N smallest values, as
you say.  The answer to your question is "ignore it" because a value
equal to one in the set is already a member.  Given the input


        C {1, 1, 2, 2, 2, 3}

        min(C) = 1
        nth(C, 1) = {1}
        nth(C, 2) = {1, 2}

I'm not claiming any deep insight, only that nth() would be handy and
can be defined mathematically (even if I can't do it).  

--jkl

Reply via email to