On Wed, 27 Aug 2014 18:25:28 -0600
"Keith Medcalf" <kmedc...@dessus.com> wrote:

> >>  select id, category_id, name, min(price) as minprice
> >>    from cat_pictures
> >> group by category_id;
>
> >This peculiar behavior is very unique to SQLite. 
> 
> Not really.  Sybase, SQL Server and DB2 do (or did do) the same thing.

Did, Keith.  In 1995.  I think you're thinking of Sybase 4.8 or
thereabouts, when we kept the bit bucket next to the coal bin.  

> Well, I kind of like the former (group by) behaviour.  

I suppose the above query is equal to 

select id, a.category_id, name, minprice
from cat_pictures as a 
join (
        select category_id, min(price) as minprice
        from cat_pictures 
        group by category_id
) as b
on a.category_id = b.category_id;

I half agree with you.  I guess the SQL committee decided to make GROUP
BY 100% redundant because erroneous clauses caused subtle errors.  I
wish they'd done away with it instead.  

You are right that window functions add no new relational power to
SQL.  Everything they can express can be expressed using joins and
subqueries.  

There are, nevertheless, advantages, because implementations can often
process them faster, because the short road to the answer is more
obvious.  

Consider ideas like rank and lag.  You can rank and lag using a self
join and a min().  But the system doesn't know that all you want to do
is get "count the rows in this order" or "get this row and the one
before it", both of which are simple enough to do with a single pass
over the table (and without actually executing a join).  But every SQL
engine I know of, when it sees JOIN, executes a join.  It executes it
as fast as it can, using indexes and hash joins and whatnot, but it
still uses the same table twice to do something that physically
requires only a single pass and some bookkeeping.  

Now consider an item from the first link in Petite's helpful list:

# select x,
         row_number() over(),
         ntile(4) over w,
         lag(x, 1) over w,
         lead(x, 1) over w
    from generate_series(1, 15, 2) as t(x)
  window w as (order by x);
 x  | row_number | ntile | lag | lead 
----+------------+-------+-----+------
  1 |          1 |     1 |     |    3
  3 |          2 |     1 |   1 |    5
  5 |          3 |     2 |   3 |    7
  7 |          4 |     2 |   5 |    9
  9 |          5 |     3 |   7 |   11
 11 |          6 |     3 |   9 |   13
 13 |          7 |     4 |  11 |   15
 15 |          8 |     4 |  13 |     
(8 rows)

This kind of thing comes up all the time in timeseries analysis.  To do
it in SQLite would require joining the table to itself once for every
window function.  The engine sees the JOIN, does the JOIN in its full
generality, only to find its way along the index to the rows adjacent.

Absent novel sematic analysis of the SQL -- which I would welcome --
the short road to efficiency is to introduce windowing functions.  They
grab the parser by the lapel and shout, "Hey, Bozo, that row over
there!  Stick it with this one!"  

I'm not kidding.  Listen closely next time you're near a SQL Server.  

> But then again, maybe I'm just an old fart ...

Unlikely.  I'll let you know when my trademark application is
approved.  

--jkl
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to