On Wed, 03 Jul 2013 11:11:29 +0200
Gabriel Corneanu <gabrielcorne...@gmail.com> wrote:

> I reply from the web and I can't easily quote.

Acknowledged, but it does make the thread more difficult to read.  :-/  

> I don't really want to argue whether it's a workaround or not. I 
> understand perfectly that's valid standard sql.
> However please accept that the given sql is quite complex; you have
> to duplicate in the join clause the ordering...

SQL never won any prizes for elegance of expression.  It's verbose and
redundant.  Yet it still manages to accomplish more with less than any
"modern" programming language you care to name.  

> explain query plan
> select count(lesser.rowid) as RANK, S.rowid, S.chan
> from params as S
> left outer join params as lesser
> on   S.chan >= lesser.chan
> group by S.rowid
> order by S.chan
> 
> This gives:
> SCAN TABLE params AS S USING INTEGER PRIMARY KEY
> SEARCH TABLE params AS lesser USING COVERING INDEX 
> sqlite_autoindex_params_1 (Chan<?)
> USE TEMP B-TREE FOR ORDER BY

Yes, and there's your O(N log N): N for the SCAN and log(N) for the
SEARCH. To process 1,000,000 rows takes 1,000,000 accesses.  To produce
the rank requires roughly 20 searches per row (given an appropriate
index), or 20,000,000 total accesses. Plus some work, depending on
memoization, to count the nodes beneath the found one.  

Inefficient? It's the theoretical *minimum* for the general case.  

In the particular case, the system has an (unexploited) opportunity to
reduce the complexity to O(N): If it "notices" that row N depends on
N-1, it can simply bump the counter.  

A system-provided RANK() function could do no better.  It might be
easier for the parser to recognize the optimization opportunity, and it
might be easier to type.  But changing the syntax doesn't affect the
semantics, and semantics drive the algorithm.  

Your original question wasn't about ranking, but simply counting the
rows "in order", which presumably wouldn't require comparing the table
to itself.  But I very much doubt that order doesn't matter  I suspect
that whenever "row number" matters, the rows are ordered by *something*
(age, date, size, quantity, name, etc.).  For them to be ordered, they
had to be sorted, and there we are again back at O(N log N).  

Did I miss something, or have I answered your efficiency concerns?  

AIUI your other concern is about ease of use, that it's easier to type
RANK(a,b,c) than to write a self-join.  You may be right; ISTR using
just such a function on other systems myself.  I would nevertheless
argue against adding one to SQLite.  SQL it redundant enough as it is.
I don't know what my policy would be for nominating a new function, but
I'm sure that "eliminates one join" sets the bar too low.   

--jkl
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to