On Wed, 03 Jul 2013 11:11:29 +0200 Gabriel Corneanu <gabrielcorne...@gmail.com> wrote:
> I reply from the web and I can't easily quote. Acknowledged, but it does make the thread more difficult to read. :-/ > I don't really want to argue whether it's a workaround or not. I > understand perfectly that's valid standard sql. > However please accept that the given sql is quite complex; you have > to duplicate in the join clause the ordering... SQL never won any prizes for elegance of expression. It's verbose and redundant. Yet it still manages to accomplish more with less than any "modern" programming language you care to name. > explain query plan > select count(lesser.rowid) as RANK, S.rowid, S.chan > from params as S > left outer join params as lesser > on S.chan >= lesser.chan > group by S.rowid > order by S.chan > > This gives: > SCAN TABLE params AS S USING INTEGER PRIMARY KEY > SEARCH TABLE params AS lesser USING COVERING INDEX > sqlite_autoindex_params_1 (Chan<?) > USE TEMP B-TREE FOR ORDER BY Yes, and there's your O(N log N): N for the SCAN and log(N) for the SEARCH. To process 1,000,000 rows takes 1,000,000 accesses. To produce the rank requires roughly 20 searches per row (given an appropriate index), or 20,000,000 total accesses. Plus some work, depending on memoization, to count the nodes beneath the found one. Inefficient? It's the theoretical *minimum* for the general case. In the particular case, the system has an (unexploited) opportunity to reduce the complexity to O(N): If it "notices" that row N depends on N-1, it can simply bump the counter. A system-provided RANK() function could do no better. It might be easier for the parser to recognize the optimization opportunity, and it might be easier to type. But changing the syntax doesn't affect the semantics, and semantics drive the algorithm. Your original question wasn't about ranking, but simply counting the rows "in order", which presumably wouldn't require comparing the table to itself. But I very much doubt that order doesn't matter I suspect that whenever "row number" matters, the rows are ordered by *something* (age, date, size, quantity, name, etc.). For them to be ordered, they had to be sorted, and there we are again back at O(N log N). Did I miss something, or have I answered your efficiency concerns? AIUI your other concern is about ease of use, that it's easier to type RANK(a,b,c) than to write a self-join. You may be right; ISTR using just such a function on other systems myself. I would nevertheless argue against adding one to SQLite. SQL it redundant enough as it is. I don't know what my policy would be for nominating a new function, but I'm sure that "eliminates one join" sets the bar too low. --jkl _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users