On Wed, Mar 07, 2012 at 10:30:54PM +0000, Emmanuel MacCaull scratched on the
wall:
> Hi,
>
> I'm writing a custom aggregate function whose result depends on the sort
> order of a subselect. It isn't clear whether this a good idea
Not a good idea.
> and the documentation on group_concat(X) seems to suggest that the
> order that rows are processed by an aggregate function is not
> guaranteed ("The order of the concatenated elements is arbitrary."
> -- http://www.sqlite.org/lang_aggfunc.html).
That's true.
> Given something like this:
>
> SELECT group_concat(name)
> FROM (SELECT * FROM table ORDER BY name)
> GROUP BY some_id;
>
> I understood the documentation of group_concat() as meaning that even
> though there is an ORDER BY in the subselect, the order of the
> concatenated names is not guaranteed.
>
> If that is the case, it seems logical that the restriction is only there
> because the order in which the aggregate function processes each row is
> arbitrary. Does anyone know if this is true or is it safe to assume that
> the aggregate will access each row in order?
Not so much "arbitrary" as undefined. If you run the same query on
the same data using the same version of SQLite, I'm sure you'll get
the same results. That said, you shouldn't depend on that.
SQL tables themselves have no ordering... they are properly considered
sets of rows (in the formal mathematical sense of "set"). Similarly,
most working sets of data (relational variables) have no defined
ordering. A *result* set may have an order, defined by an ORDER BY,
but that order may not hold when the result of a sub-select is cast
back into a data source (such as above).
In short, the database engine is able to reorder anything it wants,
at any time, except for the final ORDER BY. While a specific query
may do what you expect (at least for this version of SQLite) things
may change. Heck, just adding an index can alter the query plan and
change orderings. SQLite even has a pragma to purposely re-order
queries just to test for unintentional order dependencies.
For example, most (but not all; and not all the time) database engines
will implement a GROUP BY clause by sorting the rows according to the
GROUP BY expressions. This puts all "like" rows next to each other,
making it easy to collapse the groups into individual rows. There
is no reason to assume that sorting process is stable, or that it
will preserve the sub-select ordering in any way. In fact, I
wouldn't be surprised to find out some query engines just ignores a
"data source" sub-select's ORDER BY all together, since it doesn't
make semantic sense. There are all kinds of additional optimizations
that can open up.
-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users