On Tue, Jun 07, 2011 at 05:38:09PM +0200, Sidney Cadot scratched on the wall: > > Please someone show me what is the correct value of this avg() in practice.
> SELECT AVG(data) FROM TryAvg ORDER BY ABS(data) DESC; > > But apparently, SQLite ignores the ORDER BY clause because of the > AVG() function, perhaps -erroneously- assuming AVG() is commutative. No, this is basic SQL order of operations. The ORDER BY is applied *after* the aggregation. That SELECT returns one row, so the ORDER BY does nothing. Actually... the ORDER BY doesn't even make sense. You're asking it to ORDER the AVG() result by the last "data" value in the GROUP, only you have no idea which actual "data" value that is. Most DBs consider this statement to be an error. > This works, but isn't guaranteed to work: > > SELECT AVG(data) FROM (SELEFT data FROM TryAvg ORDER BY ABS(data) DESC); No, it isn't guaranteed to work, because ORDER BY only makes sense for output. Row order within SELECTs are undefined, and that remains true of a sub-SELECT, even if it contains an ORDER BY. > Funnily enough, in Postgres, the test seems to expose an actual bug, > or at least a strange error message: > > sidney=# select AVG(data) from TryAvg ORDER BY data DESC; > ERROR: column "tryavg.data" must appear in the GROUP BY clause or be > used in an aggregate function > LINE 1: select AVG(data) from TryAvg ORDER BY data DESC; Not a bug, just as the message says... You can't ORDER BY something you didn't GROUP BY, or wasn't the output of an aggregate. Using an aggregate without a GROUP BY implies a group across the whole selection. Unlike SQLite, PostgreSQL will not let you attempt to ORDER BY undefined data. Just remember the AVG() is computed, grouped rows are collapsed, *then* the ORDER BY is applied. I like this aspect of SQLite, but many people consider it questionable. Personally I'd rather have the freedom to shoot myself in the foot. > Now, to address your actual point. The fact that we cannot control > evaluation order in SQL does mean that one has to be wary in case > one's data is badly conditioned. The problem is in SQL: it assumes > commutativity for aggregate functions, and that is a property that no > floating point format conceived can deliver. Which really an issue with floating-point. Addition *is* commutative, as any math teacher can tell you. The fact that it isn't (nor are a lot of other basic arithmetic principals) when using a floating-point representation is an issue with the representation. This very fact is one of the main reasons traditional RDBMS systems have always kept away from using FP for real numbers-- especially in financial operations, where every penny counts, no matter how large the sum. > This is an interesting > fact (as is the fact that, as a consequence of this, identical > invocations of aggregate functions can yield different results under > seemingly identical circumstances in SQL), but I disagree that this > disqualifies using SQL for use in a scientific setting. There are > preciously little types of measurements where one has to calculate > with 16 orders of magnitude as your example does, and in such > circumstances, the scientist needs to be exceedingly on his or her > guard to know what she is doing in ANY language, with ANY storage > mechanism. I think therefore that this example, interesting as it is, > does not provide a conclusive argument in the discussion at hand. Which sounds more or less like "using the right tool for the job." If you're aware enough to understand these kinds of issues** and if you truly need IEEE 754 down to the bit, you need to be writing to the raw hardware and doing all your data manipulation yourself. If you're doing common sense operations with values in a similar number domain, SQL, SQLite, Perl, Python, or whatever-- despite not presenting a 100% IEEE 754 environment-- will generally do something "close enough" to the right thing that nobody cares. Even if you do care, careful ordering of FP operations isn't so much about getting the "right" answer (in most cases) as it is about tightening the "close enough" until it falls back into the "nobody cares" range. ** In my own experiences working at NCSA, two national labs, and several university HPC projects, most people doing "scientific computing" don't have a damn clue. Most of the "scientific computing" code I saw was written by domain grad students or interns that would have a hard time writing Hello World without their notes. If it is a physics program, chances are still better than not it is written in FORTRAN. Error bars? What are those? -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users