Re: [sqlite] Handling of IEEE-754 nan and +/-inf in SQLite?

Jean-Christophe Deschamps Tue, 07 Jun 2011 02:40:44 -0700

>Sure, if you're just computing average() then you'll not get any NaNs.


NaNs, NO (if we don't have NaNs in the set) but issues, YES.  It all 
depends.  No one knows.  You don't even know what you're computing exactly.

SQL interpret avg() as "take this data as a set, sum up these numerical 
values and divide by set size".  You can give it a formal mathematical 
unambiguous meaning in the SQL idealized context.

Now, please try this:

create table if not exists TryAvg (data double);
delete from TryAvg;
insert into TryAvg values (1.23456E18);
insert into TryAvg values (60);
insert into TryAvg values (50);
insert into TryAvg values (40);
insert into TryAvg values (-1.23456E18);
select avg(data) from TryAvg;
0

delete from TryAvg;
insert into TryAvg values (60);
insert into TryAvg values (50);
insert into TryAvg values (40);
insert into TryAvg values (1.23456E18);
insert into TryAvg values (-1.23456E18);
select avg(data) from TryAvg;
51.2

delete from TryAvg;
insert into TryAvg values (60);
insert into TryAvg values (50);
insert into TryAvg values (1.23456E18);
insert into TryAvg values (-1.23456E18);
insert into TryAvg values (40);
select avg(data) from TryAvg;
8

delete from TryAvg;
insert into TryAvg values (1.23456E18);
insert into TryAvg values (-1.23456E18);
insert into TryAvg values (60);
insert into TryAvg values (50);
insert into TryAvg values (40);
select avg(data) from TryAvg;
30

Please someone show me what is the correct value of this avg() in 
practice.  One can consider avg() being a randomized multi-valued 
function, depending on the dataset and the order in which a given run 
will process it, which might vary between runs.

I agree this example is just made up but you will recognize that I 
didn't push it far by using exceedingly small/large values 
either.  Magnitudes like these occur in real-world scientific data 
everyday.  That or you don't need FP altogether and can switch to integers.

We are already far from the unambiguous definition above.

The general problem is that, when you perform a calculation that you 
suspect (and you must have a good idea of that _before_ ignition) has 
some chance of getting FP-wise unstable just like the above very simple 
example, then _you_ as an engineer need to conduct that computation 
with great care if you want the result to be meaningful at all and 
closest to the idealized mathematical definition.

Given that SQLite (just like other engines) doesn't allow you to force 
any ordering in the data being fed to aggregate functions, I hardly see 
how you can blindly rely on such computation to manipulate/produce 
scientific data.  The problem with FP is that if you need meaningful 
results, you have to care about such details at every stage of the 
computation.

>  But you might be dividing averages, or whatever.  It's not just the
>aggregate functions, but what you do with their results.

Correct.  And in this context, there are steps in a given computation 
where NaNs are to be simply discarded/ignored and others where they 
absolutely need signaling (that they are quiet or signaling NaNs in the 
IEEE sense is something else).
If some operation is going to result in a NaN, then something has gone 
wrong already and either needs fixing or ignoring the entry.

Major scientific software products --actually used for scientific 
computations whose result impact our lives-- treat NaNs somehow like 
NULLs (e.g. consider the NaN data doesn't exist in a 
summation).  Octave or MatLab are easy examples.  You can't pretend 
they are toy tools for kids.

Also if you want both NaNs support and decent '754 compliance you also 
have to support +0 and -0 as two distinct values, and five rounding 
methods.  How are you going to stuff this in SQL?  And what about 
portability over the giant range of applications using SQLite?

>precision may not be important, yet the ability to handle infinities
>and NaNs might be.

That itself would require far more control at the SQL level than 
possible to implement, as seen from the examples given with round() and 
avg() which are completely basic or simple computations.

>(That said, I think it's clear that there's some
>demand for SQLite3 to be extensible such that extended precision FP
>libraries could be transparently integrated.

That seems way more useful than the dev team spending months/years 
bringing a huge FP library and supporting it on an unbounded range of 
hardware/software platforms.

>And whether SQLite3's existing, limited FP functionality is sufficient 
>or not for any particular app is yet another story.)

In fact this is the whole point.  It works fairly well for what it is 
(Lite), but doesn't hold water for processing FP data susceptible of 
flirting too closely with the limits of flight enveloppe.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Handling of IEEE-754 nan and +/-inf in SQLite?

Reply via email to