Re: [sqlite] Handling of IEEE-754 nan and +/-inf in SQLite?

Jay A. Kreibich Mon, 06 Jun 2011 21:01:11 -0700

On Mon, Jun 06, 2011 at 11:27:26PM +0200, Sidney Cadot scratched on the wall:
> Hi Jay,
> 
> > One should never assume a database uses IEEE 754, so one should never
> > assume it uses similar semantics.
> 
> One should not assume it unless it is documented, of course. Postgres,
> for example, half-heartedly embraces IEEE-754 'on platforms that use
> it' (see section 8.1.3 of its manual). It documents the fact that +/-
> infinity and NaN are useable on such systems.


  Well, half-heatedly is about what I would expect, and defines nearly
  every application out there.  IEEE 754 isn't a bad model, it just
  isn't a complete model for most environments.  Adaptations need to
  be made.

> > Even those databases that do use
> > IEEE 754 for a select few of their types have other considerations.
> > In the bigger picture, IEEE 754 makes up, at most, a small part of
> > the SQL numeric environment.
> 
> For SQL: yes. For SQLite though, it is the only option.

  No, it isn't.  SQLite's "numeric environment" consist of more than
  just floating point numbers, and most DBs even have other real-number
  representations.  For that matter, the SQL "numeric environment" goes
  beyond even numbers, since SQL very much has the concept of things like 
  "NULL + 2", and SQL standard needs to be a defined way of dealing
  with such things.

  As it is, SQLite is also somewhat casual about translation between
  floats and integers.  For example, some floating point whole numbers
  are translated into integers before they're stored.  The two types of
  numbers are highly integrated.  You can call that wrong if you want,
  but at the end of the day the math works out the way most people
  expect.  I would rather have an environment that is constant
  across the board then one that's highly specialized, but only for
  specific numeric types, and who knows what happens when they mix.

  Beyond that, SQLite is dependent on the underlying architecture, just
  as Postgres is.  Given the number of low-power embedded processors
  that SQLite runs on, I fully expect there are thousands, if not
  millions, of copies of SQLite running on hardware that is not fully
  IEEE 754 compliant.  In storage, perhaps, but not in calculations.
  Even if we all agreed that making SQLite a perfect 754 environment
  is a Good Idea, it couldn't be done.

> > Using 754 as a reference for the rest of the environment strikes me as
> > poorly thought out and putting the tail before the dog.
> 
> In terms of generic SQL you may be right (although I'd be willing to
> argue for it). However, I think that for a specific DB product, it is
> a good thing to document without ambiguity what the properties and
> guarantees of the numeric types and operations are; and IEEE-754 is
> the only game in town when it comes to properly specified floating
> point numbers.

  OK, how does IEEE 754 define "NULL + 2.0"?  Not "let's pretend NULLs
  are NaN", but a NULL in the fullest SQL sense of that word.  Also
  remember that SQL is based off the Relational Model (which is where
  the concept of NULL comes from), which is a formally defined set of
  mathematics, complete with formal theorems and proofs, so you have to
  work consistently within that formal mathematical model.  That's
  going to be an interesting challenge, given that IEEE 754 doesn't even
  work within the model of basic arithmetic.

  IEEE 754, by itself, isn't enough to define a compute environment
  for something like SQL.  SQL has to extend and adapt it, just like any
  good numeric aware application does.  It is about 99% of what you
  need, but the edges and the details count.  SQL must step beyond just
  IEEE 754 alone, and once you've done that you might as well round off
  the odd and confusing edges.

> I feel this is especially true for the light-weight database system
> that is SQLite. I get the impression that you are advocating to keep
> floating point operations deliberately vague and underspecified
> (please correct me if I am wrong).

  Not at all.  I'm simply trying to say that IEEE 754 is a poor goal
  for this type of environment.  It is a great base, but you need more,
  and you have to go beyond it.

  My calculator returns "error" when I put in 1.0/0.0.  If it didn't,
  I'd throw it out.  I would expect SQL to return an error as well
  (as per the standard).  I would never expect it to return infinitely.
  SQLite's choice of returning NULL may not be fully standardized, but
  it makes a heck of a lot more sense to a database engineer than
  +Inf ever would.  IEEE 754 might say it is correct to return +Inf,
  but to any math student in the world that answer is flat out wrong,
  and most database users will as well, so it makes sense to scrap the
  IEEE 754 view of the world and give people what they expect.  This is
  even more true as IEEE 754 isn't in the center of this world.

  Now, it *would* be nice if there was more available documentation that
  went into how IEEE 754 is adapted to the SQLite model, and when, for
  example, Inf or NaN might or might not get translated into a NULL.

  While I think it is important for there to be distance between 
  IEEE 754 on the hardware and the numeric environment SQL presents to
  the end users, I'm all for documenting that distance as clearly as
  possible.  Not only does that allow those that do care about the
  IEEE 754 bits and flags have some hope of understanding what to
  expect, having that separation formally defined would also help keep
  it consistent.  I'm not trying to say that the IEEE 754 aspects
  should be hidden, only that SQL, and the numeric environment it
  presents, shouldn't bend its will to match the IEEE 754 view of the
  world down to every last detail.  IEEE 754 is part of the path, it is
  not the goal.

> > [...] This is what most high-level scripting languages like Perl
> > and Python do.
> 
> Perl and Python support NaNs and infinities just fine.

  Perhaps, but they don't provide an IEEE 754 environment.  Instead,
  they provide common-sense adaptations; they put distance between the
  IEEE 754 environment and their presented numeric environment:

$ python -c "print 1.0 / 0.0"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  ZeroDivisionError: float division

$ perl -e "printf 1.0 / 0.0;"
Illegal division by zero at -e line 1.

  That was one of your original complains... that SQLite didn't provide
  the "correct" answer of +Inf.  Neither do these languages.  They don't
  provide true IEEE 754 environments.

  And I think that's correct.  It is better for an expressive numeric
  environment to follow the understood rules of the problem domain,
  rather than the bit patterns underneath.  If you're dealing with C, 
  you should be dealing with IEEE 754 directly.  If you're at a higher
  level, such as these languages-- or SQL-- most developer will expect
  higher-level semantics... and not because they don't know what
  they're doing, just because 99.9% of the time, we don't care.

> > As you've pointed out, SQLite is more than capable of storing and 
> > retrieving non-numeric IEEE 754 values
>
> No, it doesn't support storing and retrieving NaNs.

  You're right...  Even using the C API, if you try to call
  sqlite3_bind_double( stmt, p, 0.0 / 0.0 ), the NaN is converted into
  a NULL before it is written to the database.  Inf works just fine,
  however.  In other words, SQLite is actually following a somewhat
  more consistent mapping than I had realized.

    -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H > 
"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Handling of IEEE-754 nan and +/-inf in SQLite?

Reply via email to