[sqlite] Sqlite's exposure to floating point errors under SQLITE_MIXED_ENDIAN_64BIT_FLOAT

ir. F.T.M. van Vugt bc. Tue, 04 Sep 2007 00:34:10 -0700

L.S.

I've noticed that on a platform for which SQLITE_MIXED_ENDIAN_64BIT_FLOAT 
needs to be defined, Sqlite exposes itself to any difference between the 
floating point implementation in Sqlite as opposed to the one used by the 
underlying platform. To me this seems like something that needs to be 
avoided. This post explains what's going on and contains a patch to correct 
the behaviour.


The platform on which I noticed this is an ARM-based AML7100 handheld
barcode scanner; specifically, the processor involved is a StrongARM 1110 with
a v4 core. Most likely also due to the hardware setup, it is swapping quads
(the two 32-bit halves of a 64-bit double).
In earlier versions of Sqlite, this swapping was noticed and solved, see also
the following post:

http://www.mail-archive.com/sqlite-users@sqlite.org/msg09745.html

When I picked up the latest version of Sqlite, I was happy to see that the
newer version of the code had adopted the idea and allows
SQLITE_MIXED_ENDIAN_64BIT_FLOAT to be set. The comments in the code for this
macro hint that this swapping of quads is a gcc-only thing (newer versions
should not be exposed to this). That might be true for the ARM7 family, but
the comments could mention more specifically that various other ARM processors
seem to allow for mixed-endian modes as well as mixed 'wiring' of
memory, and that certain choices there may result in such a mixed-endian
setting. I'm far from an ARM expert though, so I'll leave the details to
someone else ;)

Anyway, with this macro set, reading floats from databases (even those
produced on, for example, an i386 architecture) worked properly, but writing
and re-reading data from the database failed (the reread values seemed to
make no sense at all). After recompiling Sqlite with debug enabled,
assertions in the code like the one below also failed:

vdbeaux.c:1953
      static const u64 t1 = ((u64)0x3ff00000)<<32;
      static const double r1 = 1.0;
      double r2 = r1;
      swapMixedEndianFloat(r2);
      assert( sizeof(r2)==sizeof(t1) && memcmp(&r2, &t1, sizeof(r1))==0 );


Further investigation revealed the cause for this and it is kinda nasty....

This particular processor lacks an FP unit (like many other ARM processors)
and therefore depends on either the FP support in the kernel or on soft-float
from gcc. The latter is often not an easy choice when cross-compiling, so in
my case too, the kernel's FP support was being used. This basically means
that gcc generates FP opcodes that raise an exception on the CPU, which the
kernel catches, after which it interprets the instruction and reroutes it to
its FP engine. This obviously has a downside efficiency-wise, but so be it.

The (pre-installed) kernel in use here (2.4.17) offers two
kinds of math emulation (later kernels have more): CONFIG_FPE_NWFPE and
CONFIG_FPE_FASTFPE. Although NWFPE is the default, this particular kernel had
been configured with FASTFPE. The interesting part from the docs on
CONFIG_FPE_FASTFPE is this: "This is an experimental much faster emulator 
which has only 32 bit precision for the mantissa."

So, I was getting burned by the cut-down precision of doubles in the
FP engine of the kernel. This would normally not show up, but the
quad-swapped values in the current version of Sqlite are actually treated AS
DOUBLES in various places in the code. When this happens, the kernel
FP emulator actually touches these values, and now the lesser precision of
the doubles *does* work out in a (kinda) unpredictable way, simply because
the quads are swapped......

This behaviour was verified by looking at the actual values used at various
places in the code:

* the FP64 representation of 1.0 is 3ff0 0000 0000 0000

* upon an 'insert into <table> values (1.0)' I see that the value Sqlite tries
to insert is 3ff0 0000 0040 0000........ the '3ff' is the sign and exponent
part and the remaining 13 * 4 bits are the fraction, but since the fast
floating point emulator in the kernel only has an accuracy of under 32 bits,
we see the '4' popping up

* after swapping quads, the value that is actually going to be inserted by
Sqlite is 3fe0 0000 0040 0000; the reason is that the second 'f', which
changes into 'e' after swapping quads, is in the exact same spot where the
'4' was before swapping: at the end of the accuracy range of the fast
floating point emulator

* obviously, this is also the reason why the assert fails when --enable-debug
is being used: it does this exact same test and notices the difference
between 3ff and 3fe


So, to wrap this up: because the swapped values are still treated as floats,
Sqlite is basically exposing itself to any quirk in the FP engine on any
platform for which the mixed endian macro needs to be used, which in my
humble opinion is not a Good Thing(tm) ;)

I'm unsure how D. Richard Hipp would prefer to handle this; I reported these
findings directly to him before posting here, but got no response ;(

Specifically I didn't want to touch the assert tests nor his comment in the 
current code:

** (later):  It is reported to me that the mixed-endian problem
** on ARM7 is an issue with GCC, not with the ARM7 chip.  It seems
** that early versions of GCC stored the two words of a 64-bit
** float in the wrong order.  And that error has been propagated
** ever since.  The blame is not necessarily with GCC, though.
** GCC might have just copying the problem from a prior compiler.
** I am also told that newer versions of GCC that follow a different
** ABI get the byte order right.

As described earlier, I think there are a few non-ARM7 platforms out there for 
which the mixed endian thingy really is a hardware issue. The fact that my 
use of a gcc v4.1.1 based crosscompiler didn't change anything seems to 
confirm this. This fact could be mentioned in the comment too.


So, the patch below focuses on the cause of the problem only; its proper
working is verified with both a gcc-2.95.3 based cross-compiler and a
gcc-4.1.1 based cross-compiler. What remains is correcting the
assert in sqlite3VdbeSerialGet() that gets triggered when
using --enable-debug and correcting the comments.



diff -u -r sqlite-3.4.2-orig/src/vdbeaux.c sqlite-3.4.2/src/vdbeaux.c
--- sqlite-3.4.2-orig/src/vdbeaux.c     2007-08-27 16:46:46.000000000 +0200
+++ sqlite-3.4.2/src/vdbeaux.c  2007-08-30 12:13:14.000000000 +0200
@@ -1815,9 +1815,9 @@
 ** floating point values is correct.
 */
 #ifdef SQLITE_MIXED_ENDIAN_64BIT_FLOAT
-static double floatSwap(double in){
+static u64 floatSwap(u64 in){
   union {
-    double r;
+    u64 r;
     u32 i[2];
   } u;
   u32 t;
@@ -1861,8 +1861,8 @@
     int i;
     if( serial_type==7 ){
       assert( sizeof(v)==sizeof(pMem->r) );
-      swapMixedEndianFloat(pMem->r);
       memcpy(&v, &pMem->r, sizeof(v));
+      swapMixedEndianFloat(v);
     }else{
       v = pMem->u.i;
     }
@@ -1965,8 +1965,8 @@
         pMem->flags = MEM_Int;
       }else{
         assert( sizeof(x)==8 && sizeof(pMem->r)==8 );
+        swapMixedEndianFloat(x);
         memcpy(&pMem->r, &x, sizeof(x));
-        swapMixedEndianFloat(pMem->r);
         pMem->flags = MEM_Real;
       }
       return 8;







Best,

Frank.
