On 12/18/18 6:01 AM, R Smith wrote:> On 2018/12/17 11:53 PM, Dennis
Clarke wrote:
>>
>> This thread is getting out of hand. Firstly there is no such binary
>> representation ( in this universe ) for a trivial decimal number such as
>> one tenth ( 0.10 ) and really folks should refer to the text book
>> recently published ( 2nd Edition actually ) where all this is covered
>> : //....
>
>
> My good man, did the discussion really irritate you ...
[WARNING : written with a smile ]
Well, I guess the real issue is that I see fairly baseline stuff going
in circles over and over and over and over and very little clarity. I
merely posted that textbook because it really was written by the experts
and I have done a fair amount of emails in life back and forth to a few
of the authors on various topics who always cleared the air. They really
are the experts and the only name missing from that list is the great
and dreaded William Kahan himself. I say that with a smile as Professor
Emeritus of Mathematics Kahan is well known to write very bluntly about
people who have not a clue about trivial things. Trivial to him. The
rest of us merely try to catch up and get a solid understanding of the
basics which, as I was saying, have been covered over and over and over
and over in circles over and over and it gets ... annoying to walk into
a store and hear Beethoven's Ninth Symphony played over dirty cheap
speakers. Certainly when I have heard live performances a few times in
my life. Very irritating is the word.
I see this sort of thing happen from time to time and I have to take the
approach of 'care' or 'do not care'. In the case of the store with bad
speakers the option is 'do not care' and simply accept the noise. In the
case were good people can be led down a wrong path and then fall into a
pitfall or trap I feel 'care' happens. I am not a sociopath nor some
old greybeard UNIX geek that merely enjoys retirement too much to 'care'
anymore.
So let's play a little game based on an early lesson from William Kahan
which will demonstrate how poorly floating point works when used with
nothing but blind trust in bit games. Also we will assume that we are
going to play by the rules of his IEEE 754 and I may make a passing
reference to these three documents :
1 ) IEEE 754-2008 - IEEE Standard for Floating-Point Arithmetic
https://standards.ieee.org/standard/754-2008.html
2 ) Formal Verification of Floating-Point Hardware Design
https://www.springer.com/us/book/9783319955124
3 ) Handbook of Floating-Point Arithmetic
https://www.springer.com/us/book/9783319765259
4 ) The Art of Computer Programming
https://cs.stanford.edu/~knuth/taocp.html
5 ) Oral history interview with Donald E. Knuth
Charles Babbage Institute, 2001
https://conservancy.umn.edu/handle/11299/107413
6 ) How Futile are Mindless Assessments of Roundoff
in Floating-Point Computation ?
Prof William Kahan
https://people.eecs.berkeley.edu/~wkahan/Mindless.pdf
Also perhaps ISO/IEC/IEEE 60559:2011 and working group ISO/IEC
JTC1/SC22/WG11 publications.
Let's begin with a quick and incomplete definition of "floating point"
data representation thus :
Given a radix B with precision p we express a 'floting point'
number in the format
[ (+/-)m_0 . m_1 m_2 m_3 ... m_(p-1) ] * B^e
where we call e the exponent which is always an integer and the
expression m_0 . m_1 m_2 m_3 ... m_(p-1) will be called the not
so friendy word "significand" which is expressed in radix B.
A complete and formal definition will be found in chapter 3 of reference
(3) above. This is not a new idea and in fact is quite old. One may find
a fairly nice history of "floating point" in computing machines such as
the Babbage difference engine ( see Charles Babbage et. al. ) and other
machines that performed numerical computation in (4) and (5) above.
Suffice it to say that radix 60 mathematics was common in the Babylonian
history and the Yale Babylonian Collection provides a tablet with an
approximation of the positive square root of two with four sexagesimal
digits 1, 24, 51, 10. Numerical data representation is not new at all
however I personally fell into the problem in while working on long term
trajectory computations with early Apollo systems. We had not yet been
able to establish a formal method to detect and handle numerical error
conditions and the much loved William Kahan provides us this rather
trivial example to illustrate:
Given u_0 = 2 and u_1 = -4 we then compute
u_[n] = 111 - 1130/(u[n-1]) + 3000/(u[n-1]*u[n-2])
where n > 1 clearly.
Trivial :
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define LOOPCNT 30
int main (int argc, char *argv[])
{
long double u[LOOPCNT];
int j;
u[0] = (long double)2.0L;
u[1] = (long double)-4.0L;
printf("u[00] = %+20.18Lf\n", u[0]);
printf("u[01] = %+20.18Lf\n", u[1]);
for (j = 2; j < LOOPCNT; j++){
u[j] = (long double)111.0L
- (long double)1130.0L/u[j-1]
+ (long double)3000.0L/(u[j-1]*u[j-2]);
printf("u[%02i] = %+20.18Lf\n", j, u[j]);
}
return EXIT_SUCCESS;
}
$ $CC $CFLAGS -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -o foo foo.c
$ ./foo
u[00] = +2.000000000000000000
u[01] = -4.000000000000000000
u[02] = +18.500000000000000000
u[03] = +9.378378378378378379
u[04] = +7.801152737752161387
u[05] = +7.154414480975249402
u[06] = +6.806784736923633658
u[07] = +6.592632768704448309
u[08] = +6.449465933790438615
u[09] = +6.348452056656695458
u[10] = +6.274438598253189794
u[11] = +6.218695740390240090
u[12] = +6.175837314378373112
u[13] = +6.142359234424345102
u[14] = +6.115885561182135076
u[15] = +6.094780237887624276
u[16] = +6.078391827712722059
u[17] = +6.074956741464125884
u[18] = +6.234084939357549628
u[19] = +8.953054978972347186
u[20] = +38.535952308620171281
u[21] = +90.372020211576035841
u[22] = +99.357562247636820436
u[23] = +99.961042723675680582
u[24] = +99.997653560600116965
u[25] = +99.999858805729773391
u[26] = +99.999991508101661861
u[27] = +99.999999489474548205
u[28] = +99.999999969317897322
u[29] = +99.999999998156545070
$
So there we have it. The sequence merely settles in at 100 over time.
However one may sit down with pen and paper and easily arrive at the
correct answer of 6 and then everyone should question how a piece of
computing machiney can be so vast horribly wrong with such a trivial
calculation. Please note that most x86_64 hardware does a terrible mess
with the above and there are far better floating point implementations
in IBM Power and Fujitsu SPARC and other risc architectures. We are
working on RISC-V with native 128 bit precision today and perhaps we may
have better hardware than x86_64 to the world shortly. However that is
a side comment.
Let's look at a bit of code from our friendly Vincent Lefèvre as well as
Jean-Michel Muller ( http://perso.ens-lyon.fr/jean-michel.muller/ ) :
/*
* $Id: pb-muller1.c 47490 2011-11-10 17:07:08Z vinc17/ypig $
*
* Link with -lmpfr -lgmp *in that order*
*
* Example by Jean-Michel Muller:
* u_0 = 2
* u_1 = -4
* u_{n+1} = 111 - 1130 / u_n + 3000 / (u_n u_{n-1})
* --> 6 (not 100)
*
* Arguments: precision, n
*/
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>
#include <mpfr.h>
static const mp_rnd_t rnd[3] = { MPFR_RNDD, MPFR_RNDN, MPFR_RNDU };
static int nep = 0;
static void check_sign (mpfr_t x[3])
{
if (!nep && MPFR_SIGN (x[0]) != MPFR_SIGN (x[2]))
nep = 1;
}
static void mdiv (mpfr_t tmp, unsigned long c, mpfr_t x[3], int r)
{
int neg;
neg = MPFR_SIGN (x[0]) < 0;
mpfr_set_ui (tmp, c, rnd[neg ? 2-r : r]);
mpfr_div (tmp, tmp, x[2-r], rnd[r]);
}
static void display (mpfr_t x[3], long k)
{
int i;
printf ("%6ld ", k);
for (i = 0; i < 3; i++)
{
size_t len;
len = (i | nep) == i ? mpfr_out_str (stdout, 10, 17, x[i],
rnd[i]) : 0;
if (i < 2)
while (len++ < 24)
putchar (' ');
}
putchar ('\n');
}
int main (int argc, char *argv[])
{
int prec, n, i, k;
mpfr_t u[3], v[3], p[3], tmp;
if (argc != 3)
{
fprintf (stderr, "Usage: pb-muller1 <prec> <n>\n");
exit (EXIT_FAILURE);
}
prec = atoi (argv[1]);
if (prec < MPFR_PREC_MIN || prec > MPFR_PREC_MAX)
{
fprintf (stderr, "pb-muller1: incorrect precision %d\n", prec);
exit (EXIT_FAILURE);
}
mpfr_set_default_prec (prec);
n = atoi (argv[2]);
if (n <= 0)
{
fprintf (stderr, "pb-muller1: n must be positive\n");
exit (EXIT_FAILURE);
}
for (i = 0; i < 3; i++)
{
mpfr_init (u[i]);
mpfr_init (v[i]);
mpfr_init (p[i]);
mpfr_set_si (u[i], 2, rnd[i]);
mpfr_set_si (v[i], -4, rnd[i]);
}
mpfr_init (tmp);
printf(" N ");
printf("MPFR_RNDD");
printf(" MPFR_RNDN");
printf(" MPFR_RNDU\n");
display (u, 0);
display (v, 1);
for (k = 2; k <= n; k++)
{
int s;
check_sign (v);
s = MPFR_SIGN(u[1]) * MPFR_SIGN(v[1]);
for (i = 0; i < 3; i++)
if ((i | nep) == i)
{
mpfr_mul (p[s < 0 ? 2-i : i], u[i], v[i], rnd[i]);
mpfr_set (u[i], v[i], rnd[i]);
}
check_sign (p);
for (i = 0; i < 3; i++)
if ((i | nep) == i)
{
mpfr_set_si (v[i], 111, rnd[i]);
mdiv (tmp, 1130, u, 2-i);
mpfr_sub (v[i], v[i], tmp, rnd[2-i]);
mdiv (tmp, 3000, p, i);
mpfr_add (v[i], v[i], tmp, rnd[i]);
}
display (v, k);
}
for (i = 0; i < 3; i++)
{
mpfr_clear (u[i]);
mpfr_clear (v[i]);
mpfr_clear (p[i]);
}
mpfr_clear (tmp);
return 0;
}
/*
First 11 decimals of u_{100}: 6.0000000160, with prec >= 437,
and >= 577 to guarantee the result in interval arithmetic.
*/
$
Here we use extended precision floating point libs to try to reproduce
whatever has happened in our terrible code above :
$ $CC $CFLAGS -I/usr/local/include -L/usr/local/lib
-D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -o pb-muller1 pb-muller1.c
-lmpfr -lgmp
$ ./pb-muller1
Usage: pb-muller1 <prec> <n>
$ ./pb-muller1 113 30
N MPFR_RNDD MPFR_RNDN MPFR_RNDU
0 2.0000000000000000 2.0000000000000000 2.0000000000000000
1 -4.0000000000000000 -4.0000000000000000
-4.0000000000000000
2 1.8500000000000000e1 1.8500000000000000e1
1.8500000000000000e1
3 9.3783783783783783 9.3783783783783784 9.3783783783783784
4 7.8011527377521613 7.8011527377521614 7.8011527377521614
5 7.1544144809752493 7.1544144809752494 7.1544144809752494
6 6.8067847369236329 6.8067847369236330 6.8067847369236330
7 6.5926327687044383 6.5926327687044384 6.5926327687044384
8 6.4494659337902879 6.4494659337902880 6.4494659337902880
9 6.3484520566543571 6.3484520566543571 6.3484520566543572
10 6.2744385982163279 6.2744385982163279 6.2744385982163280
11 6.2186957398023977 6.2186957398023978 6.2186957398023978
12 6.1758373049212301 6.1758373049212301 6.1758373049212302
13 6.1423590812383559 6.1423590812383559 6.1423590812383560
14 6.1158830665510803 6.1158830665510808 6.1158830665510812
15 6.0947394393336623 6.0947394393336811 6.0947394393337000
16 6.0777223048464167 6.0777223048472427 6.0777223048480687
17 6.0639403224632851 6.0639403224998087 6.0639403225363323
18 6.0527217593924258 6.0527217610161519 6.0527217626398778
19 6.0435520376871057 6.0435521101892642 6.0435521826914195
20 6.0360286321323002 6.0360318810817798 6.0360351300311756
21 6.0297013055414662 6.0298473250226262 6.0299933444426353
22 6.0181710657610797 6.0247496523456908 6.0313281169442034
23 5.7234455935329964 6.0205399837103304 6.3173863825532198
24 -7.6979694052129188 6.0170582514956451
1.9224751938450067e1
25 6.0141748176010937
26 6.0117829758099306
27 6.0097744237027211
28 6.0077081713888591
29 5.9993587346059263
30 5.8818446518449744
$
I use 113 bits for obvious reasons here.
Please see
https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format.
The above results illustrate the wildly different results we can get
when we use different methods of "rounding" on the ULP which is the unit
of least precision. We seem to circle the correct result with "rounding
to even" but we don't quite get there. In fact we can not. Using 113
bits of binary precision in the significand will not be sufficient. My
own experiments show that we need a minimal 172 bits and I reveal the
state of various floating point flags :
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <gmp.h>
#include <mpfr.h>
#define PREC 172 /* based on experimental results */
#define LOOPCNT 32
int mpfr_check_flags( int status );
int main (int argc, char *argv[]) {
mpfr_t u, v, w, t0, t1, t2, c111, c1130, c3000;
mpfr_flags_t mpfr_flags;
int mpfr_underflow_flag, mpfr_overflow_flag, mpfr_divby0_flag,
mpfr_nanflag_flag, mpfr_inexflag_flag, mpfr_erangeflag_flag;
int i, max = LOOPCNT;
int inex; /* ret from mpfr calls */
printf(" GMP vers : %d.%d.%d\n", __GNU_MP_VERSION,
__GNU_MP_VERSION_MINOR, __GNU_MP_VERSION_PATCHLEVEL );
printf("MPFR vers : %-12s\nMPFR header: %s (based on %d.%d.%d)\n",
mpfr_get_version (), MPFR_VERSION_STRING, MPFR_VERSION_MAJOR,
MPFR_VERSION_MINOR, MPFR_VERSION_PATCHLEVEL);
printf("MPFR_VERSION_MAJOR = %d\n", MPFR_VERSION_MAJOR);
printf("MPFR_VERSION_MINOR = %d\n", MPFR_VERSION_MINOR);
printf("MPFR_VERSION_PATCHLEVEL = %d\n", MPFR_VERSION_PATCHLEVEL);
if (mpfr_buildopt_tls_p()!=0)
printf(" : compiled as thread safe using TLS\n");
if (mpfr_buildopt_float128_p()!=0)
printf(" : __float128 support enabled\n");
if (mpfr_buildopt_decimal_p()!=0)
printf(" : decimal float support enabled\n");
if (mpfr_buildopt_gmpinternals_p()!=0)
printf(" : compiled with GMP internals\n");
if (mpfr_buildopt_sharedcache_p()!=0)
printf(" : threads share cache per MPFR const\n");
printf("MPFR thresholds file used at compile time : %s\n",
mpfr_buildopt_tune_case () );
printf("\n\n");
mpfr_set_default_prec ((mpfr_prec_t) PREC);
mpfr_inits(u, v, w, t0, t1, t2, c111, c1130, c3000, (mpfr_ptr)
NULL);
inex = mpfr_set_si(u, (long) 2, MPFR_RNDN);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_set_si(u, (long) 2, MPFR_RNDN)\n");
goto fail_out;
}
inex = mpfr_set_si(v, (long) -4, MPFR_RNDN);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_set_si(v, (long) -4,
MPFR_RNDN)\n");
goto fail_out;
}
inex = mpfr_set_si(c111, (long) 111, MPFR_RNDN);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_set_si(c111, (long) 111,
MPFR_RNDN)\n");
goto fail_out;
}
inex = mpfr_set_si(c1130, (long) 1130, MPFR_RNDN);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_set_si(c1130, (long) 1130,
MPFR_RNDN)\n");
goto fail_out;
}
inex = mpfr_set_si(c3000, (long) 3000, MPFR_RNDN);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_set_si(c3000, (long) 3000,
MPFR_RNDN)\n");
goto fail_out;
}
for (i = 3; i <= max; i++){
mpfr_clear_flags();
inex = mpfr_div(t0, c1130, v, MPFR_RNDN);
inex = mpfr_check_flags(inex);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_div(t0, c1130, v,
MPFR_RNDN)\n");
fprintf(stderr," : returned %i\n", inex );
mpfr_printf("t0 = %.30Rg\n", t0);
goto fail_out;
}
mpfr_printf("t0 = 1130 / v = %.30Rg\n", t0);
mpfr_clear_flags();
inex = mpfr_sub(w, c111, t0, MPFR_RNDN);
inex = mpfr_check_flags(inex);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_sub(w, c111, t0,
MPFR_RNDN)\n");
goto fail_out;
}
mpfr_printf("w = 111 - t0 = %.30Rg\n", w);
mpfr_clear_flags();
inex = mpfr_mul(t1, v, u, MPFR_RNDN);
inex = mpfr_check_flags(inex);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_mul(t1, v, u, MPFR_RNDN)\n");
goto fail_out;
}
mpfr_printf("t1 = v * u = %.30Rg\n", t1);
mpfr_clear_flags();
inex = mpfr_div(t2, c3000, t1, MPFR_RNDN);
inex = mpfr_check_flags(inex);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_div(t2, c3000, t1,
MPFR_RNDN)\n");
goto fail_out;
}
mpfr_printf("t2 = 3000 / t1 = %.30Rg\n", t2);
mpfr_clear_flags();
inex = mpfr_add(w, w, t2, MPFR_RNDN);
inex = mpfr_check_flags(inex);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_add(w, w, t2, MPFR_RNDN)\n");
goto fail_out;
}
mpfr_printf("w = w + t2 = %.30Rg\n", w);
mpfr_clear_flags();
inex = mpfr_set(u, v, MPFR_RNDN);
inex = mpfr_check_flags(inex);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_set(u, v, MPFR_RNDN)\n");
goto fail_out;
}
mpfr_printf ("u = v = %.30Rg\n", u);
mpfr_clear_flags();
inex = mpfr_set(v, w, MPFR_RNDN);
inex = mpfr_check_flags(inex);
if ( inex != 0 ){
fprintf(stderr,"FAIL : mpfr_set(v, w, MPFR_RNDN)\n");
goto fail_out;
}
mpfr_printf ("v = w = %.30Rg\n\n", v);
printf("u%02i = ", i);
mpfr_printf ("%.30Rg\n", v);
}
mpfr_clears( u, v, w, t0, t1, t2, c111, c1130, c3000,
(mpfr_ptr) NULL);
return EXIT_SUCCESS;
fail_out:
mpfr_clears( u, v, w, t0, t1, t2, c111, c1130, c3000,
(mpfr_ptr) NULL);
return EXIT_FAILURE;
}
$ $CC $CFLAGS -I/usr/local/include -L/usr/local/lib
-D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -o hfpa_page_11
hfpa_page_11.c -lmpfr -lgmp
$ ./hfpa_page_11
GMP vers : 6.1.2
MPFR vers : 4.0.1-p13
MPFR header: 4.0.1-p13 (based on 4.0.1)
MPFR_VERSION_MAJOR = 4
MPFR_VERSION_MINOR = 0
MPFR_VERSION_PATCHLEVEL = 1
: compiled as thread safe using TLS
MPFR thresholds file used at compile time : src/amd/k8/mparam.h
t0 = 1130 / v = -282.5
w = 111 - t0 = 393.5
t1 = v * u = -8
t2 = 3000 / t1 = -375
w = w + t2 = 18.5
u = v = -4
v = w = 18.5
u03 = 18.5
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 61.0810810810810810810810810811
w = 111 - t0 = 49.9189189189189189189189189189
t1 = v * u = -74
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = -40.5405405405405405405405405405
w = w + t2 = 9.37837837837837837837837837838
u = v = 18.5
v = w = 9.37837837837837837837837837838
u04 = 9.37837837837837837837837837838
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 120.489913544668587896253602305
w = 111 - t0 = -9.48991354466858789625360230548
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 173.5
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 17.2910662824207492795389048991
w = w + t2 = 7.80115273775216138328530259366
u = v = 9.37837837837837837837837837838
v = w = 7.80115273775216138328530259366
u05 = 7.80115273775216138328530259366
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 144.850387883265607683782785371
w = 111 - t0 = -33.8503878832656076837827853713
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 73.1621621621621621621621621622
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 41.0048023642408570373106760251
w = w + t2 = 7.15441448097524935352789065386
u = v = 7.80115273775216138328530259366
v = w = 7.15441448097524935352789065386
u06 = 7.15441448097524935352789065386
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 157.944441575876490938193834874
w = 111 - t0 = -46.9444415758764909381938348738
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 55.8126801152737752161383285303
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 53.75122631280012392213559147
w = w + t2 = 6.80678473692363298394175659627
u = v = 7.15441448097524935352789065386
v = w = 6.80678473692363298394175659627
u07 = 6.80678473692363298394175659627
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 166.010832378799487206717895424
w = 111 - t0 = -55.0108323787994872067178954235
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 48.6985592907277428888067971925
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 61.6034651475039255994598981999
w = w + t2 = 6.59263276870443839274200277637
u = v = 6.80678473692363298394175659627
v = w = 6.59263276870443839274200277637
u08 = 6.59263276870443839274200277637
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 171.403449827232486505953949374
w = 111 - t0 = -60.4034498272324865059539493745
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 44.874632106159962823359322559
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 66.8529157610227744748224285946
w = w + t2 = 6.44946593379028796886847922015
u = v = 6.59263276870443839274200277637
v = w = 6.44946593379028796886847922015
u09 = 6.44946593379028796886847922015
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 175.208305866019214125873951209
w = 111 - t0 = -64.2083058660192141258739512095
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 42.51896045574882232016203054
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 70.5567579226735712729746427703
w = w + t2 = 6.34845205665435714710069156081
u = v = 6.44946593379028796886847922015
v = w = 6.34845205665435714710069156081
u10 = 6.34845205665435714710069156081
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 177.996146133851648579093411262
w = 111 - t0 = -66.996146133851648579093411262
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 40.9441252716931676575532714216
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 73.270584732067976492922789724
w = w + t2 = 6.27443859821632791382937846207
u = v = 6.34845205665435714710069156081
v = w = 6.27443859821632791382937846207
u11 = 6.27443859821632791382937846207
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 180.095793800776349973236624212
w = 111 - t0 = -69.0957938007763499732366242119
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 39.8329726231979286181076071689
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 75.3144895405787477615577757815
w = w + t2 = 6.21869573980239778832115156959
u = v = 6.27443859821632791382937846207
v = w = 6.21869573980239778832115156959
u12 = 6.21869573980239778832115156959
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 181.71012818130033215181419855
w = 111 - t0 = -70.7101281813003321518141985497
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 39.0188245803796070521231630828
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 76.8859654862215622716775384113
w = w + t2 = 6.17583730492123011986333986151
u = v = 6.21869573980239778832115156959
v = w = 6.17583730492123011986333986151
u13 = 6.17583730492123011986333986151
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 182.971141273355259617962054216
w = 111 - t0 = -71.9711412733552596179620542163
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 38.4056531378263756715326672655
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 78.1135003545936155573081943698
w = w + t2 = 6.14235908123835593934614015355
u = v = 6.17583730492123011986333986151
v = w = 6.14235908123835593934614015355
u14 = 6.14235908123835593934614015355
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 183.968404493242624565635107533
w = 111 - t0 = -72.9684044932426245656351075329
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 37.9342103541335313184967384766
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 79.0842875597937053293793082179
w = w + t2 = 6.11588306655108076374420068497
u = v = 6.14235908123835593934614015355
v = w = 6.11588306655108076374420068497
u15 = 6.11588306655108076374420068497
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 184.764814451764677499945188349
w = 111 - t0 = -73.7648144517646774999451883494
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 37.5659498936219153328075416891
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 79.8595538910983586282652275967
w = w + t2 = 6.09473943933368112832003924736
u = v = 6.11588306655108076374420068497
v = w = 6.09473943933368112832003924736
u16 = 6.09473943933368112832003924736
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 185.405793184087190266717806473
w = 111 - t0 = -74.405793184087190266717806473
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 37.2747137320618884011862075346
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 80.4835154889344330029996346197
w = w + t2 = 6.07772230484724273628182814674
u = v = 6.09473943933368112832003924736
v = w = 6.07772230484724273628182814674
u17 = 6.07772230484724273628182814674
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 185.924914519173870218307712212
w = 111 - t0 = -74.9249145191738702183077122119
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 37.042133832670492411520431721
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 80.9888548416736789735738791443
w = w + t2 = 6.06394032249980875526616693243
u = v = 6.07772230484724273628182814674
v = w = 6.06394032249980875526616693243
u18 = 6.06394032249980875526616693243
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 186.347480335058267383579967778
w = 111 - t0 = -75.347480335058267383579967778
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.8549453533196700991001096141
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 81.4002020960744195769362518193
w = w + t2 = 6.05272176101615219335628404129
u = v = 6.06394032249980875526616693243
v = w = 6.05272176101615219335628404129
u19 = 6.05272176101615219335628404129
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 186.692870516204205980381685952
w = 111 - t0 = -75.6928705162042059803816859523
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.7033435474978963079278362567
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 81.7364226263934748481591633164
w = w + t2 = 6.0435521101892688677774773641
u = v = 6.05272176101615219335628404129
v = w = 6.0435521101892688677774773641
u20 = 6.0435521101892688677774773641
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 186.976132479250061286265756921
w = 111 - t0 = -75.9761324792500612862657569211
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.5799393711776741269191244542
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 82.0121643603319180662764005427
w = w + t2 = 6.03603188108185678001064362156
u = v = 6.0435521101892688677774773641
v = w = 6.03603188108185678001064362156
u21 = 6.03603188108185678001064362156
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 187.209084090766363397143239031
w = 111 - t0 = -76.2090840907663633971432390312
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.4790732120819575455522510051
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 82.2389314157902652538562503843
w = w + t2 = 6.02984732502390185671301135315
u = v = 6.03603188108185678001064362156
v = w = 6.02984732502390185671301135315
u22 = 6.02984732502390185671301135315
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 187.401096427515395813994063809
w = 111 - t0 = -76.4010964275153958139940638091
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.3963506919004245801170798372
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 82.4258460798822437127375842389
w = w + t2 = 6.02474965236684789874352042985
u = v = 6.02984732502390185671301135315
v = w = 6.02474965236684789874352042985
u23 = 6.02474965236684789874352042985
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 187.559660600349561648427131321
w = 111 - t0 = -76.5596606003495616484271313206
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.3283205752629204238431248847
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 82.5802005844110777108582694273
w = w + t2 = 6.02053998406151606243113810664
u = v = 6.02474965236684789874352042985
v = w = 6.02053998406151606243113810664
u24 = 6.02053998406151606243113810664
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 187.690805640608133312326116058
w = 111 - t0 = -76.6908056406081333123261160582
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.272246176035326886178724729
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 82.7078638979371209235032988195
w = w + t2 = 6.0170582573289876111771827613
u = v = 6.02053998406151606243113810664
v = w = 6.0170582573289876111771827613
u25 = 6.0170582573289876111771827613
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 187.799411551919152411333134492
w = 111 - t0 = -76.7994115519191524113331344923
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.2259398246766766867425191838
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 82.8135864664699713738641132522
w = w + t2 = 6.01417491455081896253097875987
u = v = 6.0170582573289876111771827613
v = w = 6.01417491455081896253097875987
u26 = 6.01417491455081896253097875987
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 187.889447190179764969201236571
w = 111 - t0 = -76.8894471901797649692012365712
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.1876408306188637229490105522
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 82.9012317780510986425852750301
w = w + t2 = 6.01178458787133367338403845895
u = v = 6.01417491455081896253097875987
v = w = 6.01178458787133367338403845895
u27 = 6.01178458787133367338403845895
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 187.964153319757081280027483314
w = 111 - t0 = -76.9641533197570812800274833137
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.1559240600590085878407693164
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 82.9739545590555658478143724685
w = w + t2 = 6.00980123929848456778688915479
u = v = 6.01178458787133367338403845895
v = w = 6.00980123929848456778688915479
u28 = 6.00980123929848456778688915479
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 188.026185060972710385415447864
w = 111 - t0 = -77.0261850609727103854154478637
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.1296304665846704072244722298
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 83.0343394398849393132364128061
w = w + t2 = 6.00815437891222892782096494236
u = v = 6.00980123929848456778688915479
v = w = 6.00815437891222892782096494236
u29 = 6.00815437891222892782096494236
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 188.0777238291579164286779791
w = 111 - t0 = -77.0777238291579164286779790997
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.1078136322833302456565987844
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 83.0845099221891221872109178848
w = w + t2 = 6.00678609303120575853293878513
u = v = 6.00815437891222892782096494236
v = w = 6.00678609303120575853293878513
u30 = 6.00678609303120575853293878513
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 188.120566056276503242641451445
w = 111 - t0 = -77.1205660562765032426414514453
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.0896981680345182060442268243
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 83.1262147450479235105736445903
w = w + t2 = 6.00564868877142026793219314497
u = v = 6.00678609303120575853293878513
v = w = 6.00564868877142026793219314497
u31 = 6.00564868877142026793219314497
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t0 = 1130 / v = 188.156194036578734087817277753
w = 111 - t0 = -77.1561940365787340878172777526
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t1 = v * u = 36.0746470233432633440888930245
INFO : mpfr_inexflag_flag is set.
: clear flags and carry on.
t2 = 3000 / t1 = 83.1608968497669092896537174097
w = w + t2 = 6.0047028131881752018364396571
u = v = 6.00564868877142026793219314497
v = w = 6.0047028131881752018364396571
u32 = 6.0047028131881752018364396571
$
$ echo $?
0
We get the exact same result on IBM Power hardware and Oracle SPARC
given that the implementation uses cross platform floating point libs
libgmp and libmpfr.
However why are we seeing such horrific results from trivial code ?
Let me quote one of the authors of (3) Vincent Lefèvre :
... for any computation system on approximated numbers with a format
of bounded length, at least one operation will be inexact (because
if the sequence is computed exactly, you will get an infinity of
different numbers), and as soon as you get an error, the limit will
change from 6 to 100. Thus MPFR cannot help you to find the limit
(and obviously unums won't help you either). Well, with arbitrary
precision, you can increase the precision and observe the various
values, so that you can conjecture that the limit is 6, but this is
not a proof. Interval arithmetic in various precisions can be useful
to see what's going on with some additional details.
On this ( trivial ) example, MPFR can be used too to get bounds,
without needing an interval arithmetic package ...
However that is not in any book. Let's try to sum up here and merely say
that some serious reading and experiments are needed to get a good
handle on why numerical computation is as much art as it is science. If
we wander into the problem without sufficient study and VERY careful
consideration then we are doomed to repeat the errors of the past. I want
to point out that this has been quite long enough and I did not bother
to drag in output data from the IBM Power or RISC-V systems but all one
needs to know is that the results are poor if rounding errors are not
trapped and handled. Also, there will ALWAYS be inexact flags set and
computation rounding errors thrown in every language we have. People are
often "mindless" about how modern computing machinary works and have no
clue about roundoff errors. However I didn't say that. Kahan did. See
(6) above.
Dennis Clarke
ye old UNIX/Linux greybeard
ps: https://en.wikipedia.org/wiki/Minimax_approximation_algorithm
lastly just for fun :
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
size_t bin_printf ( uint8_t* f, size_t n );
int
main (int argc, char* argv[])
{
float foo = 1.100f;
float epsilon = 1.0E-7f; /* this is _about_ 0.000001 */
int fpe;
int j;
feclearexcept(FE_ALL_EXCEPT);
if (fesetround(FE_TONEAREST)!=0){
fprintf(stderr,"ERROR : can not set floating point rounding!\n");
return EXIT_FAILURE;
}
printf("starting value of foo = %8.7e\n", (double)foo );
bin_printf((uint8_t*)&foo, (size_t)sizeof(foo));
for ( j=0; fabsf(foo)>epsilon; j++ ){
/* do a trivial subtraction */
foo = foo - 0.10f;
fpe = fetestexcept(FE_ALL_EXCEPT);
if (fpe!=0){
printf("INFO : Exception raised was");
if(fpe & FE_INEXACT) printf(" FE_INEXACT");
if(fpe & FE_DIVBYZERO) printf(" FE_DIVBYZERO");
if(fpe & FE_UNDERFLOW) printf(" FE_UNDERFLOW");
if(fpe & FE_OVERFLOW) printf(" FE_OVERFLOW");
if(fpe & FE_INVALID) printf(" FE_INVALID");
printf("\n");
}
printf("foo = %8.7e\n", (double)foo );
bin_printf((uint8_t*)&foo, (size_t)sizeof(foo));
feclearexcept(FE_ALL_EXCEPT);
}
printf("\nfinal value of foo = %8.7e\n", (double)foo );
bin_printf((uint8_t*)&foo, (size_t)sizeof(foo));
return EXIT_SUCCESS;
}
$ ./subtraction
starting value of foo = 1.1000000e+00
00111111 10001100 11001100 11001101
INFO : Exception raised was FE_INEXACT
foo = 1.0000000e+00
00111111 10000000 00000000 00000000
INFO : Exception raised was FE_INEXACT
foo = 8.9999998e-01
00111111 01100110 01100110 01100110
INFO : Exception raised was FE_INEXACT
foo = 7.9999995e-01
00111111 01001100 11001100 11001100
INFO : Exception raised was FE_INEXACT
foo = 6.9999993e-01
00111111 00110011 00110011 00110010
INFO : Exception raised was FE_INEXACT
foo = 5.9999990e-01
00111111 00011001 10011001 10011000
INFO : Exception raised was FE_INEXACT
foo = 4.9999991e-01
00111110 11111111 11111111 11111101
INFO : Exception raised was FE_INEXACT
foo = 3.9999992e-01
00111110 11001100 11001100 11001010
INFO : Exception raised was FE_INEXACT
foo = 2.9999992e-01
00111110 10011001 10011001 10010111
INFO : Exception raised was FE_INEXACT
foo = 1.9999993e-01
00111110 01001100 11001100 11001000
foo = 9.9999927e-02
00111101 11001100 11001100 11000011
foo = -7.4505806e-08
10110011 10100000 00000000 00000000
final value of foo = -7.4505806e-08
10110011 10100000 00000000 00000000
$
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users