Dave Love wrote:
> Asad Ali <asa...@gmail.com> writes:
>
>> From run to run the results can only be different if you either use
>> different input/output or use different random number seeds. Here in
>> my case the random number seeds are the same as well.

> Sorry, but that's naïve, even if you can prove your code is well-defined
> according to the language and floating-point standards.  You should
> listen to Ashley, and if it worries you, you really just need to debug
> it.  If you believe it's a problem with Open MPI, you at least have to
> demonstrate results with a different MPI.

Or run a serial version on the same set of machines,
compiled in a similar way (same compiler version, optimization
flags, etc.) as the parallel version, and compare results.
If the serial results don't differ from run to run,
then you can start blaming MPI.
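
A minimal sketch of such a comparison, assuming a toy global sum as the
computation (the data, the file name, and the "%.17g" printing are mine,
for illustration only):

    /* compare_sum.c -- the same global sum computed serially on rank 0
     * and in parallel with MPI_Reduce.  Printing 17 significant digits
     * makes run-to-run and machine-to-machine differences visible. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Deterministic synthetic data: every run sees the same input. */
        double local = 0.0;
        for (int i = rank; i < N; i += size)
            local += 1.0 / (1.0 + (double)i);   /* this rank's partial sum */

        double parallel_sum;
        MPI_Reduce(&local, &parallel_sum, 1, MPI_DOUBLE, MPI_SUM,
                   0, MPI_COMM_WORLD);

        if (rank == 0) {
            /* Serial reference: same terms, one fixed summation order. */
            double serial_sum = 0.0;
            for (int i = 0; i < N; i++)
                serial_sum += 1.0 / (1.0 + (double)i);

            printf("serial:   %.17g\n", serial_sum);
            printf("parallel: %.17g\n", parallel_sum);
        }

        MPI_Finalize();
        return 0;
    }

Build it with the same compiler and flags each time (e.g. "mpicc -O2
compare_sum.c") and run with different process counts: the serial line
should be bit-identical from run to run, while the parallel line may
wobble in the last digits as the reduction order changes.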

In my experience, MPI rarely contributes significantly,
or at all, to numerical differences in results.
Compiler flags (particularly optimization), compiler versions,
different hardware, different operating systems, and different
libraries (e.g. math libraries), on the other hand,
have a significant effect.
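
The root cause is that floating-point addition is not associative, so
anything that regroups the terms (vectorization, FMA contraction, a
different reduction tree) can change the last bits.  A self-contained
illustration:

    /* fp_assoc.c -- floating-point addition is not associative, so the
     * order in which a compiler (or an MPI reduction) groups the terms
     * changes the result. */
    #include <stdio.h>

    int main(void)
    {
        float a = 1.0e8f, b = -1.0e8f, c = 1.0f;

        /* Same three numbers, two groupings, two answers. */
        printf("(a + b) + c = %g\n", (a + b) + c);  /* prints 1 */
        printf("a + (b + c) = %g\n", a + (b + c));  /* prints 0: c is
                                                       lost next to 1e8
                                                       in single
                                                       precision */
        return 0;
    }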

Bit-by-bit matching is hardly achievable in complex programs.
It is a chimera.
You only stand a chance if you enforce the IEEE standard
(which somebody already suggested to you)
and hope that the compiler really gets it right.
However, beware that enforcing the IEEE standard carries a
performance penalty.
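
As a hedged illustration (check your compiler's manual for the exact
spelling): GCC has -ffp-contract=off and -frounding-math, and the Intel
compiler has -fp-model precise and -fp-model strict.  C itself also
offers a standard pragma to forbid contracted multiply-adds, one common
source of cross-platform differences, though compiler support for the
pragma varies:

    /* Standard C99 pragma: do not fuse a*x + y into a single FMA.
     * This is a per-file switch; full IEEE conformance still depends
     * on the compiler flags above. */
    #pragma STDC FP_CONTRACT OFF

    double axpy(double a, double x, double y)
    {
        return a * x + y;   /* one multiply, then one add, each
                               rounded separately */
    }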

Well-designed algorithms are also important.
There are some old famous (infamous?) FFTs still in use out there
that can amplify your round-off errors within a few iterations.
On different hardware, or with different optimization flags,
the error amplification can differ as well.
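
Not an FFT, but a classic toy example of the same phenomenon: the
recurrence I_n = 1 - n*I_(n-1) for the integrals of x^n * e^(x-1) over
[0,1] multiplies the inherited round-off by n at every step, so the
one-ulp error in I_0 destroys the answer within a couple of dozen
iterations:

    /* unstable.c -- round-off amplification in an unstable recurrence.
     * I_n = integral_0^1 x^n e^(x-1) dx satisfies I_n = 1 - n*I_(n-1)
     * with I_0 = 1 - 1/e; the exact I_n stay between 0 and 1 and
     * decrease, but the error inherited from I_0 grows like n!. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double I = 1.0 - exp(-1.0);   /* I_0, correct to round-off */

        for (int n = 1; n <= 25; n++) {
            I = 1.0 - n * I;
            printf("I_%-2d = % .6e\n", n, I);  /* goes wild near n = 20 */
        }
        return 0;
    }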

We run many complex programs that produce slightly different results.
The good ones keep the differences at the round-off level.
But the world is not always so good.

I hope this helps.
Gus Correa

