This is a problem of numerical stability, and there is no solution for
such a problem in MPI. Usually, preconditioning the input matrix
improves the numerical stability.
If you read the MPI standard, there is a __short__ section about what
guarantees the MPI collective communications provide. There is only
one: if you run the same collective twice, on the same set of nodes
with the same input data, you will get the same output. In fact the
main problem is that MPI considers all default operations (MPI_OP) as
being commutative and associative, which is usually the case in the
real world but not when floating-point rounding is involved. When you
increase the number of nodes, the data is spread in smaller pieces,
which means more operations have to be done in order to achieve the
reduction, i.e. more rounding errors might occur, and so on.
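
To make the point about associativity concrete, here is a small sketch
in plain Python (no MPI involved). The helper names sequential_sum and
chunked_sum are made up for this illustration, and the input values are
chosen on purpose to exaggerate the effect. Each "chunk" stands in for
the piece of data one process would reduce locally before the partial
results are combined.

```python
import math


def sequential_sum(values):
    """Add the values left to right, one rounding step per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, nchunks):
    """Hypothetical stand-in for a reduction over `nchunks` processes.

    Each chunk is summed on its own, then the partial sums are combined.
    Assumes len(values) is divisible by nchunks.
    """
    size = len(values) // nchunks
    partials = [
        sequential_sum(values[i * size:(i + 1) * size])
        for i in range(nchunks)
    ]
    return sequential_sum(partials)


# Values picked so that adding 1.0 to +/-1e16 is lost to rounding.
values = [1e16, 1.0, -1e16, 1.0]

one_chunk = chunked_sum(values, 1)
two_chunks = chunked_sum(values, 2)
exact = math.fsum(values)  # correctly rounded reference sum

print(one_chunk, two_chunks, exact)
```

The same four numbers give different answers depending only on how
they are grouped, and neither grouping matches the correctly rounded
sum. Real data is rarely this extreme, but the mechanism is the same.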
Thanks,
george.
On May 27, 2009, at 11:16 , vasilis wrote:
Rank 0 accumulates all the res_cpu values into a single array, res. It
starts with its own res_cpu and then adds all other processes. When
np=2, that means the order is prescribed. When np>2, the order is no
longer prescribed and some floating-point rounding variations can
start to occur.
Yes, you are right. Now, the question is why would these
floating-point rounding variations occur for np>2? It cannot be due to
an order that is not prescribed!!
If you want results to be more deterministic, you need to fix the order
in which res is aggregated. E.g., instead of using MPI_ANY_SOURCE, loop
over the peer processes in a specific order.
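
The effect of fixing the order can be sketched in plain Python (this is
a simulation, not MPI code). The function names and the (rank, value)
message lists below are hypothetical; the two lists stand for two runs
in which the same contributions reach rank 0 in a different order, as
can happen when receiving with MPI_ANY_SOURCE.

```python
def sum_in_arrival_order(messages):
    """Accumulate contributions in whatever order they arrived."""
    total = 0.0
    for _rank, value in messages:
        total += value
    return total


def sum_in_rank_order(messages):
    """Accumulate contributions in ascending rank order, ignoring arrival."""
    by_rank = dict(messages)
    total = 0.0
    for rank in sorted(by_rank):
        total += by_rank[rank]
    return total


# Same (rank, value) pairs, two different arrival orders.
run_a = [(0, 1e16), (1, 1.0), (2, -1e16), (3, 1.0)]
run_b = [(0, 1e16), (2, -1e16), (1, 1.0), (3, 1.0)]

arrival_a = sum_in_arrival_order(run_a)
arrival_b = sum_in_arrival_order(run_b)
fixed_a = sum_in_rank_order(run_a)
fixed_b = sum_in_rank_order(run_b)

print(arrival_a, arrival_b)  # differ between the two runs
print(fixed_a, fixed_b)      # identical in both runs
```

Summing in rank order makes the result reproducible from run to run.
It does not make it more accurate, only consistent.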
P.S. It seems to me that you could use MPI collective operations to
implement what you're doing. E.g., something like:
I could use these operations for the res variable (will it make the
summation any faster?). But I cannot use them for the other 3 variables.