vasilis wrote:

> On Wednesday 27 of May 2009 8:35:49 pm Eugene Loh wrote:
>
>> At the level of this particular e-mail thread, the issue seems to me to be different. Results are added together in some arbitrary order and there are variations on order of 10^-10. This is not an issue of numerical stability, but just of bitwise floating-point reproducibility.
>>
>> And, given that one could fix the order (by using explicit source processes instead of MPI_ANY_SOURCE), one could "fix" this particular problem in MPI.
>
> Eugene, I really do not understand why you insist on the order. Maybe there is something subtle about the order that I do not understand. Anyhow, I changed the code according to your suggestion.

The original issue, still reflected by the subject heading of this e-mail, was that a message overran its receive buffer. That was fixed by using tags to distinguish different kinds of messages (res, jacob, row, and col).

I thought the next problem was the small (10^-10) variations in results when np>2. In my mind, a plausible explanation for this is that you're adding the "res_cpu" contributions from all the various processes to the "res" array in some arbitrary order. The contribution from rank 0 is added in first, but all the others come in in some nondeterministic order. Since you're using finite-precision arithmetic, this can lead to tiny round-off variations. If you want to get rid of those minor variations, you have to perform floating-point arithmetic in a particular order.
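
To make the ordering point concrete, here is a minimal Python sketch. It is not the code from this thread: the `accumulate` helper and the three values standing in for per-rank "res_cpu" contributions are hypothetical, chosen only so the round-off is visible. It adds the same contributions once in rank order and once in a different "arrival" order, as could happen when receiving with MPI_ANY_SOURCE.

```python
def accumulate(res_cpu, order):
    """Add per-rank contributions into a running sum, visiting ranks in `order`."""
    res = 0.0
    for rank in order:
        res += res_cpu[rank]
    return res


# Hypothetical per-rank contributions (stand-ins for res_cpu from ranks 0, 1, 2).
res_cpu = {0: 0.1, 1: 0.2, 2: 0.3}

# Fixed order: always rank 0, then 1, then 2.
in_rank_order = accumulate(res_cpu, [0, 1, 2])

# One possible arrival order when the source is not fixed.
in_arrival_order = accumulate(res_cpu, [1, 2, 0])

print(in_rank_order == in_arrival_order)      # False
print(abs(in_rank_order - in_arrival_order))  # tiny, but not zero
```

The two sums differ only in the last bits, which is the kind of variation described above. Summing in the same rank order on every run gives the same bits every time, whatever order the messages arrive in.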