vasilis wrote:
Eugene Loh wrote:
> The accumulation of res_cpu into res starts with rank 0 and then handles
> everyone else in arbitrary order (due to MPI_ANY_SOURCE). With np=2, this
> means the order is fully deterministic (0 then 1). With np>2, the order is
> no longer deterministic. E.g., for np=3, you could have 0 then 1 then 2, or
> you could have 0 then 2 then 1.
>
> Rank 0 accumulates all the res_cpu values into a single array, res. It
> starts with its own res_cpu and then adds all other processes. When np=2,
> that means the order is prescribed. When np>2, the order is no longer
> prescribed and some floating-point rounding variations can start to occur.

Yes, you are right. Now, the question is: why would these floating-point
rounding variations occur for np>2? They cannot be due to an unprescribed
order! Here is another version of the code, without MPI_ANY_SOURCE or
MPI_ANY_TAG:

  if ( mumps_par%MYID .eq. 0 ) then
     do jw = 0, nsize-1
        if ( jw /= 0 ) then
           call MPI_Recv(jacob_cpu, total_elem_cpu*unique, MPI_DOUBLE_PRECISION, jw, 5, MPI_COMM_WORLD, status1, ierr)
           call MPI_Recv(res_cpu,   total_unknowns,        MPI_DOUBLE_PRECISION, jw, 6, MPI_COMM_WORLD, status2, ierr)
           call MPI_Recv(row_cpu,   total_elem_cpu*unique, MPI_INTEGER,          jw, 7, MPI_COMM_WORLD, status3, ierr)
           call MPI_Recv(col_cpu,   total_elem_cpu*unique, MPI_INTEGER,          jw, 8, MPI_COMM_WORLD, status4, ierr)
        end if
        res(:)             = res(:)             + res_cpu(:)
        jacob(:,jw)        = jacob(:,jw)        + jacob_cpu(:)
        position_col(:,jw) = position_col(:,jw) + col_cpu(:)
        position_row(:,jw) = position_row(:,jw) + row_cpu(:)
     end do
  else
     call MPI_Send(jacob_cpu, total_elem_cpu*unique, MPI_DOUBLE_PRECISION, 0, 5, MPI_COMM_WORLD, ierr)
     call MPI_Send(res_cpu,   total_unknowns,        MPI_DOUBLE_PRECISION, 0, 6, MPI_COMM_WORLD, ierr)
     call MPI_Send(row_cpu,   total_elem_cpu*unique, MPI_INTEGER,          0, 7, MPI_COMM_WORLD, ierr)
     call MPI_Send(col_cpu,   total_elem_cpu*unique, MPI_INTEGER,          0, 8, MPI_COMM_WORLD, ierr)
  end if

Eugene Loh also wrote:
> P.S. It seems to me that you could use MPI collective operations to
> implement what you're doing. E.g., something like: [...]

I could use these operations for the res variable (will it make the summation
any faster?). But I cannot use them for the other three variables.

Eugene Loh replied:

Potentially faster. It allows the underlying MPI implementation to introduce
optimizations (also potentially leading to the nondeterminism as you have
observed!). The other reason to use collective operations, however, is to
make your code more readable. You can use an MPI_Gather operation to gather
the data to rank 0 and then perform the summation on-node. You need to decide
(based on performance, readability, etc.) if you want to make that change.
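
For context on the rounding point: floating-point addition is not
associative, so summing the same per-rank contributions in a different order
can change the low-order bits of the result. A minimal Fortran illustration,
not from the thread, with values chosen only to make the effect visible:

  program fp_order
    implicit none
    double precision :: a, b, c
    a = 1.0d0
    b = 1.0d16
    c = -1.0d16
    ! The same three numbers summed in two different orders:
    ! (a + b) rounds the 1.0 away because b is so large, giving 0.0 overall,
    ! while (b + c) cancels exactly, so the 1.0 survives.
    print *, '(a + b) + c =', (a + b) + c   ! prints 0.0
    print *, 'a + (b + c) =', a + (b + c)   ! prints 1.0
  end program fp_order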
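
And a self-contained sketch of the collective-operations idea. This is not
code from the thread: the array sizes, the jacob_all buffer, and the dummy
values are made up for illustration. MPI_Reduce sums the per-rank res_cpu
contributions into res on rank 0 in a single call, and MPI_Gather places each
rank's jacob_cpu block side by side in a rank-0 buffer (row_cpu and col_cpu
could be gathered the same way with MPI_INTEGER). Note that the MPI standard
does not prescribe the reduction order inside MPI_Reduce either, so bitwise
results can still vary with the number of processes.

  program collective_sketch
    use mpi
    implicit none
    integer, parameter :: total_unknowns = 4, chunk = 3
    integer :: myid, nsize, ierr
    double precision :: res_cpu(total_unknowns), res(total_unknowns)
    double precision :: jacob_cpu(chunk)
    double precision, allocatable :: jacob_all(:)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nsize, ierr)

    res_cpu   = dble(myid + 1)         ! this rank's contribution to the residual
    jacob_cpu = dble(10*myid)          ! this rank's block of Jacobian entries
    allocate(jacob_all(chunk*nsize))   ! receive buffer, only used on rank 0

    ! Sum res_cpu from every rank into res on rank 0 (replaces the send/recv loop).
    call MPI_Reduce(res_cpu, res, total_unknowns, MPI_DOUBLE_PRECISION, &
                    MPI_SUM, 0, MPI_COMM_WORLD, ierr)

    ! Gather each rank's jacob_cpu into consecutive chunks of jacob_all on rank 0.
    call MPI_Gather(jacob_cpu, chunk, MPI_DOUBLE_PRECISION, &
                    jacob_all, chunk, MPI_DOUBLE_PRECISION, &
                    0, MPI_COMM_WORLD, ierr)

    if (myid == 0) then
       print *, 'res       =', res
       print *, 'jacob_all =', jacob_all
    end if
    call MPI_Finalize(ierr)
  end program collective_sketch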
- [OMPI users] "An error occurred in MPI_Recv" with... vasilis
- Re: [OMPI users] "An error occurred in MPI_Recv&q... Eugene Loh
- Re: [OMPI users] "An error occurred in MPI_Re... vasilis
- Re: [OMPI users] "An error occurred in MP... Eugene Loh
- Re: [OMPI users] "An error occurred i... vasilis
- Re: [OMPI users] "An error occur... Eugene Loh
- Re: [OMPI users] "An error occur... George Bosilca
- Re: [OMPI users] "An error o... Damien Hocking
- Re: [OMPI users] "An err... vasilis
- Re: [OMPI users] "An err... vasilis
- Re: [OMPI users] "An error o... Eugene Loh
- Re: [OMPI users] "An err... vasilis
- Re: [OMPI users] "An err... Eugene Loh
- Re: [OMPI users] "An err... vasilis
- Re: [OMPI users] "An err... Eugene Loh
- Re: [OMPI users] "An error o... vasilis