Re: [OMPI users] MPI_AllReduce vs MPI_IAllReduce

2015-11-27 Thread Ralph Castain
One thing you might want to keep in mind is that “non-blocking” doesn’t mean “asynchronous progress”. The API may not block, but the communications only progress whenever you actually call down into the library. So if you are calling a non-blocking collective, and then make additional calls into …
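Ralph's point in practice: unless Open MPI is running with asynchronous progress, an outstanding MPI_Iallreduce only advances while the application is inside an MPI call. A minimal sketch of the usual workaround, polling MPI_Test between chunks of work (the function local_computation_chunk, the chunk count, and the splitting of the work are illustrative assumptions, not code from the thread):

    #include <mpi.h>

    /* Hypothetical per-iteration work, split into small chunks so the
     * code re-enters the MPI library between chunks. */
    void local_computation_chunk(int i, int chunk);

    void compute_with_progress(double *buf, int count, int i)
    {
        MPI_Request req;
        int done = 0;

        /* Start the non-blocking reduction. */
        MPI_Iallreduce(MPI_IN_PLACE, buf, count, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, &req);

        for (int chunk = 0; chunk < 100; ++chunk) {
            local_computation_chunk(i, chunk);

            /* Each MPI_Test call gives the library a chance to progress
             * the outstanding collective; without such calls the data
             * may not move until the final wait. */
            if (!done)
                MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        }

        if (!done)
            MPI_Wait(&req, MPI_STATUS_IGNORE);
    }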

Re: [OMPI users] MPI_AllReduce vs MPI_IAllReduce

2015-11-27 Thread Felipe .
> Try to do a variable amount of work for every process; I see non-blocking as a way to speed up communication if the processes arrive at the call at different times. Please always have this at the back of your mind when doing this.

I tried to simplify the problem in the explanation. The "local_computation" is …

Re: [OMPI users] MPI_AllReduce vs MPI_IAllReduce

2015-11-27 Thread Nick Papior
Try to do a variable amount of work for every process; I see non-blocking as a way to speed up communication if the processes arrive at the call at different times. Please always have this at the back of your mind when doing this. Of course non-blocking has overhead, and if the communication time is low, so will the …
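To see any benefit from the non-blocking form, the ranks have to reach the collective at different times. A hedged sketch of how one might introduce such an imbalance purely for testing (the rank-dependent usleep below is an illustrative stand-in for real, uneven computation, not something from the thread):

    #include <mpi.h>
    #include <unistd.h>

    /* Give each rank a different amount of "work" so that ranks reach the
     * reduction at different times; with perfectly balanced work, blocking
     * and non-blocking allreduce tend to perform almost identically. */
    void imbalanced_work(void)
    {
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Higher ranks "compute" longer: 1 ms per rank, purely illustrative. */
        usleep((useconds_t)(1000 * (rank + 1)));
    }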

[OMPI users] MPI_AllReduce vs MPI_IAllReduce

2015-11-27 Thread Felipe .
Hello! I have a program that basically is (first implementation):

    for i in N:
        local_computation(i)
        mpi_allreduce(in_place, i)

In order to try to mitigate the implicit barrier of the mpi_allreduce, I tried to start an mpi_Iallreduce, like this (second implementation):

    for i in N:
        local_comput…
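Spelled out in C, the two variants read roughly as follows; the buffer handling, the double-buffering in the non-blocking version, and the function names are my reading of the (truncated) pseudocode above, not code from the original post:

    #include <mpi.h>

    void local_computation(double *x, int n, int i);  /* hypothetical per-iteration work */

    /* First implementation: blocking allreduce inside the loop. */
    void version_blocking(double *x, int n, int N)
    {
        for (int i = 0; i < N; ++i) {
            local_computation(x, n, i);
            MPI_Allreduce(MPI_IN_PLACE, x, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        }
    }

    /* Second implementation: start a non-blocking allreduce and overlap its
     * completion with the next iteration's local computation, alternating
     * between two buffers so the outstanding reduction's buffer is untouched. */
    void version_nonblocking(double *x[2], int n, int N)
    {
        MPI_Request req = MPI_REQUEST_NULL;

        for (int i = 0; i < N; ++i) {
            double *cur = x[i % 2];

            local_computation(cur, n, i);

            /* Complete the reduction started in the previous iteration
             * before starting a new one. */
            if (req != MPI_REQUEST_NULL)
                MPI_Wait(&req, MPI_STATUS_IGNORE);

            MPI_Iallreduce(MPI_IN_PLACE, cur, n, MPI_DOUBLE, MPI_SUM,
                           MPI_COMM_WORLD, &req);
        }

        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }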

Re: [OMPI users] oob-tcp error (warning?) message

2015-11-27 Thread Ralph Castain
I know there was an issue with Torque and cpusets at one time, but I believe that has been fixed (likely later in the 1.10 series). Regardless, the error message you are seeing indicates a failure to open a socket between daemons on different nodes. Could be hitting a file descriptor limit, or it …
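One quick check for the file-descriptor theory is to see what limit the processes actually inherit on the compute nodes; a small standalone query of RLIMIT_NOFILE (my own convenience snippet, not from the thread):

    #include <stdio.h>
    #include <sys/resource.h>

    /* Print the soft and hard limits on open file descriptors, which bound
     * how many TCP sockets a process can hold open to its peers. */
    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }

        printf("open files: soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
        return 0;
    }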

Re: [OMPI users] oob-tcp error (warning?) message

2015-11-27 Thread Grigory Shamov
Hi Ralph, Thanks for the reply! I have tried, but couldn't get 1.8.8 or 1.10 (tried 1.10.0 back then) to work with our pretty old Torque 2.5.13 with cpusets. Under some circumstances (depending on the process/node layout given by Torque), it fails to bind to cores, with messages like: Error message: hwloc_s…