Short update: I just installed version 1.4.4 from source (compiled with --enable-mpi-threads), and the problem persists.
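In the meantime, here is a first cut at the small reproducer I promised below. It is a stripped-down sketch of the pattern, not my actual simulation code: the tag, message size, and reduction operation are placeholders, and it assumes exactly two ranks.

/* Sketch of the pattern described below -- placeholders, not real code.
 * Build:  mpicc -pthread repro.c -o repro
 * Run:    mpirun -np 2 ./repro
 */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define N   1024  /* placeholder message size */
#define TAG 42    /* placeholder tag */

static int rank, peer;

/* Thread (i): nonblocking point-to-point exchange with the peer rank. */
static void *comm_thread(void *arg) {
    double sendbuf[N], recvbuf[N];
    MPI_Request reqs[2];
    int done, k;
    (void)arg;
    for (k = 0; k < N; k++) sendbuf[k] = rank;
    MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, TAG, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, TAG, MPI_COMM_WORLD, &reqs[1]);
    for (k = 0; k < 2; k++)  /* drain the requests one at a time */
        MPI_Waitany(2, reqs, &done, MPI_STATUS_IGNORE);
    return NULL;
}

/* Thread (ii): some local work, then a collective across all ranks. */
static void *work_thread(void *arg) {
    double local = rank + 1.0, global = 0.0;
    (void)arg;
    /* ... other computations would go here ... */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: allreduce result %g\n", rank, global);
    return NULL;
}

int main(int argc, char *argv[]) {
    int provided;
    pthread_t t1, t2;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "warning: MPI_THREAD_MULTIPLE not provided\n");
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;  /* assumes exactly two ranks */
    pthread_create(&t1, NULL, comm_thread, NULL);
    pthread_create(&t2, NULL, work_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    MPI_Finalize();
    return 0;
}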
I should also point out that if, in thread (ii), I wait for the
nonblocking communication in thread (i) to finish, nothing bad happens.
But this makes the nonblocking communication somewhat pointless. (A
sketch of this serialized variant follows below the quoted message.)

Cheers,
Pedro

On Thu, 2011-10-20 at 10:42 +0100, Pedro Gonnet wrote:
> Hi all,
>
> I am currently working on a multi-threaded hybrid parallel simulation
> which uses both pthreads and Open MPI. The simulation uses several
> pthreads per MPI node.
>
> My code uses the nonblocking routines MPI_Isend/MPI_Irecv/MPI_Waitany
> quite successfully to implement the node-to-node communication. When I
> try to interleave other computations with this communication, however,
> bad things happen.
>
> I have two MPI nodes with two threads each: one thread (i) doing the
> nonblocking communication and the other (ii) doing other computations.
> At some point, the threads (ii) need to exchange data using
> MPI_Allreduce, which fails if the first thread (i) has not yet
> completed all of its communication, i.e. if thread (i) is still in
> MPI_Waitany.
>
> Using the in-place MPI_Allreduce, I get a re-run of this bug:
> http://www.open-mpi.org/community/lists/users/2011/09/17432.php. If I
> don't use in-place, the call to MPI_Waitany (thread i) on one of the
> MPI nodes waits forever.
>
> My guess is that when thread (ii) calls MPI_Allreduce, it receives
> whatever the other node sent with MPI_Isend to thread (i), drops
> whatever it should have been getting from the other node's
> MPI_Allreduce, and the call to MPI_Waitany therefore hangs.
>
> Is this a known issue? Is MPI_Allreduce not designed to work alongside
> the nonblocking point-to-point routines? Is there a "safe" variant of
> MPI_Allreduce I should be using instead?
>
> I am using Open MPI version 1.4.3 (version 1.4.3-1ubuntu3 of the
> package openmpi-bin in Ubuntu). Both MPI nodes run on the same
> dual-core computer (a Lenovo X201 laptop).
>
> If you need more information, please do let me know! I'll also try to
> cook up a small program reproducing this problem...
>
> Cheers and kind regards,
> Pedro
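For reference, the serialized variant I mentioned above (the one where
nothing bad happens) just makes thread (ii) block until thread (i) has
drained all of its requests before entering the collective. Roughly like
this, reusing the names from the sketch above; the helper functions are
made up for illustration:

#include <pthread.h>

static pthread_mutex_t comm_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  comm_done_cond = PTHREAD_COND_INITIALIZER;
static int comm_done = 0;

/* Called at the end of comm_thread(), after the MPI_Waitany loop. */
void signal_comm_done(void) {
    pthread_mutex_lock(&comm_mutex);
    comm_done = 1;
    pthread_cond_signal(&comm_done_cond);
    pthread_mutex_unlock(&comm_mutex);
}

/* Called in work_thread(), immediately before MPI_Allreduce. */
void wait_for_comm(void) {
    pthread_mutex_lock(&comm_mutex);
    while (!comm_done)
        pthread_cond_wait(&comm_done_cond, &comm_mutex);
    pthread_mutex_unlock(&comm_mutex);
}

With this in place the Allreduce never overlaps the point-to-point
traffic, which is exactly what I was hoping to avoid.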