I see the problem - it’s a race condition, actually. I’ll try to provide a patch for you to test, if you don’t mind.
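
Separately from the race condition, the firewall hint in the error message can be checked without MPI. Below is a minimal sketch in plain C, not part of Open MPI: it assumes the port string contains `tcp://host:port` segments with an IPv4 address, as the strings quoted in this thread do. That layout is only inferred from those examples, and the helper names `first_tcp_endpoint` and `tcp_probe` are hypothetical. The probe opens a TCP connection to the endpoint and closes it again without sending anything.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the host and TCP port of the first
 * "tcp://host:port" segment of a port string into host/tcp_port.
 * Returns 0 on success, -1 if no usable segment is found.
 * Assumption: IPv4 or hostname only (an IPv6 literal would contain ':'). */
int first_tcp_endpoint(const char *port_name, char *host, size_t host_len,
                       int *tcp_port)
{
    assert(port_name != NULL && host != NULL && tcp_port != NULL);

    const char *start = strstr(port_name, "tcp://");
    if (start == NULL)
        return -1;
    start += strlen("tcp://");

    const char *colon = strchr(start, ':');
    if (colon == NULL)
        return -1;

    size_t n = (size_t)(colon - start);
    if (n == 0 || n >= host_len)
        return -1;
    memcpy(host, start, n);
    host[n] = '\0';

    /* strtol stops at the first non-digit, e.g. the '+' or ':' that follows. */
    char *end = NULL;
    long value = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || value <= 0 || value > 65535)
        return -1;

    *tcp_port = (int)value;
    return 0;
}

/* Hypothetical helper: return 0 if a TCP connection to host:tcp_port
 * can be established, -1 otherwise. The socket is closed immediately. */
int tcp_probe(const char *host, int tcp_port)
{
    char service[16];
    snprintf(service, sizeof service, "%d", tcp_port);

    struct addrinfo hints;
    struct addrinfo *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    int rc = -1;
    for (struct addrinfo *p = res; p != NULL && rc != 0; p = p->ai_next) {
        int fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            rc = 0;
        close(fd);
    }
    freeaddrinfo(res);
    return rc;
}
```

Called from a small `main()` on linux15 while the server is still waiting in MPI_Comm_accept(), `first_tcp_endpoint(argv[1], host, sizeof host, &port)` followed by `tcp_probe(host, port)` would show whether the address printed by the server is reachable at the TCP level. A successful probe only rules out a blocked port; it says nothing about the MPI-level handshake.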

> On Jul 13, 2015, at 3:03 PM, Audet, Martin <martin.au...@cnrc-nrc.gc.ca> wrote:
>
> Thanks Ralph for this quick response.
>
> In the two attachments you will find the output I got when running the
> following commands:
>
> [audet@fn1 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 ./simpleserver 2>&1 | tee server_out.txt
>
> [audet@linux15 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 ./simpleclient '2444427264.0;tcp://172.17.15.20:56377+2444427265.0;tcp://172.17.15.20:34776:300' 2>&1 | tee client_out.txt
>
> Martin
> ________________________________________
> From: users [users-boun...@open-mpi.org] On Behalf Of Ralph Castain [r...@open-mpi.org]
> Sent: Monday, July 13, 2015 5:29 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines
>
> Try running it with “--mca oob_base_verbose 100” on both client and server -
> it will tell us why the connection was refused.
>
>
>> On Jul 13, 2015, at 2:14 PM, Audet, Martin <martin.au...@cnrc-nrc.gc.ca> wrote:
>>
>> Hi OMPI_Developers,
>>
>> It seems that I am unable to establish an MPI communication between two
>> independently started MPI programs using the simplest client/server call
>> sequence I can imagine (see the two attached files) when the client and
>> server processes are started on different machines. Note that I have no
>> problems when the client and server programs run on the same machine.
>>
>> For example if I do the following on the server machine (running on fn1):
>>
>> [audet@fn1 mpi]$ mpicc -Wall simpleserver.c -o simpleserver
>> [audet@fn1 mpi]$ mpiexec -n 1 ./simpleserver
>> Server port = '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
>>
>> The server prints its port (created with MPI_Open_port()) and waits for a
>> connection by calling MPI_Comm_accept().
>>
>> Now on the client machine (running on linux15) if I compile the client and
>> run it with the above port address on the command line, I get:
>>
>> [audet@linux15 mpi]$ mpicc -Wall simpleclient.c -o simpleclient
>> [audet@linux15 mpi]$ mpiexec -n 1 ./simpleclient '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
>> trying to connect...
>> ------------------------------------------------------------
>> A process or daemon was unable to complete a TCP connection
>> to another process:
>> Local host: linux15
>> Remote host: linux15
>> This is usually caused by a firewall on the remote host. Please
>> check that any firewall (e.g., iptables) has been disabled and
>> try again.
>> ------------------------------------------------------------
>> [linux15:24193] [[13075,0],0]-[[46606,0],0] mca_oob_tcp_peer_send_handler: invalid connection state (6) on socket 16
>>
>> And then I have to stop the client program by pressing ^C (and also the
>> server, which doesn't seem affected).
>>
>> What's wrong?
>>
>> And I am almost sure there is no firewall running on linux15.
>>
>> It is not the first MPI client/server application I am developing (with both
>> OpenMPI and mpich).
>> These simple MPI client/server programs work well with mpich (version 3.1.3).
>>
>> This problem happens with both OpenMPI 1.8.3 and 1.8.6.
>>
>> linux15 and fn1 both run Fedora Core 12 Linux (64 bits) and are connected
>> by a Gigabit Ethernet (the normal network).
>>
>> And again if client and server run on the same machine (either fn1 or
>> linux15) no such problem happens.
>>
>> Thanks in advance,
>>
>> Martin Audet
>> <simpleserver.c><simpleclient.c>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2015/07/27271.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/07/27272.php
> <server_out.txt><client_out.txt>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2015/07/27273.php