I see the problem - it’s a race condition, actually. I’ll try to provide a 
patch for you to test, if you don’t mind.


> On Jul 13, 2015, at 3:03 PM, Audet, Martin <martin.au...@cnrc-nrc.gc.ca> wrote:
> 
> Thanks Ralph for this quick response.
> 
> In the two attachments you will find the output I got when running the 
> following commands:
> 
> [audet@fn1 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 ./simpleserver 2>&1 | tee server_out.txt
> 
> [audet@linux15 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 ./simpleclient '2444427264.0;tcp://172.17.15.20:56377+2444427265.0;tcp://172.17.15.20:34776:300' 2>&1 | tee client_out.txt
> 
> Martin
> ________________________________________
> From: users [users-boun...@open-mpi.org] On Behalf Of Ralph Castain [r...@open-mpi.org]
> Sent: Monday, July 13, 2015 5:29 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines
> 
> Try running it with "--mca oob_base_verbose 100" on both client and server - 
> it will tell us why the connection was refused.
> 
> 
>> On Jul 13, 2015, at 2:14 PM, Audet, Martin <martin.au...@cnrc-nrc.gc.ca> wrote:
>> 
>> Hi OMPI_Developers,
>> 
>> It seems that I am unable to establish an MPI communication between two 
>> independently started MPI programs using the simplest client/server call 
>> sequence I can imagine (see the two attached files) when the client and 
>> server processes are started on different machines. Note that I have no 
>> problems when the client and server programs run on the same machine.
>> 
>> For example, if I do the following on the server machine (running on fn1):
>> 
>> [audet@fn1 mpi]$ mpicc -Wall simpleserver.c -o simpleserver
>> [audet@fn1 mpi]$ mpiexec -n 1 ./simpleserver
>> Server port = '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
>> 
>> The server prints its port (created with MPI_Open_port()) and waits for a 
>> connection by calling MPI_Comm_accept().
>> 
>> Now on the client machine (running on linux15) if I compile the client and 
>> run it with the above port address on the command line, I get:
>> 
>> [audet@linux15 mpi]$ mpicc -Wall simpleclient.c -o simpleclient
>> [audet@linux15 mpi]$ mpiexec -n 1 ./simpleclient '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
>> trying to connect...
>> ------------------------------------------------------------
>> A process or daemon was unable to complete a TCP connection
>> to another process:
>> Local host:    linux15
>> Remote host:   linux15
>> This is usually caused by a firewall on the remote host. Please
>> check that any firewall (e.g., iptables) has been disabled and
>> try again.
>> ------------------------------------------------------------
>> [linux15:24193] [[13075,0],0]-[[46606,0],0] mca_oob_tcp_peer_send_handler: invalid connection state (6) on socket 16
>> 
>> I then have to stop the client program by pressing ^C (and also the 
>> server, which doesn't seem to be affected).
>> 
>> What's wrong?
>> 
>> And I am almost sure there is no firewall running on linux15.
>> 
>> This is not the first MPI client/server application I have developed (with 
>> both OpenMPI and mpich).
>> These simple MPI client/server programs work well with mpich (version 3.1.3).
>> 
>> This problem happens with both OpenMPI 1.8.3 and 1.8.6.
>> 
>> linux15 and fn1 both run Fedora Core 12 Linux (64-bit) and are connected 
>> by Gigabit Ethernet (the normal network).
>> 
>> And again, if the client and server run on the same machine (either fn1 or 
>> linux15), no such problem happens.
>> 
>> Thanks in advance,
>> 
>> Martin Audet
>> <simpleserver.c><simpleclient.c>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2015/07/27271.php
> 
> <server_out.txt><client_out.txt>
