Thanks Ralph for this quick response. In the two attachements you will find the output I got when running the following commands:
[audet@fn1 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 ./simpleserver 2>&1 | tee server_out.txt [audet@linux15 mpi]$ mpiexec --mca oob_base_verbose 100 -n 1 ./simpleclient '2444427264.0;tcp://172.17.15.20:56377+2444427265.0;tcp://172.17.15.20:34776:300' 2>&1 | tee client_out.txt Martin ________________________________________ From: users [users-boun...@open-mpi.org] On Behalf Of Ralph Castain [r...@open-mpi.org] Sent: Monday, July 13, 2015 5:29 PM To: Open MPI Users Subject: Re: [OMPI users] MPI_Comm_accept() / MPI_Comm_connect() fail between two different machines Try running it with “—mca oob_base_verbose 100” on both client and server - it will tell us why the connection was refused. > On Jul 13, 2015, at 2:14 PM, Audet, Martin <martin.au...@cnrc-nrc.gc.ca> > wrote: > > Hi OMPI_Developers, > > It seems that I am unable to establish an MPI communication between two > independently started MPI programs using the simplest client/server call > sequence I can imagine (see the two attached files) when the client and > server process are started on different machines. Note that I have no > problems when the client and server program run on the same machine. > > For example if I do the following on the server machine (running on fn1): > > [audet@fn1 mpi]$ mpicc -Wall simpleserver.c -o simpleserver > [audet@fn1 mpi]$ mpiexec -n 1 ./simpleserver > Server port = > '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300' > > The server prints its port (created with MPI_Open_port()) and wait for a > connection by calling MPI_Comm_accept(). > > Now on the client machine (running on linux15) if I compile the client and > run it with the above port address on the command line, I get: > > [audet@linux15 mpi]$ mpicc -Wall simpleclient.c -o simpleclient > [audet@linux15 mpi]$ mpiexec -n 1 ./simpleclient > '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300' > trying to connect... > ------------------------------------------------------------ > A process or daemon was unable to complete a TCP connection > to another process: > Local host: linux15 > Remote host: linux15 > This is usually caused by a firewall on the remote host. Please > check that any firewall (e.g., iptables) has been disabled and > try again. > ------------------------------------------------------------ > [linux15:24193] [[13075,0],0]-[[46606,0],0] mca_oob_tcp_peer_send_handler: > invalid connection state (6) on socket 16 > > And then I have to stop the client program by pressing ^C (and also the > server which doesn't seems affected). > > What's wrong ? > > And I am almost sure there is no firewall running on linux15. > > It is not the first MPI client/server application I am developing (with both > OpenMPI and mpich). > These simple MPI client/server programs work well with mpich (version 3.1.3). > > This problem happens with both OpenMPI 1.8.3 and 1.8.6 > > linux15 and fn1 run both on Fedora Core 12 Linux (64 bits) and are connected > by a Gigabit Ethernet (the normal network). > > And again if client and server run on the same machine (either fn1 or > linux15) no such problems happens. > > Thanks in advance, > > Martin > Audet<simpleserver.c><simpleclient.c>_______________________________________________ > users mailing list > us...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users > Link to this post: > http://www.open-mpi.org/community/lists/users/2015/07/27271.php _______________________________________________ users mailing list us...@open-mpi.org Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users Link to this post: http://www.open-mpi.org/community/lists/users/2015/07/27272.php
[fn1:07315] mca: base: components_register: registering oob components [fn1:07315] mca: base: components_register: found loaded component tcp [fn1:07315] mca: base: components_register: component tcp register function successful [fn1:07315] mca: base: components_open: opening oob components [fn1:07315] mca: base: components_open: found loaded component tcp [fn1:07315] mca: base: components_open: component tcp open function successful [fn1:07315] mca:oob:select: checking available component tcp [fn1:07315] mca:oob:select: Querying component [tcp] [fn1:07315] oob:tcp: component_available called [fn1:07315] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 [fn1:07315] [[37299,0],0] oob:tcp:init rejecting loopback interface lo [fn1:07315] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 [fn1:07315] [[37299,0],0] oob:tcp:init adding 172.17.15.20 to our list of V4 connections [fn1:07315] [[37299,0],0] TCP STARTUP [fn1:07315] [[37299,0],0] attempting to bind to IPv4 port 0 [fn1:07315] [[37299,0],0] assigned IPv4 port 56377 [fn1:07315] mca:oob:select: Adding component to end [fn1:07315] mca:oob:select: Found 1 active transports [fn1:07315] [[37299,0],0]: set_addr to uri 2444427264.0;tcp://172.17.15.20:56377 [fn1:07315] [[37299,0],0]:set_addr peer [[37299,0],0] is me [fn1:07319] mca: base: components_register: registering oob components [fn1:07319] mca: base: components_register: found loaded component tcp [fn1:07319] mca: base: components_register: component tcp register function successful [fn1:07319] mca: base: components_open: opening oob components [fn1:07319] mca: base: components_open: found loaded component tcp [fn1:07319] mca: base: components_open: component tcp open function successful [fn1:07319] mca:oob:select: checking available component tcp [fn1:07319] mca:oob:select: Querying component [tcp] [fn1:07319] oob:tcp: component_available called [fn1:07319] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 [fn1:07319] [[37299,1],0] oob:tcp:init rejecting loopback interface lo [fn1:07319] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 [fn1:07319] [[37299,1],0] oob:tcp:init adding 172.17.15.20 to our list of V4 connections [fn1:07319] [[37299,1],0] TCP STARTUP [fn1:07319] [[37299,1],0] attempting to bind to IPv4 port 0 [fn1:07319] [[37299,1],0] assigned IPv4 port 34776 [fn1:07319] mca:oob:select: Adding component to end [fn1:07319] mca:oob:select: Found 1 active transports [fn1:07319] [[37299,1],0]: set_addr to uri 2444427264.0;tcp://172.17.15.20:56377 [fn1:07319] [[37299,1],0]:set_addr checking if peer [[37299,0],0] is reachable via component tcp [fn1:07319] [[37299,1],0] oob:tcp: working peer [[37299,0],0] address tcp://172.17.15.20:56377 [fn1:07319] [[37299,1],0] PASSING ADDR 172.17.15.20 TO MODULE [fn1:07319] [[37299,1],0]:tcp set addr for peer [[37299,0],0] [fn1:07319] [[37299,1],0]: peer [[37299,0],0] is reachable via component tcp [fn1:07319] [[37299,1],0] OOB_SEND: rml_oob_send.c:199 [fn1:07319] [[37299,1],0]:tcp:processing set_peer cmd [fn1:07319] [[37299,1],0] SET_PEER ADDING PEER [[37299,0],0] [fn1:07319] [[37299,1],0] set_peer: peer [[37299,0],0] is listening on net 172.17.15.20 port 56377 [fn1:07319] [[37299,1],0] oob:base:send to target [[37299,0],0] [fn1:07319] [[37299,1],0] oob:tcp:send_nb to peer [[37299,0],0]:1 [fn1:07319] [[37299,1],0] tcp:send_nb to peer [[37299,0],0] [fn1:07319] [[37299,1],0]:[oob_tcp.c:478] post send to [[37299,0],0] [fn1:07319] [[37299,1],0]:[oob_tcp.c:415] processing send to peer [[37299,0],0]:1 [fn1:07319] [[37299,1],0]:[oob_tcp.c:449] queue pending to [[37299,0],0] [fn1:07319] [[37299,1],0] tcp:send_nb: initiating connection to [[37299,0],0] [fn1:07319] [[37299,1],0]:[oob_tcp.c:463] connect to [[37299,0],0] [fn1:07319] [[37299,1],0] orte_tcp_peer_try_connect: attempting to connect to proc [[37299,0],0] [fn1:07319] [[37299,1],0] orte_tcp_peer_try_connect: attempting to connect to proc [[37299,0],0] on socket 11 [fn1:07319] [[37299,1],0] orte_tcp_peer_try_connect: attempting to connect to proc [[37299,0],0] on 172.17.15.20:56377 - 0 retries [fn1:07319] [[37299,1],0] waiting for connect completion to [[37299,0],0] - activating send event [fn1:07319] [[37299,1],0] tcp:send_handler called to send to peer [[37299,0],0] [fn1:07315] [[37299,0],0] mca_oob_tcp_listen_thread: new connection: (15, 0) 172.17.15.20:37627 [fn1:07315] [[37299,0],0] connection_handler: working connection (15, 0) 172.17.15.20:37627 [fn1:07315] [[37299,0],0] accept_connection: 172.17.15.20:37627 [fn1:07319] [[37299,1],0] tcp:send_handler CONNECTING [fn1:07315] [[37299,0],0]:tcp:recv:handler called [fn1:07315] [[37299,0],0] RECV CONNECT ACK FROM UNKNOWN ON SOCKET 15 [fn1:07315] [[37299,0],0] waiting for connect ack from UNKNOWN [fn1:07315] [[37299,0],0] connect ack received from UNKNOWN [fn1:07315] [[37299,0],0] connect-ack recvd from UNKNOWN [fn1:07315] [[37299,0],0] mca_oob_tcp_recv_connect: connection from new peer [fn1:07315] [[37299,0],0] connect-ack header from [[37299,1],0] is okay [fn1:07315] [[37299,0],0] waiting for connect ack from [[37299,1],0] [fn1:07315] [[37299,0],0] connect ack received from [[37299,1],0] [fn1:07315] [[37299,0],0] connect-ack version from [[37299,1],0] matches ours [fn1:07315] [[37299,0],0] connect-ack [[37299,1],0] authenticated [fn1:07315] [[37299,0],0] tcp:peer_accept called for peer [[37299,1],0] in state UNKNOWN on socket 15 [fn1:07315] [[37299,0],0] SEND CONNECT ACK [fn1:07315] [[37299,0],0] send blocking of 42 bytes to socket 15 [fn1:07315] [[37299,0],0] connect-ack sent to socket 15 [fn1:07315] [[37299,0],0]-[[37299,1],0] tcp_peer_connected on socket 15 [fn1:07315] [[37299,0],0]-[[37299,1],0] accepted: 172.17.15.20 - 172.17.15.20 nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802 [fn1:07319] [[37299,1],0]:tcp:complete_connect called for peer [[37299,0],0] on socket 11 [fn1:07319] [[37299,1],0] tcp_peer_complete_connect: sending ack to [[37299,0],0] [fn1:07319] [[37299,1],0] SEND CONNECT ACK [fn1:07319] [[37299,1],0] send blocking of 42 bytes to socket 11 [fn1:07319] [[37299,1],0] connect-ack sent to socket 11 [fn1:07315] [[37299,0],0] tcp:set_module called for peer [[37299,1],0] [fn1:07319] [[37299,1],0] tcp_peer_complete_connect: setting read event on connection to [[37299,0],0] [fn1:07319] [[37299,1],0]:tcp:recv:handler called for peer [[37299,0],0] [fn1:07319] [[37299,1],0] RECV CONNECT ACK FROM [[37299,0],0] ON SOCKET 11 [fn1:07319] [[37299,1],0] waiting for connect ack from [[37299,0],0] [fn1:07319] [[37299,1],0] connect ack received from [[37299,0],0] [fn1:07319] [[37299,1],0] connect-ack recvd from [[37299,0],0] [fn1:07319] [[37299,1],0] connect-ack header from [[37299,0],0] is okay [fn1:07319] [[37299,1],0] waiting for connect ack from [[37299,0],0] [fn1:07319] [[37299,1],0] connect ack received from [[37299,0],0] [fn1:07319] [[37299,1],0] connect-ack version from [[37299,0],0] matches ours [fn1:07319] [[37299,1],0] connect-ack [[37299,0],0] authenticated [fn1:07319] [[37299,1],0]-[[37299,0],0] tcp_peer_connected on socket 11 [fn1:07319] [[37299,1],0]-[[37299,0],0] connected: 172.17.15.20 - 172.17.15.20 nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802 [fn1:07319] [[37299,1],0]:tcp:recv:handler starting send/recv events [fn1:07319] [[37299,1],0] tcp:set_module called for peer [[37299,0],0] [fn1:07319] [[37299,1],0] tcp:send_handler called to send to peer [[37299,0],0] [fn1:07319] [[37299,1],0] tcp:send_handler SENDING TO [[37299,0],0] [fn1:07315] [[37299,0],0]:tcp:recv:handler called for peer [[37299,1],0] [fn1:07315] [[37299,0],0]:tcp:recv:handler CONNECTED [fn1:07315] [[37299,0],0]:tcp:recv:handler allocate new recv msg [fn1:07315] [[37299,0],0]:tcp:recv:handler read hdr [fn1:07315] [[37299,0],0]:tcp:recv:handler allocate data region of size 56 [fn1:07315] [[37299,0],0] RECVD COMPLETE MESSAGE FROM [[37299,1],0] (ORIGIN [[37299,1],0]) OF 56 BYTES FOR DEST [[37299,0],0] TAG 1 [fn1:07315] [[37299,0],0] DELIVERING TO RML [fn1:07315] [[37299,0],0] OOB_SEND: rml_oob_send.c:199 [fn1:07315] [[37299,0],0] oob:base:send to target [[37299,1],0] [fn1:07315] [[37299,0],0] oob:base:send known transport for peer [[37299,1],0] [fn1:07315] [[37299,0],0] oob:tcp:send_nb to peer [[37299,1],0]:20 [fn1:07315] [[37299,0],0] tcp:send_nb to peer [[37299,1],0] [fn1:07315] [[37299,0],0]:[oob_tcp.c:478] post send to [[37299,1],0] [fn1:07315] [[37299,0],0]:[oob_tcp.c:415] processing send to peer [[37299,1],0]:20 [fn1:07315] [[37299,0],0] tcp:send_nb: already connected to [[37299,1],0] - queueing for send [fn1:07315] [[37299,0],0]:[oob_tcp.c:442] queue send to [[37299,1],0] [fn1:07315] [[37299,0],0] tcp:send_handler called to send to peer [[37299,1],0] [fn1:07315] [[37299,0],0] tcp:send_handler SENDING TO [[37299,1],0] [fn1:07315] [[37299,0],0] MESSAGE SEND COMPLETE TO [[37299,1],0] OF 16584 BYTES ON SOCKET 15 [fn1:07319] [[37299,1],0] MESSAGE SEND COMPLETE TO [[37299,0],0] OF 56 BYTES ON SOCKET 11 [fn1:07319] [[37299,1],0]:tcp:recv:handler called for peer [[37299,0],0] [fn1:07319] [[37299,1],0]:tcp:recv:handler CONNECTED [fn1:07319] [[37299,1],0]:tcp:recv:handler allocate new recv msg [fn1:07319] [[37299,1],0]:tcp:recv:handler read hdr [fn1:07319] [[37299,1],0]:tcp:recv:handler allocate data region of size 16584 [fn1:07319] [[37299,1],0] RECVD COMPLETE MESSAGE FROM [[37299,0],0] (ORIGIN [[37299,0],0]) OF 16584 BYTES FOR DEST [[37299,1],0] TAG 20 [fn1:07319] [[37299,1],0] DELIVERING TO RML [fn1:07319] [[37299,1],0] OOB_SEND: rml_oob_send.c:199 [fn1:07319] [[37299,1],0] oob:base:send to target [[37299,0],0] [fn1:07319] [[37299,1],0] oob:base:send known transport for peer [[37299,0],0] [fn1:07319] [[37299,1],0] oob:tcp:send_nb to peer [[37299,0],0]:30 [fn1:07319] [[37299,1],0] tcp:send_nb to peer [[37299,0],0] [fn1:07319] [[37299,1],0]:[oob_tcp.c:478] post send to [[37299,0],0] [fn1:07319] [[37299,1],0]:[oob_tcp.c:415] processing send to peer [[37299,0],0]:30 [fn1:07319] [[37299,1],0] tcp:send_nb: already connected to [[37299,0],0] - queueing for send [fn1:07319] [[37299,1],0]:[oob_tcp.c:442] queue send to [[37299,0],0] [fn1:07319] [[37299,1],0] tcp:send_handler called to send to peer [[37299,0],0] [fn1:07319] [[37299,1],0] tcp:send_handler SENDING TO [[37299,0],0] [fn1:07315] [[37299,0],0]:tcp:recv:handler called for peer [[37299,1],0] [fn1:07315] [[37299,0],0]:tcp:recv:handler CONNECTED [fn1:07315] [[37299,0],0]:tcp:recv:handler allocate new recv msg [fn1:07315] [[37299,0],0]:tcp:recv:handler read hdr [fn1:07315] [[37299,0],0]:tcp:recv:handler allocate data region of size 110 [fn1:07315] [[37299,0],0]:tcp:recv:handler called for peer [[37299,1],0] [fn1:07315] [[37299,0],0]:tcp:recv:handler CONNECTED [fn1:07315] [[37299,0],0] RECVD COMPLETE MESSAGE FROM [[37299,1],0] (ORIGIN [[37299,1],0]) OF 110 BYTES FOR DEST [[37299,0],0] TAG 30 [fn1:07315] [[37299,0],0] DELIVERING TO RML [fn1:07315] [[37299,0],0] OOB_SEND: rml_oob_send.c:199 [fn1:07315] [[37299,0],0] oob:base:send to target [[37299,1],0] [fn1:07315] [[37299,0],0] oob:base:send known transport for peer [[37299,1],0] [fn1:07315] [[37299,0],0] oob:tcp:send_nb to peer [[37299,1],0]:30 [fn1:07315] [[37299,0],0] tcp:send_nb to peer [[37299,1],0] [fn1:07315] [[37299,0],0]:[oob_tcp.c:478] post send to [[37299,1],0] [fn1:07315] [[37299,0],0]:[oob_tcp.c:415] processing send to peer [[37299,1],0]:30 [fn1:07315] [[37299,0],0] tcp:send_nb: already connected to [[37299,1],0] - queueing for send [fn1:07315] [[37299,0],0]:[oob_tcp.c:442] queue send to [[37299,1],0] [fn1:07315] [[37299,0],0] tcp:send_handler called to send to peer [[37299,1],0] [fn1:07315] [[37299,0],0] tcp:send_handler SENDING TO [[37299,1],0] [fn1:07315] [[37299,0],0] MESSAGE SEND COMPLETE TO [[37299,1],0] OF 110 BYTES ON SOCKET 15 [fn1:07319] [[37299,1],0] MESSAGE SEND COMPLETE TO [[37299,0],0] OF 110 BYTES ON SOCKET 11 [fn1:07319] [[37299,1],0]:tcp:recv:handler called for peer [[37299,0],0] [fn1:07319] [[37299,1],0]:tcp:recv:handler CONNECTED [fn1:07319] [[37299,1],0]:tcp:recv:handler allocate new recv msg [fn1:07319] [[37299,1],0]:tcp:recv:handler read hdr [fn1:07319] [[37299,1],0]:tcp:recv:handler allocate data region of size 110 [fn1:07319] [[37299,1],0] RECVD COMPLETE MESSAGE FROM [[37299,0],0] (ORIGIN [[37299,0],0]) OF 110 BYTES FOR DEST [[37299,1],0] TAG 30 [fn1:07319] [[37299,1],0] DELIVERING TO RML [fn1:07319] [[37299,1],0] OOB_SEND: rml_oob_send.c:199 [fn1:07319] [[37299,1],0] oob:base:send to target [[37299,0],0] [fn1:07319] [[37299,1],0] oob:base:send known transport for peer [[37299,0],0] [fn1:07319] [[37299,1],0] oob:tcp:send_nb to peer [[37299,0],0]:30 [fn1:07319] [[37299,1],0] tcp:send_nb to peer [[37299,0],0] [fn1:07319] [[37299,1],0]:[oob_tcp.c:478] post send to [[37299,0],0] [fn1:07319] [[37299,1],0]:[oob_tcp.c:415] processing send to peer [[37299,0],0]:30 [fn1:07319] [[37299,1],0] tcp:send_nb: already connected to [[37299,0],0] - queueing for send [fn1:07319] [[37299,1],0]:[oob_tcp.c:442] queue send to [[37299,0],0] [fn1:07319] [[37299,1],0] tcp:send_handler called to send to peer [[37299,0],0] [fn1:07319] [[37299,1],0] tcp:send_handler SENDING TO [[37299,0],0] [fn1:07315] [[37299,0],0]:tcp:recv:handler called for peer [[37299,1],0] [fn1:07315] [[37299,0],0]:tcp:recv:handler CONNECTED [fn1:07315] [[37299,0],0]:tcp:recv:handler allocate new recv msg [fn1:07315] [[37299,0],0]:tcp:recv:handler read hdr [fn1:07315] [[37299,0],0]:tcp:recv:handler allocate data region of size 8 [fn1:07315] [[37299,0],0]:tcp:recv:handler called for peer [[37299,1],0] [fn1:07315] [[37299,0],0]:tcp:recv:handler CONNECTED [fn1:07315] [[37299,0],0] RECVD COMPLETE MESSAGE FROM [[37299,1],0] (ORIGIN [[37299,1],0]) OF 8 BYTES FOR DEST [[37299,0],0] TAG 30 [fn1:07315] [[37299,0],0] DELIVERING TO RML [fn1:07315] [[37299,0],0] OOB_SEND: rml_oob_send.c:199 [fn1:07315] [[37299,0],0] oob:base:send to target [[37299,1],0] [fn1:07315] [[37299,0],0] oob:base:send known transport for peer [[37299,1],0] [fn1:07315] [[37299,0],0] oob:tcp:send_nb to peer [[37299,1],0]:30 [fn1:07315] [[37299,0],0] tcp:send_nb to peer [[37299,1],0] [fn1:07315] [[37299,0],0]:[oob_tcp.c:478] post send to [[37299,1],0] [fn1:07315] [[37299,0],0]:[oob_tcp.c:415] processing send to peer [[37299,1],0]:30 [fn1:07315] [[37299,0],0] tcp:send_nb: already connected to [[37299,1],0] - queueing for send [fn1:07315] [[37299,0],0]:[oob_tcp.c:442] queue send to [[37299,1],0] [fn1:07315] [[37299,0],0] tcp:send_handler called to send to peer [[37299,1],0] [fn1:07315] [[37299,0],0] tcp:send_handler SENDING TO [[37299,1],0] [fn1:07315] [[37299,0],0] MESSAGE SEND COMPLETE TO [[37299,1],0] OF 8 BYTES ON SOCKET 15 [fn1:07319] [[37299,1],0] MESSAGE SEND COMPLETE TO [[37299,0],0] OF 8 BYTES ON SOCKET 11 [fn1:07319] [[37299,1],0]:tcp:recv:handler called for peer [[37299,0],0] [fn1:07319] [[37299,1],0]:tcp:recv:handler CONNECTED [fn1:07319] [[37299,1],0]:tcp:recv:handler allocate new recv msg [fn1:07319] [[37299,1],0]:tcp:recv:handler read hdr [fn1:07319] [[37299,1],0]:tcp:recv:handler allocate data region of size 8 [fn1:07319] [[37299,1],0] RECVD COMPLETE MESSAGE FROM [[37299,0],0] (ORIGIN [[37299,0],0]) OF 8 BYTES FOR DEST [[37299,1],0] TAG 30 [fn1:07319] [[37299,1],0] DELIVERING TO RML Server port = '2444427264.0;tcp://172.17.15.20:56377+2444427265.0;tcp://172.17.15.20:34776:300' [fn1:07315] [[37299,0],0] mca_oob_tcp_listen_thread: new connection: (16, 11) 172.17.15.37:42658 [fn1:07315] [[37299,0],0] connection_handler: working connection (16, 11) 172.17.15.37:42658 [fn1:07315] [[37299,0],0] accept_connection: 172.17.15.37:42658 [fn1:07315] [[37299,0],0]:tcp:recv:handler called [fn1:07315] [[37299,0],0] RECV CONNECT ACK FROM UNKNOWN ON SOCKET 16 [fn1:07315] [[37299,0],0] waiting for connect ack from UNKNOWN [fn1:07315] [[37299,0],0]-UNKNOWN tcp_peer_recv_blocking: peer closed connection: peer state 0 [fn1:07315] [[37299,0],0] unable to complete recv of connect-ack from UNKNOWN ON SOCKET 16
[linux15:24779] mca: base: components_register: registering oob components [linux15:24779] mca: base: components_register: found loaded component tcp [linux15:24779] mca: base: components_register: component tcp register function successful [linux15:24779] mca: base: components_open: opening oob components [linux15:24779] mca: base: components_open: found loaded component tcp [linux15:24779] mca: base: components_open: component tcp open function successful [linux15:24779] mca:oob:select: checking available component tcp [linux15:24779] mca:oob:select: Querying component [tcp] [linux15:24779] oob:tcp: component_available called [linux15:24779] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 [linux15:24779] [[3417,0],0] oob:tcp:init rejecting loopback interface lo [linux15:24779] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 [linux15:24779] [[3417,0],0] oob:tcp:init adding 172.17.15.37 to our list of V4 connections [linux15:24779] [[3417,0],0] TCP STARTUP [linux15:24779] [[3417,0],0] attempting to bind to IPv4 port 0 [linux15:24779] [[3417,0],0] assigned IPv4 port 56467 [linux15:24779] mca:oob:select: Adding component to end [linux15:24779] mca:oob:select: Found 1 active transports [linux15:24779] [[3417,0],0]: set_addr to uri 223936512.0;tcp://172.17.15.37:56467 [linux15:24779] [[3417,0],0]:set_addr peer [[3417,0],0] is me [linux15:24782] mca: base: components_register: registering oob components [linux15:24782] mca: base: components_register: found loaded component tcp [linux15:24782] mca: base: components_register: component tcp register function successful [linux15:24782] mca: base: components_open: opening oob components [linux15:24782] mca: base: components_open: found loaded component tcp [linux15:24782] mca: base: components_open: component tcp open function successful [linux15:24782] mca:oob:select: checking available component tcp [linux15:24782] mca:oob:select: Querying component [tcp] [linux15:24782] oob:tcp: component_available called [linux15:24782] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4 [linux15:24782] [[3417,1],0] oob:tcp:init rejecting loopback interface lo [linux15:24782] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4 [linux15:24782] [[3417,1],0] oob:tcp:init adding 172.17.15.37 to our list of V4 connections [linux15:24782] [[3417,1],0] TCP STARTUP [linux15:24782] [[3417,1],0] attempting to bind to IPv4 port 0 [linux15:24782] [[3417,1],0] assigned IPv4 port 55370 [linux15:24782] mca:oob:select: Adding component to end [linux15:24782] mca:oob:select: Found 1 active transports [linux15:24782] [[3417,1],0]: set_addr to uri 223936512.0;tcp://172.17.15.37:56467 [linux15:24782] [[3417,1],0]:set_addr checking if peer [[3417,0],0] is reachable via component tcp [linux15:24782] [[3417,1],0] oob:tcp: working peer [[3417,0],0] address tcp://172.17.15.37:56467 [linux15:24782] [[3417,1],0] PASSING ADDR 172.17.15.37 TO MODULE [linux15:24782] [[3417,1],0]:tcp set addr for peer [[3417,0],0] [linux15:24782] [[3417,1],0]: peer [[3417,0],0] is reachable via component tcp [linux15:24782] [[3417,1],0] OOB_SEND: rml_oob_send.c:199 [linux15:24782] [[3417,1],0]:tcp:processing set_peer cmd [linux15:24782] [[3417,1],0] SET_PEER ADDING PEER [[3417,0],0] [linux15:24782] [[3417,1],0] set_peer: peer [[3417,0],0] is listening on net 172.17.15.37 port 56467 [linux15:24782] [[3417,1],0] oob:base:send to target [[3417,0],0] [linux15:24782] [[3417,1],0] oob:tcp:send_nb to peer [[3417,0],0]:1 [linux15:24782] [[3417,1],0] tcp:send_nb to peer [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:478] post send to [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:415] processing send to peer [[3417,0],0]:1 [linux15:24782] [[3417,1],0]:[oob_tcp.c:449] queue pending to [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_nb: initiating connection to [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:463] connect to [[3417,0],0] [linux15:24782] [[3417,1],0] orte_tcp_peer_try_connect: attempting to connect to proc [[3417,0],0] [linux15:24782] [[3417,1],0] orte_tcp_peer_try_connect: attempting to connect to proc [[3417,0],0] on socket 11 [linux15:24782] [[3417,1],0] orte_tcp_peer_try_connect: attempting to connect to proc [[3417,0],0] on 172.17.15.37:56467 - 0 retries [linux15:24782] [[3417,1],0] waiting for connect completion to [[3417,0],0] - activating send event [linux15:24782] [[3417,1],0] tcp:send_handler called to send to peer [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler CONNECTING [linux15:24782] [[3417,1],0]:tcp:complete_connect called for peer [[3417,0],0] on socket 11 [linux15:24782] [[3417,1],0] tcp_peer_complete_connect: sending ack to [[3417,0],0] [linux15:24782] [[3417,1],0] SEND CONNECT ACK [linux15:24782] [[3417,1],0] send blocking of 42 bytes to socket 11 [linux15:24782] [[3417,1],0] connect-ack sent to socket 11 [linux15:24782] [[3417,1],0] tcp_peer_complete_connect: setting read event on connection to [[3417,0],0] [linux15:24779] [[3417,0],0] mca_oob_tcp_listen_thread: new connection: (15, 0) 172.17.15.37:44898 [linux15:24779] [[3417,0],0] connection_handler: working connection (15, 0) 172.17.15.37:44898 [linux15:24779] [[3417,0],0] accept_connection: 172.17.15.37:44898 [linux15:24779] [[3417,0],0]:tcp:recv:handler called [linux15:24779] [[3417,0],0] RECV CONNECT ACK FROM UNKNOWN ON SOCKET 15 [linux15:24779] [[3417,0],0] waiting for connect ack from UNKNOWN [linux15:24779] [[3417,0],0] connect ack received from UNKNOWN [linux15:24779] [[3417,0],0] connect-ack recvd from UNKNOWN [linux15:24779] [[3417,0],0] mca_oob_tcp_recv_connect: connection from new peer [linux15:24779] [[3417,0],0] connect-ack header from [[3417,1],0] is okay [linux15:24779] [[3417,0],0] waiting for connect ack from [[3417,1],0] [linux15:24779] [[3417,0],0] connect ack received from [[3417,1],0] [linux15:24779] [[3417,0],0] connect-ack version from [[3417,1],0] matches ours [linux15:24779] [[3417,0],0] connect-ack [[3417,1],0] authenticated [linux15:24779] [[3417,0],0] tcp:peer_accept called for peer [[3417,1],0] in state UNKNOWN on socket 15 [linux15:24779] [[3417,0],0] SEND CONNECT ACK [linux15:24779] [[3417,0],0] send blocking of 42 bytes to socket 15 [linux15:24779] [[3417,0],0] connect-ack sent to socket 15 [linux15:24779] [[3417,0],0]-[[3417,1],0] tcp_peer_connected on socket 15 [linux15:24779] [[3417,0],0]-[[3417,1],0] accepted: 172.17.15.37 - 172.17.15.37 nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802 [linux15:24779] [[3417,0],0] tcp:set_module called for peer [[3417,1],0] [linux15:24782] [[3417,1],0]:tcp:recv:handler called for peer [[3417,0],0] [linux15:24782] [[3417,1],0] RECV CONNECT ACK FROM [[3417,0],0] ON SOCKET 11 [linux15:24782] [[3417,1],0] waiting for connect ack from [[3417,0],0] [linux15:24782] [[3417,1],0] connect ack received from [[3417,0],0] [linux15:24782] [[3417,1],0] connect-ack recvd from [[3417,0],0] [linux15:24782] [[3417,1],0] connect-ack header from [[3417,0],0] is okay [linux15:24782] [[3417,1],0] waiting for connect ack from [[3417,0],0] [linux15:24782] [[3417,1],0] connect ack received from [[3417,0],0] [linux15:24779] [[3417,0],0]:tcp:recv:handler called for peer [[3417,1],0] [linux15:24779] [[3417,0],0]:tcp:recv:handler CONNECTED [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate new recv msg [linux15:24779] [[3417,0],0]:tcp:recv:handler read hdr [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate data region of size 55 [linux15:24779] [[3417,0],0] RECVD COMPLETE MESSAGE FROM [[3417,1],0] (ORIGIN [[3417,1],0]) OF 55 BYTES FOR DEST [[3417,0],0] TAG 1 [linux15:24779] [[3417,0],0] DELIVERING TO RML [linux15:24779] [[3417,0],0] OOB_SEND: rml_oob_send.c:199 [linux15:24779] [[3417,0],0] oob:base:send to target [[3417,1],0] [linux15:24779] [[3417,0],0] oob:base:send known transport for peer [[3417,1],0] [linux15:24779] [[3417,0],0] oob:tcp:send_nb to peer [[3417,1],0]:20 [linux15:24779] [[3417,0],0] tcp:send_nb to peer [[3417,1],0] [linux15:24779] [[3417,0],0]:[oob_tcp.c:478] post send to [[3417,1],0] [linux15:24779] [[3417,0],0]:[oob_tcp.c:415] processing send to peer [[3417,1],0]:20 [linux15:24779] [[3417,0],0] tcp:send_nb: already connected to [[3417,1],0] - queueing for send [linux15:24779] [[3417,0],0]:[oob_tcp.c:442] queue send to [[3417,1],0] [linux15:24779] [[3417,0],0] tcp:send_handler called to send to peer [[3417,1],0] [linux15:24779] [[3417,0],0] tcp:send_handler SENDING TO [[3417,1],0] [linux15:24779] [[3417,0],0] MESSAGE SEND COMPLETE TO [[3417,1],0] OF 7577 BYTES ON SOCKET 15 [linux15:24782] [[3417,1],0] connect-ack version from [[3417,0],0] matches ours [linux15:24782] [[3417,1],0] connect-ack [[3417,0],0] authenticated [linux15:24782] [[3417,1],0]-[[3417,0],0] tcp_peer_connected on socket 11 [linux15:24782] [[3417,1],0]-[[3417,0],0] connected: 172.17.15.37 - 172.17.15.37 nodelay 1 sndbuf 262142 rcvbuf 262142 flags 00000802 [linux15:24782] [[3417,1],0]:tcp:recv:handler starting send/recv events [linux15:24782] [[3417,1],0] tcp:set_module called for peer [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler called to send to peer [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler SENDING TO [[3417,0],0] [linux15:24782] [[3417,1],0] MESSAGE SEND COMPLETE TO [[3417,0],0] OF 55 BYTES ON SOCKET 11 [linux15:24782] [[3417,1],0]:tcp:recv:handler called for peer [[3417,0],0] [linux15:24782] [[3417,1],0]:tcp:recv:handler CONNECTED [linux15:24782] [[3417,1],0]:tcp:recv:handler allocate new recv msg [linux15:24782] [[3417,1],0]:tcp:recv:handler read hdr [linux15:24782] [[3417,1],0]:tcp:recv:handler allocate data region of size 7577 [linux15:24782] [[3417,1],0] RECVD COMPLETE MESSAGE FROM [[3417,0],0] (ORIGIN [[3417,0],0]) OF 7577 BYTES FOR DEST [[3417,1],0] TAG 20 [linux15:24782] [[3417,1],0] DELIVERING TO RML [linux15:24782] [[3417,1],0] OOB_SEND: rml_oob_send.c:199 [linux15:24782] [[3417,1],0] oob:base:send to target [[3417,0],0] [linux15:24782] [[3417,1],0] oob:base:send known transport for peer [[3417,0],0] [linux15:24782] [[3417,1],0] oob:tcp:send_nb to peer [[3417,0],0]:30 [linux15:24782] [[3417,1],0] tcp:send_nb to peer [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:478] post send to [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:415] processing send to peer [[3417,0],0]:30 [linux15:24782] [[3417,1],0] tcp:send_nb: already connected to [[3417,0],0] - queueing for send [linux15:24782] [[3417,1],0]:[oob_tcp.c:442] queue send to [[3417,0],0] [linux15:24779] [[3417,0],0]:tcp:recv:handler called for peer [[3417,1],0] [linux15:24779] [[3417,0],0]:tcp:recv:handler CONNECTED [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate new recv msg [linux15:24779] [[3417,0],0]:tcp:recv:handler read hdr [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate data region of size 110 [linux15:24779] [[3417,0],0] RECVD COMPLETE MESSAGE FROM [[3417,1],0] (ORIGIN [[3417,1],0]) OF 110 BYTES FOR DEST [[3417,0],0] TAG 30 [linux15:24779] [[3417,0],0] DELIVERING TO RML [linux15:24782] [[3417,1],0] tcp:send_handler called to send to peer [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler SENDING TO [[3417,0],0] [linux15:24779] [[3417,0],0] OOB_SEND: rml_oob_send.c:199 [linux15:24779] [[3417,0],0] oob:base:send to target [[3417,1],0] [linux15:24779] [[3417,0],0] oob:base:send known transport for peer [[3417,1],0] [linux15:24779] [[3417,0],0] oob:tcp:send_nb to peer [[3417,1],0]:30 [linux15:24779] [[3417,0],0] tcp:send_nb to peer [[3417,1],0] [linux15:24779] [[3417,0],0]:[oob_tcp.c:478] post send to [[3417,1],0] [linux15:24779] [[3417,0],0]:[oob_tcp.c:415] processing send to peer [[3417,1],0]:30 [linux15:24779] [[3417,0],0] tcp:send_nb: already connected to [[3417,1],0] - queueing for send [linux15:24779] [[3417,0],0]:[oob_tcp.c:442] queue send to [[3417,1],0] [linux15:24779] [[3417,0],0] tcp:send_handler called to send to peer [[3417,1],0] [linux15:24779] [[3417,0],0] tcp:send_handler SENDING TO [[3417,1],0] [linux15:24779] [[3417,0],0] MESSAGE SEND COMPLETE TO [[3417,1],0] OF 110 BYTES ON SOCKET 15 [linux15:24782] [[3417,1],0] MESSAGE SEND COMPLETE TO [[3417,0],0] OF 110 BYTES ON SOCKET 11 [linux15:24782] [[3417,1],0]:tcp:recv:handler called for peer [[3417,0],0] [linux15:24782] [[3417,1],0]:tcp:recv:handler CONNECTED [linux15:24782] [[3417,1],0]:tcp:recv:handler allocate new recv msg [linux15:24782] [[3417,1],0]:tcp:recv:handler read hdr [linux15:24782] [[3417,1],0]:tcp:recv:handler allocate data region of size 110 [linux15:24782] [[3417,1],0] RECVD COMPLETE MESSAGE FROM [[3417,0],0] (ORIGIN [[3417,0],0]) OF 110 BYTES FOR DEST [[3417,1],0] TAG 30 [linux15:24782] [[3417,1],0] DELIVERING TO RML [linux15:24779] [[3417,0],0]:tcp:recv:handler called for peer [[3417,1],0] [linux15:24779] [[3417,0],0]:tcp:recv:handler CONNECTED [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate new recv msg [linux15:24779] [[3417,0],0]:tcp:recv:handler read hdr [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate data region of size 8 [linux15:24779] [[3417,0],0] RECVD COMPLETE MESSAGE FROM [[3417,1],0] (ORIGIN [[3417,1],0]) OF 8 BYTES FOR DEST [[3417,0],0] TAG 30 [linux15:24779] [[3417,0],0] DELIVERING TO RML [linux15:24779] [[3417,0],0] OOB_SEND: rml_oob_send.c:199 [linux15:24779] [[3417,0],0] oob:base:send to target [[3417,1],0] [linux15:24779] [[3417,0],0] oob:base:send known transport for peer [[3417,1],0] [linux15:24779] [[3417,0],0] oob:tcp:send_nb to peer [[3417,1],0]:30 [linux15:24779] [[3417,0],0] tcp:send_nb to peer [[3417,1],0] [linux15:24779] [[3417,0],0]:[oob_tcp.c:478] post send to [[3417,1],0] [linux15:24779] [[3417,0],0]:[oob_tcp.c:415] processing send to peer [[3417,1],0]:30 [linux15:24779] [[3417,0],0] tcp:send_nb: already connected to [[3417,1],0] - queueing for send [linux15:24779] [[3417,0],0]:[oob_tcp.c:442] queue send to [[3417,1],0] [linux15:24779] [[3417,0],0] tcp:send_handler called to send to peer [[3417,1],0] [linux15:24779] [[3417,0],0] tcp:send_handler SENDING TO [[3417,1],0] [linux15:24779] [[3417,0],0] MESSAGE SEND COMPLETE TO [[3417,1],0] OF 8 BYTES ON SOCKET 15 [linux15:24782] [[3417,1],0] OOB_SEND: rml_oob_send.c:199 [linux15:24782] [[3417,1],0] oob:base:send to target [[3417,0],0] [linux15:24782] [[3417,1],0] oob:base:send known transport for peer [[3417,0],0] [linux15:24782] [[3417,1],0] oob:tcp:send_nb to peer [[3417,0],0]:30 [linux15:24782] [[3417,1],0] tcp:send_nb to peer [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:478] post send to [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:415] processing send to peer [[3417,0],0]:30 [linux15:24782] [[3417,1],0] tcp:send_nb: already connected to [[3417,0],0] - queueing for send [linux15:24782] [[3417,1],0]:[oob_tcp.c:442] queue send to [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler called to send to peer [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler SENDING TO [[3417,0],0] [linux15:24782] [[3417,1],0] MESSAGE SEND COMPLETE TO [[3417,0],0] OF 8 BYTES ON SOCKET 11 [linux15:24782] [[3417,1],0]:tcp:recv:handler called for peer [[3417,0],0] [linux15:24782] [[3417,1],0]:tcp:recv:handler CONNECTED [linux15:24782] [[3417,1],0]:tcp:recv:handler allocate new recv msg [linux15:24782] [[3417,1],0]:tcp:recv:handler read hdr [linux15:24782] [[3417,1],0]:tcp:recv:handler allocate data region of size 8 [linux15:24782] [[3417,1],0] RECVD COMPLETE MESSAGE FROM [[3417,0],0] (ORIGIN [[3417,0],0]) OF 8 BYTES FOR DEST [[3417,1],0] TAG 30 [linux15:24782] [[3417,1],0] DELIVERING TO RML [linux15:24779] [[3417,0],0]:tcp:recv:handler called for peer [[3417,1],0] [linux15:24779] [[3417,0],0]:tcp:recv:handler CONNECTED [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate new recv msg [linux15:24779] [[3417,0],0]:tcp:recv:handler read hdr [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate data region of size 51 [linux15:24779] [[3417,0],0] RECVD COMPLETE MESSAGE FROM [[3417,1],0] (ORIGIN [[3417,1],0]) OF 51 BYTES FOR DEST [[3417,0],0] TAG 9 [linux15:24779] [[3417,0],0] DELIVERING TO RML [linux15:24779] [[3417,0],0]: set_addr to uri 2444427264.0;tcp://172.17.15.20:56377 [linux15:24779] [[3417,0],0]:set_addr checking if peer [[37299,0],0] is reachable via component tcp [linux15:24779] [[3417,0],0] oob:tcp: working peer [[37299,0],0] address tcp://172.17.15.20:56377 [linux15:24779] [[3417,0],0] PASSING ADDR 172.17.15.20 TO MODULE [linux15:24779] [[3417,0],0]:tcp set addr for peer [[37299,0],0] [linux15:24779] [[3417,0],0]: peer [[37299,0],0] is reachable via component tcp [linux15:24779] [[3417,0],0] OOB_SEND: rml_oob_send.c:199 [linux15:24779] [[3417,0],0]:tcp:processing set_peer cmd [linux15:24779] [[3417,0],0] SET_PEER ADDING PEER [[37299,0],0] [linux15:24779] [[3417,0],0] set_peer: peer [[37299,0],0] is listening on net 172.17.15.20 port 56377 [linux15:24779] [[3417,0],0] oob:base:send to target [[3417,1],0] [linux15:24779] [[3417,0],0] oob:base:send known transport for peer [[3417,1],0] [linux15:24779] [[3417,0],0] oob:tcp:send_nb to peer [[3417,1],0]:19 [linux15:24779] [[3417,0],0] tcp:send_nb to peer [[3417,1],0] [linux15:24779] [[3417,0],0]:[oob_tcp.c:478] post send to [[3417,1],0] [linux15:24779] [[3417,0],0]:[oob_tcp.c:415] processing send to peer [[3417,1],0]:19 [linux15:24779] [[3417,0],0] tcp:send_nb: already connected to [[3417,1],0] - queueing for send [linux15:24779] [[3417,0],0]:[oob_tcp.c:442] queue send to [[3417,1],0] [linux15:24779] [[3417,0],0] tcp:send_handler called to send to peer [[3417,1],0] [linux15:24779] [[3417,0],0] tcp:send_handler SENDING TO [[3417,1],0] [linux15:24779] [[3417,0],0] MESSAGE SEND COMPLETE TO [[3417,1],0] OF 0 BYTES ON SOCKET 15 [linux15:24782] [[3417,1],0] OOB_SEND: rml_oob_send.c:199 [linux15:24782] [[3417,1],0] oob:base:send to target [[3417,0],0] [linux15:24782] [[3417,1],0] oob:base:send known transport for peer [[3417,0],0] [linux15:24782] [[3417,1],0] oob:tcp:send_nb to peer [[3417,0],0]:9 [linux15:24782] [[3417,1],0] tcp:send_nb to peer [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:478] post send to [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:415] processing send to peer [[3417,0],0]:9 [linux15:24782] [[3417,1],0] tcp:send_nb: already connected to [[3417,0],0] - queueing for send [linux15:24782] [[3417,1],0]:[oob_tcp.c:442] queue send to [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler called to send to peer [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler SENDING TO [[3417,0],0] [linux15:24782] [[3417,1],0] MESSAGE SEND COMPLETE TO [[3417,0],0] OF 51 BYTES ON SOCKET 11 trying to connect... [linux15:24782] [[3417,1],0]:tcp:recv:handler called for peer [[3417,0],0] [linux15:24782] [[3417,1],0]:tcp:recv:handler CONNECTED [linux15:24782] [[3417,1],0]:tcp:recv:handler allocate new recv msg [linux15:24782] [[3417,1],0]:tcp:recv:handler read hdr [linux15:24782] [[3417,1],0] RECVD ZERO-BYTE MESSAGE FROM [[3417,0],0] for tag 19 [linux15:24782] [[3417,1],0] RECVD COMPLETE MESSAGE FROM [[3417,0],0] (ORIGIN [[3417,0],0]) OF 0 BYTES FOR DEST [[3417,1],0] TAG 19 [linux15:24782] [[3417,1],0] DELIVERING TO RML [linux15:24779] [[3417,0],0]:tcp:recv:handler called for peer [[3417,1],0] [linux15:24779] [[3417,0],0]:tcp:recv:handler CONNECTED [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate new recv msg [linux15:24779] [[3417,0],0]:tcp:recv:handler read hdr [linux15:24779] [[3417,0],0] RECVD ZERO-BYTE MESSAGE FROM [[3417,1],0] for tag 33 [linux15:24779] [[3417,0],0] RECVD COMPLETE MESSAGE FROM [[3417,1],0] (ORIGIN [[3417,1],0]) OF 0 BYTES FOR DEST [[3417,0],0] TAG 33 [linux15:24779] [[3417,0],0] DELIVERING TO RML [linux15:24779] [[3417,0],0] OOB_SEND: rml_oob_send.c:199 [linux15:24779] [[3417,0],0] oob:base:send to target [[3417,1],0] [linux15:24779] [[3417,0],0] oob:base:send known transport for peer [[3417,1],0] [linux15:24779] [[3417,0],0] oob:tcp:send_nb to peer [[3417,1],0]:31 [linux15:24779] [[3417,0],0] tcp:send_nb to peer [[3417,1],0] [linux15:24779] [[3417,0],0]:[oob_tcp.c:478] post send to [[3417,1],0] [linux15:24779] [[3417,0],0]:[oob_tcp.c:415] processing send to peer [[3417,1],0]:31 [linux15:24779] [[3417,0],0] tcp:send_nb: already connected to [[3417,1],0] - queueing for send [linux15:24779] [[3417,0],0]:[oob_tcp.c:442] queue send to [[3417,1],0] [linux15:24782] [[3417,1],0] OOB_SEND: rml_oob_send.c:199 [linux15:24782] [[3417,1],0] oob:base:send to target [[3417,0],0] [linux15:24782] [[3417,1],0] oob:base:send known transport for peer [[3417,0],0] [linux15:24782] [[3417,1],0] oob:tcp:send_nb to peer [[3417,0],0]:33 [linux15:24782] [[3417,1],0] tcp:send_nb to peer [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:478] post send to [[3417,0],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:415] processing send to peer [[3417,0],0]:33 [linux15:24782] [[3417,1],0] tcp:send_nb: already connected to [[3417,0],0] - queueing for send [linux15:24782] [[3417,1],0]:[oob_tcp.c:442] queue send to [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler called to send to peer [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler SENDING TO [[3417,0],0] [linux15:24782] [[3417,1],0] MESSAGE SEND COMPLETE TO [[3417,0],0] OF 0 BYTES ON SOCKET 11 [linux15:24779] [[3417,0],0] tcp:send_handler called to send to peer [[3417,1],0] [linux15:24779] [[3417,0],0] tcp:send_handler SENDING TO [[3417,1],0] [linux15:24779] [[3417,0],0] MESSAGE SEND COMPLETE TO [[3417,1],0] OF 8 BYTES ON SOCKET 15 [linux15:24782] [[3417,1],0]:tcp:recv:handler called for peer [[3417,0],0] [linux15:24782] [[3417,1],0]:tcp:recv:handler CONNECTED [linux15:24782] [[3417,1],0]:tcp:recv:handler allocate new recv msg [linux15:24782] [[3417,1],0]:tcp:recv:handler read hdr [linux15:24782] [[3417,1],0]:tcp:recv:handler allocate data region of size 8 [linux15:24782] [[3417,1],0] RECVD COMPLETE MESSAGE FROM [[3417,0],0] (ORIGIN [[3417,0],0]) OF 8 BYTES FOR DEST [[3417,1],0] TAG 31 [linux15:24782] [[3417,1],0] DELIVERING TO RML [linux15:24779] [[3417,0],0]:tcp:recv:handler called for peer [[3417,1],0] [linux15:24779] [[3417,0],0]:tcp:recv:handler CONNECTED [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate new recv msg [linux15:24779] [[3417,0],0]:tcp:recv:handler read hdr [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate data region of size 8 [linux15:24779] [[3417,0],0] RECVD COMPLETE MESSAGE FROM [[3417,1],0] (ORIGIN [[3417,1],0]) OF 8 BYTES FOR DEST [[37299,1],0] TAG 300 [linux15:24779] [[3417,0],0] ROUTING TO [[37299,0],0] FROM HERE [linux15:24779] [[3417,0],0]:[oob_tcp_sendrecv.c:579] queue relay to [[37299,0],0] [linux15:24779] [[3417,0],0]:[oob_tcp_sendrecv.c:579] connect to [[37299,0],0] [linux15:24779] [[3417,0],0] orte_tcp_peer_try_connect: attempting to connect to proc [[37299,0],0] [linux15:24779] [[3417,0],0] orte_tcp_peer_try_connect: attempting to connect to proc [[37299,0],0] on socket 16 [linux15:24779] [[3417,0],0] orte_tcp_peer_try_connect: attempting to connect to proc [[37299,0],0] on 172.17.15.20:56377 - 0 retries [linux15:24779] [[3417,0],0] waiting for connect completion to [[37299,0],0] - activating send event [linux15:24779] [[3417,0],0]:tcp:recv:handler called for peer [[3417,1],0] [linux15:24779] [[3417,0],0]:tcp:recv:handler CONNECTED [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate new recv msg [linux15:24779] [[3417,0],0]:tcp:recv:handler read hdr [linux15:24779] [[3417,0],0]:tcp:recv:handler allocate data region of size 98 [linux15:24779] [[3417,0],0] RECVD COMPLETE MESSAGE FROM [[3417,1],0] (ORIGIN [[3417,1],0]) OF 98 BYTES FOR DEST [[37299,1],0] TAG 300 [linux15:24779] [[3417,0],0] ROUTING TO [[37299,0],0] FROM HERE [linux15:24779] [[3417,0],0]:[oob_tcp_sendrecv.c:579] queue relay to [[37299,0],0] [linux15:24779] [[3417,0],0]:[oob_tcp_sendrecv.c:579] connect to [[37299,0],0] [linux15:24782] [[3417,1],0] OOB_SEND: rml_oob_send.c:199 [linux15:24782] [[3417,1],0] OOB_SEND: rml_oob_send.c:199 [linux15:24782] [[3417,1],0] oob:base:send to target [[37299,1],0] [linux15:24782] [[3417,1],0] oob:base:send unknown peer [[37299,1],0] [linux15:24782] [[3417,1],0] oob:tcp:send_nb to peer [[37299,1],0]:300 [linux15:24782] [[3417,1],0] tcp:send_nb to peer [[37299,1],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:478] post send to [[37299,1],0] [linux15:24782] [[3417,1],0] oob:base:send to target [[37299,1],0] [linux15:24782] [[3417,1],0] oob:base:send known transport for peer [[37299,1],0] [linux15:24782] [[3417,1],0] oob:tcp:send_nb to peer [[37299,1],0]:300 [linux15:24782] [[3417,1],0] tcp:send_nb to peer [[37299,1],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:478] post send to [[37299,1],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:415] processing send to peer [[37299,1],0]:300 [linux15:24782] [[3417,1],0] tcp:send_nb: already connected to [[3417,0],0] - queueing for send [linux15:24782] [[3417,1],0]:[oob_tcp.c:442] queue send to [[37299,1],0] [linux15:24782] [[3417,1],0]:[oob_tcp.c:415] processing send to peer [[37299,1],0]:300 [linux15:24782] [[3417,1],0] tcp:send_nb: already connected to [[3417,0],0] - queueing for send [linux15:24782] [[3417,1],0]:[oob_tcp.c:442] queue send to [[37299,1],0] [linux15:24782] [[3417,1],0] tcp:send_handler called to send to peer [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler SENDING TO [[3417,0],0] [linux15:24782] [[3417,1],0] MESSAGE SEND COMPLETE TO [[3417,0],0] OF 8 BYTES ON SOCKET 11 [linux15:24782] [[3417,1],0] tcp:send_handler called to send to peer [[3417,0],0] [linux15:24782] [[3417,1],0] tcp:send_handler SENDING TO [[3417,0],0] [linux15:24782] [[3417,1],0] MESSAGE SEND COMPLETE TO [[3417,0],0] OF 98 BYTES ON SOCKET 11 [linux15:24779] [[3417,0],0] orte_tcp_peer_try_connect: attempting to connect to proc [[37299,0],0] [linux15:24779] [[3417,0],0] orte_tcp_peer_try_connect: attempting to connect to proc [[37299,0],0] on socket 16 [linux15:24779] [[3417,0],0] orte_tcp_peer_try_connect: attempting to connect to proc [[37299,0],0] on 172.17.15.20:56377 - 1 retries ------------------------------------------------------------ A process or daemon was unable to complete a TCP connection to another process: Local host: linux15 Remote host: linux15 This is usually caused by a firewall on the remote host. Please check that any firewall (e.g., iptables) has been disabled and try again. ------------------------------------------------------------ [linux15:24779] [[3417,0],0] tcp:failed_to_connect called for peer [[37299,0],0] [linux15:24779] [[3417,0],0] tcp:failed_to_connect unable to reach peer [[37299,0],0] [linux15:24779] [[3417,0],0] tcp:send_handler called to send to peer [[37299,0],0] [linux15:24779] [[3417,0],0]-[[37299,0],0] mca_oob_tcp_peer_send_handler: invalid connection state (6) on socket 16