Actually, I don’t see any related changes in OMPI master, let alone the 
branches. So far as I can tell, the author never actually submitted the work.
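
For anyone who wants to double-check, a plain git search of a clone lists
every commit on any branch that mentions the issue number (standard git
commands; the grep pattern is just the number, so expect an occasional
false positive):

    git clone https://github.com/open-mpi/ompi.git
    cd ompi
    git log --all --oneline --grep='1585'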


> On Oct 19, 2017, at 3:57 PM, Mukkie <mukunthh...@gmail.com> wrote:
> 
> FWIW, my issue is related to this one.
> https://github.com/open-mpi/ompi/issues/1585
> 
> I have version 3.0.0, and the above issue was closed saying the fixes went
> into 3.1.0. However, I don't see the code changes for this issue.
> 
> Cordially,
> Muku.
> 
> On Wed, Oct 18, 2017 at 3:52 PM, Mukkie <mukunthh...@gmail.com> wrote:
> Thanks for your suggestion. However, my firewalls are already disabled on
> both machines.
> 
> Cordially,
> Muku. 
> 
> On Wed, Oct 18, 2017 at 2:38 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
> Looks like there is a firewall or something blocking communication between 
> those nodes?
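> 
> If you want to rule a firewall out explicitly, on RHEL 7 something like the
> following should show whether firewalld or any raw ip(6)tables rules are
> active (this assumes systemd and the stock iptables tools; adjust for your
> distro):
> 
>     systemctl status firewalld
>     sudo iptables -L -n
>     sudo ip6tables -L -n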
> 
>> On Oct 18, 2017, at 1:29 PM, Mukkie <mukunthh...@gmail.com> wrote:
>> 
>> Adding verbose output. Please check the failure and advise. Thank you.
>> 
>> [mselvam@ipv-rhel73 examples]$ mpirun -hostfile host --mca oob_base_verbose 
>> 100 --mca btl tcp,self ring_c
>> [ipv-rhel73:10575] mca_base_component_repository_open: unable to open 
>> mca_plm_tm: libtorque.so.2: cannot open shared object file: No such file or 
>> directory (ignored)
>> [ipv-rhel73:10575] mca: base: components_register: registering framework oob 
>> components
>> [ipv-rhel73:10575] mca: base: components_register: found loaded component tcp
>> [ipv-rhel73:10575] mca: base: components_register: component tcp register 
>> function successful
>> [ipv-rhel73:10575] mca: base: components_open: opening oob components
>> [ipv-rhel73:10575] mca: base: components_open: found loaded component tcp
>> [ipv-rhel73:10575] mca: base: components_open: component tcp open function 
>> successful
>> [ipv-rhel73:10575] mca:oob:select: checking available component tcp
>> [ipv-rhel73:10575] mca:oob:select: Querying component [tcp]
>> [ipv-rhel73:10575] oob:tcp: component_available called
>> [ipv-rhel73:10575] WORKING INTERFACE 1 KERNEL INDEX 2 FAMILY: V6
>> [ipv-rhel73:10575] [[20058,0],0] oob:tcp:init adding 
>> fe80::b9b:ac5d:9cf0:b858 to our list of V6 connections
>> [ipv-rhel73:10575] WORKING INTERFACE 2 KERNEL INDEX 1 FAMILY: V4
>> [ipv-rhel73:10575] [[20058,0],0] oob:tcp:init rejecting loopback interface lo
>> [ipv-rhel73:10575] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>> [ipv-rhel73:10575] [[20058,0],0] TCP STARTUP
>> [ipv-rhel73:10575] [[20058,0],0] attempting to bind to IPv4 port 0
>> [ipv-rhel73:10575] [[20058,0],0] assigned IPv4 port 53438
>> [ipv-rhel73:10575] [[20058,0],0] attempting to bind to IPv6 port 0
>> [ipv-rhel73:10575] [[20058,0],0] assigned IPv6 port 43370
>> [ipv-rhel73:10575] mca:oob:select: Adding component to end
>> [ipv-rhel73:10575] mca:oob:select: Found 1 active transports
>> [ipv-rhel73:10575] [[20058,0],0]: get transports
>> [ipv-rhel73:10575] [[20058,0],0]:get transports for component tcp
>> [ipv-rhel73:10575] mca_base_component_repository_open: unable to open 
>> mca_ras_tm: libtorque.so.2: cannot open shared object file: No such file or 
>> directory (ignored)
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register: 
>> registering framework oob components
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register: found 
>> loaded component tcp
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_register: component 
>> tcp register function successful
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: opening oob 
>> components
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: found loaded 
>> component tcp
>> [ipv-rhel71a.locallab.local:12299] mca: base: components_open: component tcp 
>> open function successful
>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: checking available 
>> component tcp
>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Querying component [tcp]
>> [ipv-rhel71a.locallab.local:12299] oob:tcp: component_available called
>> [ipv-rhel71a.locallab.local:12299] WORKING INTERFACE 1 KERNEL INDEX 2 
>> FAMILY: V6
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:init adding 
>> fe80::226:b9ff:fe85:6a28 to our list of V6 connections
>> [ipv-rhel71a.locallab.local:12299] WORKING INTERFACE 2 KERNEL INDEX 1 
>> FAMILY: V4
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:init rejecting 
>> loopback interface lo
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] TCP STARTUP
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] attempting to bind to IPv4 
>> port 0
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] assigned IPv4 port 50782
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] attempting to bind to IPv6 
>> port 0
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] assigned IPv6 port 59268
>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Adding component to end
>> [ipv-rhel71a.locallab.local:12299] mca:oob:select: Found 1 active transports
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: get transports
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:get transports for 
>> component tcp
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: set_addr to uri 
>> 1314521088.0;tcp6://[fe80::b9b:ac5d:9cf0:b858]:43370
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:set_addr checking if peer 
>> [[20058,0],0] is reachable via component tcp
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp: working peer 
>> [[20058,0],0] address tcp6://[fe80::b9b:ac5d:9cf0:b858]:43370
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SET_PEER ADDING PEER 
>> [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] set_peer: peer 
>> [[20058,0],0] is listening on net fe80::b9b:ac5d:9cf0:b858 port 43370
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]: peer [[20058,0],0] is 
>> reachable via component tcp
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] OOB_SEND: rml_oob_send.c:265
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:base:send to target 
>> [[20058,0],0] - attempt 0
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:send_nb to peer 
>> [[20058,0],0]:10 seq = -1
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:204] processing 
>> send to peer [[20058,0],0]:10 seq_num = -1 via [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:225] queue 
>> pending to [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp:send_nb: initiating 
>> connection to [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:239] connect to 
>> [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: 
>> attempting to connect to proc [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: 
>> attempting to connect to proc [[20058,0],0] on socket 20
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: 
>> attempting to connect to proc [[20058,0],0] on (null):-1 - 0 retries
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: 
>> Connection to proc [[20058,0],0] succeeded
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SEND CONNECT ACK
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] send blocking of 72 bytes 
>> to socket 20
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp_peer_send_blocking: 
>> send() to socket 20 failed: Broken pipe (32)
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp_peer_close for 
>> [[20058,0],0] sd 20 state FAILED
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp_connection.c:356] 
>> connect to [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp:lost connection called 
>> for peer [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: 
>> attempting to connect to proc [[20058,0],0]
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: 
>> attempting to connect to proc [[20058,0],0] on socket 20
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: 
>> attempting to connect to proc [[20058,0],0] on (null):-1 - 0 retries
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: 
>> Connection to proc [[20058,0],0] succeeded
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] SEND CONNECT ACK
>> [ipv-rhel71a.locallab.local:12299] [[20058,0],1] send blocking of 72 bytes 
>> to socket 20
>> --------------------------------------------------------------------------
>> ORTE was unable to reliably start one or more daemons.
>> This usually is caused by:
>> 
>> * not finding the required libraries and/or binaries on
>>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>> 
>> * lack of authority to execute on one or more specified nodes.
>>   Please verify your allocation and authorities.
>> 
>> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>>   Please check with your sys admin to determine the correct location to use.
>> 
>> * compilation of the orted with dynamic libraries when static are required
>>   (e.g., on Cray). Please check your configure cmd line and consider using
>>   one of the contrib/platform definitions for your system type.
>> 
>> * an inability to create a connection back to mpirun due to a
>>   lack of common network interfaces and/or no route found between
>>   them. Please check network connectivity (including firewalls
>>   and network routing requirements).
>> --------------------------------------------------------------------------
>> [ipv-rhel73:10575] [[20058,0],0] TCP SHUTDOWN
>> [ipv-rhel73:10575] [[20058,0],0] TCP SHUTDOWN done
>> [ipv-rhel73:10575] mca: base: close: component tcp closed
>> [ipv-rhel73:10575] mca: base: close: unloading component tcp
>> 
>> Cordially,
>> Muku.
>> 
>> 
>> On Wed, Oct 18, 2017 at 11:18 AM, Mukkie <mukunthh...@gmail.com> wrote:
>> Hi,
>> 
>> I have two IPv6-only machines. I configured/built OMPI version 3.0 with
>> --enable-ipv6.
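>> 
>> (For reference, a configure invocation of this general form; the install
>> prefix is just an example path:)
>> 
>>     ./configure --enable-ipv6 --prefix=/opt/openmpi-3.0.0
>>     make all install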
>> 
>> I want to verify a simple MPI communication call over TCP/IP between these
>> two machines. I am using the ring_c and connectivity_c examples.
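>> 
>> (As a sanity check, a single-node run keeps the inter-host TCP path out of
>> the picture; this sketch assumes the examples are compiled with the mpicc
>> from this install, and uses vader, the shared-memory BTL in this release:)
>> 
>>     mpicc ring_c.c -o ring_c
>>     mpirun -np 2 --mca btl self,vader ring_c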
>> 
>> Issuing from one of the host machines…
>> 
>> [mselvam@ipv-rhel73 examples]$  mpirun -hostfile host --mca btl tcp,self 
>> --mca oob_base_verbose 100 ring_c
>> 
>> .
>> . 
>> 
>> [ipv-rhel71a.locallab.local:10822] [[5331,0],1] tcp_peer_send_blocking: 
>> send() to socket 20 failed: Broken pipe (32)
>> 
>> where “host” contains the IPv6 address of the remote machine (namely,
>> ‘ipv-rhel71a’). I also have passwordless ssh set up to the remote machine.
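>> 
>> For example, a "host" file of this general form (one line with the remote
>> machine's address, here its link-local address from the verbose log; the
>> slots value is optional and hypothetical):
>> 
>>     fe80::226:b9ff:fe85:6a28 slots=1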
>> 
>> I will attach verbose output in a follow-up post.
>> 
>> Thanks.
>> 
>> Cordially,
>> 
>> Mukundhan Selvam
>> Development Engineer, HPC
>> http://www.mscsoftware.com/
>> 4675 MacArthur Court, Newport Beach, CA 92660
>> 714-540-8900 ext. 4166
>> 

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
