I get the same result in both cases:

[pmdtest@pmd ~]$ mpirun --mca btl ^openib --host compute-01-01,compute-01-06 ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
connect() to failed: No route to host (113)

[pmdtest@compute-01-01 ~]$ mpirun --mca btl ^openib --host compute-01-01,compute-01-06 ring_c
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
connect() to failed: No route to host (113)
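
Since ping only exercises ICMP while the failing call in btl_tcp_endpoint.c is a TCP connect(), one way to narrow this down is to probe a TCP port directly between the two nodes. What follows is only a minimal Python sketch and not part of Open MPI: the probe_tcp helper is hypothetical, and it assumes a listener has already been started on the chosen port on the target node (Open MPI picks its TCP ports dynamically, so any port number used here is a placeholder). On Linux, a result of EHOSTUNREACH corresponds to the errno 113 "No route to host" shown above.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connect; return (ok, errno_name_or_None)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except socket.timeout:
        # No answer within the timeout (e.g. packets silently dropped).
        return False, "ETIMEDOUT"
    except OSError as exc:
        # Map the numeric errno to its symbolic name where one exists,
        # e.g. 113 -> EHOSTUNREACH on Linux; otherwise fall back to the message.
        return False, errno.errorcode.get(exc.errno, str(exc))
```

Run from compute-01-01, a call such as probe_tcp("compute-01-06", 5000) (5000 being a placeholder port with a test listener behind it) would show whether plain TCP between the nodes fails the same way as the MPI job.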

On Thu, Nov 13, 2014 at 12:11 PM, Gilles Gouaillardet wrote:
> Hi,
> It seems you messed up the command line: the host list has to follow
> --host directly. Could you try
> $ mpirun --mca btl ^openib --host compute-01-01,compute-01-06 ring_c
> Can you also try to run mpirun from a compute node instead of the head
> node?
> Cheers,
> Gilles
> On 2014/11/13 16:07, Syed Ahsan Ali wrote:
>> Here is what I see when disabling openib support.
>> [pmdtest@pmd ~]$ mpirun --host --mca btl ^openib
>> compute-01-01,compute-01-06 ring_c
>> ssh:  orted: Temporary failure in name resolution
>> ssh:  orted: Temporary failure in name resolution
>> --------------------------------------------------------------------------
>> A daemon (pid 7608) died unexpectedly with status 255 while attempting
>> to launch so we are aborting.
>> The nodes can still ssh to each other, though:
>> [pmdtest@compute-01-01 ~]$ ssh compute-01-06
>> Last login: Thu Nov 13 12:05:58 2014 from
>> [pmdtest@compute-01-06 ~]$
>> On Thu, Nov 13, 2014 at 12:03 PM, Syed Ahsan Ali wrote:
>>> Hi Jeff,
>>> No firewall is enabled. Running the diagnostics, I found that a
>>> non-communicating MPI job (hello_c) runs fine, while ring_c remains
>>> stuck. There are of course warnings about OpenFabrics, but in my case I
>>> am running the application with openib disabled. Please see below:
>>> [pmdtest@pmd ~]$ mpirun --host compute-01-01,compute-01-06 hello_c.out
>>> --------------------------------------------------------------------------
>>> WARNING: There is at least one OpenFabrics device found but there are
>>> no active ports detected (or Open MPI was unable to use them).  This
>>> is most certainly not what you wanted.  Check your cables, subnet
>>> manager configuration, etc.  The openib BTL will be ignored for this
>>> job.
>>>   Local host:
>>> --------------------------------------------------------------------------
>>> Hello, world, I am 0 of 2
>>> Hello, world, I am 1 of 2
>>> [] 1 more process has sent help message
>>> help-mpi-btl-openib.txt / no active ports found
>>> [] Set MCA parameter "orte_base_help_aggregate" to
>>> 0 to see all help / error messages
>>> [pmdtest@pmd ~]$ mpirun --host compute-01-01,compute-01-06 ring_c
>>> --------------------------------------------------------------------------
>>> WARNING: There is at least one OpenFabrics device found but there are
>>> no active ports detected (or Open MPI was unable to use them).  This
>>> is most certainly not what you wanted.  Check your cables, subnet
>>> manager configuration, etc.  The openib BTL will be ignored for this
>>> job.
>>>   Local host:
>>> --------------------------------------------------------------------------
>>> Process 0 sending 10 to 1, tag 201 (2 processes in ring)
>>> Process 0 sent to 1
>>> Process 0 decremented value: 9
>>> [][[54687,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>> connect() to failed: No route to host (113)
>>> [] 1 more process has sent help message
>>> help-mpi-btl-openib.txt / no active ports found
>>> [] Set MCA parameter "orte_base_help_aggregate" to
>>> 0 to see all help / error messages
>>> On Wed, Nov 12, 2014 at 7:32 PM, Jeff Squyres (jsquyres) wrote:
>>>> Do you have firewalling enabled on either server?
>>>> See this FAQ item:
>>>> On Nov 12, 2014, at 4:57 AM, Syed Ahsan Ali wrote:
>>>>> Dear all,
>>>>> I need your advice. While trying to run an mpirun job across nodes, I
>>>>> get the following error. It seems that the two nodes, i.e.
>>>>> compute-01-01 and compute-01-06, are not able to communicate with each
>>>>> other, even though they can ping each other.
>>>>> [pmdtest@pmd ERA_CLM45]$ mpirun -np 16 -hostfile hostlist --mca btl
>>>>> ^openib ../bin/regcmMPICLM45
>>>>> [][[48897,1],7][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>>>> connect() to failed: No route to host (113)
>>>>> [][[48897,1],4][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>>>> connect() to failed: No route to host (113)
>>>>> [][[48897,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>>>> connect() to failed: No route to host (113)
>>>>> [][[48897,1],10][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>>>> [][[48897,1],12][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>>>> connect() to failed: No route to host (113)
>>>>> [][[48897,1],14][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>>>>> connect() to failed: No route to host (113)
>>>>> connect() to failed: No route to host (113)
>>>>> mpirun: killing job...
>>>>> [pmdtest@pmd ERA_CLM45]$ ssh compute-01-01
>>>>> Last login: Wed Nov 12 09:48:53 2014 from
>>>>> [pmdtest@compute-01-01 ~]$ ping compute-01-06
>>>>> PING ( 56(84) bytes of data.
>>>>> 64 bytes from ( icmp_seq=1
>>>>> ttl=64 time=0.108 ms
>>>>> 64 bytes from ( icmp_seq=2
>>>>> ttl=64 time=0.088 ms
>>>>> --- ping statistics ---
>>>>> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
>>>>> rtt min/avg/max/mdev = 0.088/0.098/0.108/0.010 ms
>>>>> [pmdtest@compute-01-01 ~]$
>>>>> Thanks in advance.
>>>>> Ahsan
