Hi Jeff,

No firewall is enabled. Running the diagnostics, I found that the
non-communicating MPI job (hello_c) runs, while ring_c gets stuck.
There are of course warnings about OpenFabrics, but in my case I am
running the application with openib disabled. Please see below.

[pmdtest@pmd ~]$ mpirun --host compute-01-01,compute-01-06 hello_c.out
--------------------------------------------------------------------------
WARNING: There is at least one OpenFabrics device found but there are
no active ports detected (or Open MPI was unable to use them).  This
is most certainly not what you wanted.  Check your cables, subnet
manager configuration, etc.  The openib BTL will be ignored for this
job.
  Local host: compute-01-01.private.dns.zone
--------------------------------------------------------------------------
Hello, world, I am 0 of 2
Hello, world, I am 1 of 2
[pmd.pakmet.com:06386] 1 more process has sent help message
help-mpi-btl-openib.txt / no active ports found
[pmd.pakmet.com:06386] Set MCA parameter "orte_base_help_aggregate" to
0 to see all help / error messages

[pmdtest@pmd ~]$ mpirun --host compute-01-01,compute-01-06 ring_c
--------------------------------------------------------------------------
WARNING: There is at least one OpenFabrics device found but there are
no active ports detected (or Open MPI was unable to use them).  This
is most certainly not what you wanted.  Check your cables, subnet
manager configuration, etc.  The openib BTL will be ignored for this
job.
  Local host: compute-01-01.private.dns.zone
--------------------------------------------------------------------------
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
[compute-01-01.private.dns.zone][[54687,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.108.10 failed: No route to host (113)
[pmd.pakmet.com:15965] 1 more process has sent help message
help-mpi-btl-openib.txt / no active ports found
[pmd.pakmet.com:15965] Set MCA parameter "orte_base_help_aggregate" to
0 to see all help / error messages
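
Note that the failing connect() above targets 192.168.108.10, whereas
ping (quoted below) resolved compute-01-06 to 10.0.0.8. To check plain
TCP reachability of each address outside of MPI, a minimal sketch such
as the following could be run from compute-01-01. The helper names
classify and try_connect and the port number are my own illustrative
choices, not part of Open MPI; the TCP BTL picks its ports dynamically,
so the port here is only a probe. The point is to tell "No route to
host" (errno 113, as in the log) apart from "Connection refused", which
would mean the host is reachable but nothing listens on that port.

```python
import errno
import socket


def classify(err):
    """Map a connect() errno to a short diagnosis string."""
    if err == 0:
        return "ok"
    if err == errno.EHOSTUNREACH:
        # Same error as in the mpirun log: "No route to host (113)" on Linux.
        return "no route to host (routing problem or firewall reject)"
    if err == errno.ECONNREFUSED:
        return "refused (host reachable, nothing listening on that port)"
    return "other error: %s" % errno.errorcode.get(err, str(err))


def try_connect(host, port, timeout=3.0):
    """Attempt one TCP connection and return a diagnosis string."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return classify(0)
    except socket.timeout:
        return "timed out (packets silently dropped?)"
    except OSError as exc:
        return classify(exc.errno)
    finally:
        sock.close()


# Example use (port 5000 is an arbitrary, hypothetical probe port):
#   print(try_connect("192.168.108.10", 5000))
#   print(try_connect("10.0.0.8", 5000))
```

If the 10.0.0.x address answers with "refused" while the 192.168.108.x
address reports "no route to host", that would suggest the two nodes
cannot reach each other over the 192.168.108.x network, even though
ping over the other network works.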

On Wed, Nov 12, 2014 at 7:32 PM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:
> Do you have firewalling enabled on either server?
>
> See this FAQ item:
>
>     http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems
>
>
>
> On Nov 12, 2014, at 4:57 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:
>
>> Dear All
>>
>> I need your advice. While trying to run mpirun job across nodes I get
>> following error. It seems that the two nodes i.e, compute-01-01 and
>> compute-01-06 are not able to communicate with each other. While nodes
>> see each other on ping.
>>
>> [pmdtest@pmd ERA_CLM45]$ mpirun -np 16 -hostfile hostlist --mca btl
>> ^openib ../bin/regcmMPICLM45 regcm.in
>>
>> [compute-01-06.private.dns.zone][[48897,1],7][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 192.168.108.14 failed: No route to host (113)
>> [compute-01-06.private.dns.zone][[48897,1],4][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 192.168.108.14 failed: No route to host (113)
>> [compute-01-06.private.dns.zone][[48897,1],5][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 192.168.108.14 failed: No route to host (113)
>> [compute-01-01.private.dns.zone][[48897,1],10][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>> [compute-01-01.private.dns.zone][[48897,1],12][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 192.168.108.10 failed: No route to host (113)
>> [compute-01-01.private.dns.zone][[48897,1],14][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
>> connect() to 192.168.108.10 failed: No route to host (113)
>> connect() to 192.168.108.10 failed: No route to host (113)
>>
>> mpirun: killing job...
>>
>> [pmdtest@pmd ERA_CLM45]$ ssh compute-01-01
>> Last login: Wed Nov 12 09:48:53 2014 from pmd-eth0.private.dns.zone
>> [pmdtest@compute-01-01 ~]$ ping compute-01-06
>> PING compute-01-06.private.dns.zone (10.0.0.8) 56(84) bytes of data.
>> 64 bytes from compute-01-06.private.dns.zone (10.0.0.8): icmp_seq=1
>> ttl=64 time=0.108 ms
>> 64 bytes from compute-01-06.private.dns.zone (10.0.0.8): icmp_seq=2
>> ttl=64 time=0.088 ms
>>
>> --- compute-01-06.private.dns.zone ping statistics ---
>> 2 packets transmitted, 2 received, 0% packet loss, time 999ms
>> rtt min/avg/max/mdev = 0.088/0.098/0.108/0.010 ms
>> [pmdtest@compute-01-01 ~]$
>>
>> Thanks in advance.
>>
>> Ahsan
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/11/25761.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/11/25763.php



-- 
Syed Ahsan Ali Bokhari
Electronic Engineer (EE)

Research & Development Division
Pakistan Meteorological Department H-8/4, Islamabad.
Phone # off  +92518358714
Cell # +923155145014
