My $0.02:

- building under your $HOME is recommended in cases like this, but it's not 
going to change how OMPI functions.  I.e., rebuilding under your $HOME will 
likely not change the result.

- you have three different MPI installations telling you that TCP connections 
between your VMs don't work (OMPI 1.8.x, OMPI 1.6.x, MPICH).  It's therefore 
likely that there's some kind of TCP blocking going on between your VMs.

- +1 on what Ralph says: you likely need to talk to your sysadmin to get this 
problem resolved.  iptables and other firewalls may be disabled on your VMs, 
but OpenStack (or other sysadmin/network admin-controlled entities) *between* 
your two VMs may be enforcing firewalling policies.  This is quite common in 
cloud environments.

- You can use any TCP ping-pong test to verify TCP connectivity between VMs -- 
i.e., programs that use random TCP ports to communicate; not the "usual" 
suspects of ports that OpenStack may leave open by default (22, 80, 443, 
etc.).  A minimal example follows below.
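
For example, a quick netcat-based check (illustrative only: port 54321 is an 
arbitrary unprivileged port, and the "listen" syntax differs between the 
OpenBSD and traditional netcat variants):

    # on VM 1: listen on an arbitrary high port
    # (traditional netcat needs "nc -l -p 54321" instead)
    nc -l 54321

    # on VM 2: connect and send a short message; it should
    # show up on VM 1's terminal if TCP gets through
    echo ping | nc <vm1-ip> 54321

If that hangs or is refused on arbitrary high ports while 22/80/443 work, it 
points to a firewall/security-group policy between the VMs.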



> On Mar 28, 2015, at 7:22 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
> 
> I 'll recompile it on the home directory to see how it works.
> 
> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
> [r...@open-mpi.org]
> Sent: 28 March 2015 23:13
> To: Open MPI Users
> Subject: Re: [OMPI users] Connection problem on Linux cluster
> 
> Doug is correct, and we usually suggest you build it under your own home 
> directory to make it easier to clean up at a later time.
> 
> Only thing I can suggest is talking to the sys admin some more about TCP 
> connections between VMs on OpenStack and getting their help. Something is 
> obviously blocking communications, but it is likely something only they can 
> identify. Clouds tend to be finicky in that regard.
> 
> You could also try the standard network diagnostics to see if TCP is capable 
> of getting thru.
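> 
> For example (illustrative commands; replace <other-vm-ip> with the other 
> VM's address, and note the port is an arbitrary high one):
> 
>     ping <other-vm-ip>            # basic IP reachability
>     traceroute <other-vm-ip>      # see where packets stop, if anywhere
>     nc -vz <other-vm-ip> 54321    # TCP connect test on an arbitrary port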
> 
> 
>> On Mar 28, 2015, at 4:00 PM, Douglas L Reeder <d...@centurylink.net> wrote:
>> 
>> Building as root is a bad idea. Try building it as a regular user, using 
>> sudo make install if necessary.
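>> 
>> For example, a typical sequence (illustrative; adjust the prefix to taste):
>> 
>>     ./configure --prefix=$HOME/openmpi
>>     make -j4
>>     make install    # no sudo needed with a prefix under $HOME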
>> 
>> Doug Reeder
>> On Mar 28, 2015, at 4:53 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
>> 
>>> When you said --enable-debug is not activated, I installed it again to 
>>> make sure. I have only one MPI installed on all the VMs. 
>>> 
>>> FYI: I have just tried MPICH to see how it works. It freezes for a few 
>>> minutes, then comes back with an error complaining about the firewall! 
>>> By the way, I already have the firewall disabled and iptables is set to 
>>> allow all connections. I checked with the system admin and there is no 
>>> other firewall between the nodes.
>>> 
>>> Here is the output you asked for:
>>> 
>>> ubuntu@fehg-node-0:~$ which mpirun 
>>> /usr/local/openmpi/bin/mpirun
>>> ubuntu@fehg-node-0:~$ ompi_info
>>>                  Package: Open MPI ubuntu@fehg-node-0 Distribution
>>>                 Open MPI: 1.6.5
>>>    Open MPI SVN revision: r28673
>>>    Open MPI release date: Jun 26, 2013
>>>                 Open RTE: 1.6.5
>>>    Open RTE SVN revision: r28673
>>>    Open RTE release date: Jun 26, 2013
>>>                     OPAL: 1.6.5
>>>        OPAL SVN revision: r28673
>>>        OPAL release date: Jun 26, 2013
>>>                  MPI API: 2.1
>>>             Ident string: 1.6.5
>>>                   Prefix: /usr/local/openmpi
>>>  Configured architecture: i686-pc-linux-gnu
>>>           Configure host: fehg-node-0
>>>            Configured by: ubuntu
>>>            Configured on: Sat Mar 28 20:19:28 UTC 2015
>>>           Configure host: fehg-node-0
>>>                 Built by: root
>>>                 Built on: Sat Mar 28 20:30:18 UTC 2015
>>>               Built host: fehg-node-0
>>>               C bindings: yes
>>>             C++ bindings: yes
>>>       Fortran77 bindings: no
>>>       Fortran90 bindings: no
>>>  Fortran90 bindings size: na
>>>               C compiler: gcc
>>>      C compiler absolute: /usr/bin/gcc
>>>   C compiler family name: GNU
>>>       C compiler version: 4.6.3
>>>             C++ compiler: g++
>>>    C++ compiler absolute: /usr/bin/g++
>>>       Fortran77 compiler: none
>>>   Fortran77 compiler abs: none
>>>       Fortran90 compiler: none
>>>   Fortran90 compiler abs: none
>>>              C profiling: yes
>>>            C++ profiling: yes
>>>      Fortran77 profiling: no
>>>      Fortran90 profiling: no
>>>           C++ exceptions: no
>>>           Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
>>>            Sparse Groups: no
>>>   Internal debug support: yes
>>>   MPI interface warnings: no
>>>      MPI parameter check: runtime
>>> Memory profiling support: no
>>> Memory debugging support: no
>>>          libltdl support: yes
>>>    Heterogeneous support: no
>>>  mpirun default --prefix: no
>>>          MPI I/O support: yes
>>>        MPI_WTIME support: gettimeofday
>>>      Symbol vis. support: yes
>>>    Host topology support: yes
>>>           MPI extensions: affinity example
>>>    FT Checkpoint support: no (checkpoint thread: no)
>>>      VampirTrace support: yes
>>>   MPI_MAX_PROCESSOR_NAME: 256
>>>     MPI_MAX_ERROR_STRING: 256
>>>      MPI_MAX_OBJECT_NAME: 64
>>>         MPI_MAX_INFO_KEY: 36
>>>         MPI_MAX_INFO_VAL: 256
>>>        MPI_MAX_PORT_NAME: 1024
>>>   MPI_MAX_DATAREP_STRING: 128
>>>            MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.6.5)
>>>               MCA memory: linux (MCA v2.0, API v2.0, Component v1.6.5)
>>>            MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA carto: file (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA shmem: mmap (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA shmem: posix (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA shmem: sysv (MCA v2.0, API v2.0, Component v1.6.5)
>>>            MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.5)
>>>            MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA timer: linux (MCA v2.0, API v2.0, Component v1.6.5)
>>>          MCA installdirs: env (MCA v2.0, API v2.0, Component v1.6.5)
>>>          MCA installdirs: config (MCA v2.0, API v2.0, Component v1.6.5)
>>>              MCA sysinfo: linux (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA hwloc: hwloc132 (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA dpm: orte (MCA v2.0, API v2.0, Component v1.6.5)
>>>               MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.6.5)
>>>            MCA allocator: basic (MCA v2.0, API v2.0, Component v1.6.5)
>>>            MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.6.5)
>>>                 MCA coll: basic (MCA v2.0, API v2.0, Component v1.6.5)
>>>                 MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.6.5)
>>>                 MCA coll: inter (MCA v2.0, API v2.0, Component v1.6.5)
>>>                 MCA coll: self (MCA v2.0, API v2.0, Component v1.6.5)
>>>                 MCA coll: sm (MCA v2.0, API v2.0, Component v1.6.5)
>>>                 MCA coll: sync (MCA v2.0, API v2.0, Component v1.6.5)
>>>                 MCA coll: tuned (MCA v2.0, API v2.0, Component v1.6.5)
>>>                   MCA io: romio (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA mpool: fake (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA mpool: sm (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA pml: bfo (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA pml: csum (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA pml: v (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA bml: r2 (MCA v2.0, API v2.0, Component v1.6.5)
>>>               MCA rcache: vma (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA btl: self (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.5)
>>>                 MCA topo: unity (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA osc: rdma (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA iof: hnp (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA iof: orted (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA iof: tool (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA oob: tcp (MCA v2.0, API v2.0, Component v1.6.5)
>>>                 MCA odls: default (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA ras: cm (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA rml: oob (MCA v2.0, API v2.0, Component v1.6.5)
>>>               MCA routed: binomial (MCA v2.0, API v2.0, Component v1.6.5)
>>>               MCA routed: cm (MCA v2.0, API v2.0, Component v1.6.5)
>>>               MCA routed: direct (MCA v2.0, API v2.0, Component v1.6.5)
>>>               MCA routed: linear (MCA v2.0, API v2.0, Component v1.6.5)
>>>               MCA routed: radix (MCA v2.0, API v2.0, Component v1.6.5)
>>>               MCA routed: slave (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA plm: rsh (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>>>                MCA filem: rsh (MCA v2.0, API v2.0, Component v1.6.5)
>>>               MCA errmgr: default (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA ess: env (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA ess: hnp (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA ess: singleton (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA ess: slave (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.5)
>>>                  MCA ess: tool (MCA v2.0, API v2.0, Component v1.6.5)
>>>              MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.6.5)
>>>              MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.6.5)
>>>              MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.6.5)
>>>             MCA notifier: command (MCA v2.0, API v1.0, Component v1.6.5)
>>>             MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.6.5)
>>> 
>>> 
>>> Regards,
>>> Karos
>>> 
>>> 
>>> 
>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>>> [r...@open-mpi.org]
>>> Sent: 28 March 2015 22:04
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>> 
>>> Something is clearly wrong. Most likely, you are not pointing at the OMPI 
>>> install that you think you are - or you didn’t really configure it 
>>> properly. Check the path by running “which mpirun” and ensure you are 
>>> executing the one you expected. If so, then run “ompi_info” to see how it 
>>> was configured and send it to us.
>>> 
>>> 
>>>> On Mar 28, 2015, at 1:36 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> 
>>>> wrote:
>>>> 
>>>> Surprisingly, that is all I get! Nothing else comes after. This is the 
>>>> same for openmpi-1.6.5.
>>>> 
>>>> 
>>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>>>> [r...@open-mpi.org]
>>>> Sent: 28 March 2015 20:12
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>>> 
>>>> Did you configure --enable-debug? We aren't seeing any of the debug 
>>>> output, so I suspect not.
>>>> 
>>>> 
>>>>> On Mar 28, 2015, at 12:56 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> 
>>>>> wrote:
>>>>> 
>>>>> I have done it and here are the results:
>>>>> 
>>>>> ubuntu@fehg-node-0:~$ mpirun -host fehg-node-7 -mca oob_base_verbose 100 
>>>>> -mca state_base_verbose 10 hostname
>>>>> [fehg-node-0:30034] mca: base: components_open: Looking for oob components
>>>>> [fehg-node-0:30034] mca: base: components_open: opening oob components
>>>>> [fehg-node-0:30034] mca: base: components_open: found loaded component tcp
>>>>> [fehg-node-0:30034] mca: base: components_open: component tcp register 
>>>>> function successful
>>>>> [fehg-node-0:30034] mca: base: components_open: component tcp open 
>>>>> function successful
>>>>> [fehg-node-7:31138] mca: base: components_open: Looking for oob components
>>>>> [fehg-node-7:31138] mca: base: components_open: opening oob components
>>>>> [fehg-node-7:31138] mca: base: components_open: found loaded component tcp
>>>>> [fehg-node-7:31138] mca: base: components_open: component tcp register 
>>>>> function successful
>>>>> [fehg-node-7:31138] mca: base: components_open: component tcp open 
>>>>> function successful
>>>>> 
>>>>> ... and then it freezes.
>>>>> 
>>>>> Regards
>>>>> 
>>>>> From: users [users-boun...@open-mpi.org] on behalf of LOTFIFAR F. 
>>>>> [foad.lotfi...@durham.ac.uk]
>>>>> Sent: 28 March 2015 18:49
>>>>> To: Open MPI Users
>>>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>>>> 
>>>>> fehg_node_1 and fehg-node-7 are the same; it is just a typo. 
>>>>> 
>>>>> Correction: VM names are fehg-node-0 and fehg-node-7.
>>>>> 
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>>>>> [r...@open-mpi.org]
>>>>> Sent: 28 March 2015 18:23
>>>>> To: Open MPI Users
>>>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>>>> 
>>>>> Just to be clear: do you have two physical nodes? Or just one physical 
>>>>> node and you are running two VMs on it?
>>>>> 
>>>>>> On Mar 28, 2015, at 10:51 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> 
>>>>>> wrote:
>>>>>> 
>>>>>> I have a floating IP for accessing nodes from outside the cluster, plus 
>>>>>> internal IP addresses. I tried to run the jobs with both of them (both 
>>>>>> IP addresses) but it makes no difference. 
>>>>>> I have just installed openmpi 1.6.5 to see how this version works. 
>>>>>> In this case I get nothing and I have to press Ctrl+C; no output or 
>>>>>> error is shown.
>>>>>> 
>>>>>> 
>>>>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>>>>>> [r...@open-mpi.org]
>>>>>> Sent: 28 March 2015 17:03
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>>>>> 
>>>>>> You mentioned running this in a VM - is that IP address correct for 
>>>>>> getting across the VMs?
>>>>>> 
>>>>>> 
>>>>>>> On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi, 
>>>>>>> 
>>>>>>> I am wondering how I can solve this problem. 
>>>>>>> System Spec:
>>>>>>> 1- A Linux cluster with two nodes (master and slave) running Ubuntu 
>>>>>>> 12.04 LTS 32-bit.
>>>>>>> 2- openmpi 1.8.4
>>>>>>> 
>>>>>>> I run a simple test on fehg_node_0:
>>>>>>> > mpirun -host fehg_node_0,fehg_node_1 hello_world -mca 
>>>>>>> > oob_base_verbose 20
>>>>>>> 
>>>>>>> and I get the following error:
>>>>>>> 
>>>>>>> A process or daemon was unable to complete a TCP connection
>>>>>>> to another process:
>>>>>>>   Local host:    fehg-node-0
>>>>>>>   Remote host:   10.104.5.40
>>>>>>> This is usually caused by a firewall on the remote host. Please
>>>>>>> check that any firewall (e.g., iptables) has been disabled and
>>>>>>> try again.
>>>>>>> ------------------------------------------------------------
>>>>>>> --------------------------------------------------------------------------
>>>>>>> ORTE was unable to reliably start one or more daemons.
>>>>>>> This usually is caused by:
>>>>>>> 
>>>>>>> * not finding the required libraries and/or binaries on
>>>>>>>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>>>>>>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>>>>>>> 
>>>>>>> * lack of authority to execute on one or more specified nodes.
>>>>>>>   Please verify your allocation and authorities.
>>>>>>> 
>>>>>>> * the inability to write startup files into /tmp 
>>>>>>> (--tmpdir/orte_tmpdir_base).
>>>>>>>   Please check with your sys admin to determine the correct location to 
>>>>>>> use.
>>>>>>> 
>>>>>>> *  compilation of the orted with dynamic libraries when static are 
>>>>>>> required
>>>>>>>   (e.g., on Cray). Please check your configure cmd line and consider 
>>>>>>> using
>>>>>>>   one of the contrib/platform definitions for your system type.
>>>>>>> 
>>>>>>> * an inability to create a connection back to mpirun due to a
>>>>>>>   lack of common network interfaces and/or no route found between
>>>>>>>   them. Please check network connectivity (including firewalls
>>>>>>>   and network routing requirements).
>>>>>>> 
>>>>>>> Notes:
>>>>>>> 1- I have full access to the VMs on the cluster and set up everything 
>>>>>>> myself.
>>>>>>> 2- The firewall and iptables are all disabled on the nodes.
>>>>>>> 3- The nodes can ssh to each other with no problem.
>>>>>>> 4- Non-interactive bash calls work fine, i.e. when I run ssh othernode 
>>>>>>> env | grep PATH from both nodes, both PATH and LD_LIBRARY_PATH are set 
>>>>>>> correctly.
>>>>>>> 5- I have checked the posts; a similar problem was reported for 
>>>>>>> Solaris, but I could not find a clue about mine. 
>>>>>>> 6- Configuring with --enable-orterun-prefix-by-default does not make 
>>>>>>> any difference.
>>>>>>> 7- I see orte is running on the other node when I check processes, but 
>>>>>>> nothing happens after that and then the error appears.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Karos


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
