Thank you very much, guys. The problem has just been resolved. It was in the 
security group rules applied when one creates the VMs. OpenStack pushes the 
security groups into iptables rules, so it is not necessary to do anything with 
iptables or firewalls inside the VMs. The processes were freezing and I could not 
get any further debug information; when I left them running for a few hours, I 
finally got the TCP connection error. Could I fine-tune Open MPI to reduce this 
timeout?
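For example, is there an MCA parameter that controls the connection timeout or 
retry behaviour? I guess something like the following would at least list the 
candidate parameters (the exact names seem to vary between releases):

   ompi_info --param oob tcp
   ompi_info --param btl tcp

(on the 1.8 series, adding "--level 9" should show the full list)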

Thank you very much for your help and information.

Karos

________________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: 29 March 2015 17:14
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

The port range param differs between the two releases you cited. For the 1.8 
release and the OMPI master, the correct MCA param is:

oob_tcp_dynamic_ipv4_ports <range>

Or you can specify the actual, specific ports you want us to use:

oob_tcp_static_ipv4_ports <comma-separated list of ports>
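For example (just a sketch - substitute ports that your security group actually 
allows):

   mpirun -mca oob_tcp_dynamic_ipv4_ports 5000-5100 -host fehg-node-7 hostname
   mpirun -mca oob_tcp_static_ipv4_ports 5001,5002,5003 -host fehg-node-7 hostname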

Note that this only controls the “listening” port - the side initiating the 
connection gets its port from the OS. If we could see what it is doing, then 
perhaps that would tell us more about the source of the trouble (i.e., maybe 
the OS doesn’t realize it cannot assign ports outside the domain specified in 
your security group?). You might have to adjust the OS's ephemeral port range so 
that it only assigns ports allowed by your security group, if you haven't already 
done so.
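On Linux, for example, the ephemeral range can usually be inspected and adjusted 
via sysctl (a sketch - pick a range that matches what your security group allows):

   cat /proc/sys/net/ipv4/ip_local_port_range
   sudo sysctl -w net.ipv4.ip_local_port_range="5000 6000"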

Frankly, I’m more disturbed by the inability to get any debug information out 
of the code. When enabled and with verbosity set, we should get printouts 
telling us what ports are being tried and why they are failing. The fact that 
we don’t get *anything* tells me that something we don’t understand is 
interfering with our debug efforts.

The debug output would be going to stderr - is there anything that would divert 
that, or block it?
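For example, when you re-run the verbose test you could capture both streams 
explicitly, to rule out anything swallowing stderr:

   mpirun -host fehg-node-7 -mca oob_base_verbose 100 hostname 1> run.out 2> run.err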


> On Mar 29, 2015, at 5:31 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
>
> Yes, I have tried installing in my home directory, which made no difference. You 
> are right, Ralph; last night I noticed the same problem. When I launch VMs in the 
> OpenStack web interface, I have to assign each VM to a security group. If I do 
> not, OpenStack automatically assigns it to a default security group, and in that 
> case all traffic in and out of the VMs is blocked. To sort this out, I have 
> defined my own security group with the following rules:
>
> Ingress       -       TCP     2222            0.0.0.0/0 (CIDR)
> Ingress       -       TCP     22 (SSH)        0.0.0.0/0 (CIDR)
> Ingress       -       TCP     3389 (RDP)      0.0.0.0/0 (CIDR)
> Ingress       -       ICMP    -1 (ALL ICMP)   0.0.0.0/0 (CIDR)
> Ingress       -       TCP     23              0.0.0.0/0 (CIDR)
> Ingress       -       TCP     8080            0.0.0.0/0 (CIDR)
> Ingress       -       TCP     5900            0.0.0.0/0 (CIDR)
> Ingress       -       TCP     111             0.0.0.0/0 (CIDR)
> Ingress       -       UDP     111             0.0.0.0/0 (CIDR)
> Ingress       -       TCP     2049            0.0.0.0/0 (CIDR)
> Ingress       -       UDP     2049            0.0.0.0/0 (CIDR)
> Ingress       -       TCP     1 - 6000        0.0.0.0/0 (CIDR)
> Ingress       -       UDP     1 - 6000        0.0.0.0/0 (CIDR)
>
> The last items open all TCP and UDP ports between 1 and 6000. I have both 
> iptables and the firewall disabled on the VMs. Running nmap from fehg-node-7 
> gives the following output for a port above 6000 and a port below 6000:
>
>> nmap -A fehg-node-0 -p "any port higher than 6000 i.e. 7000"
>
> Nmap scan report for fehg-node-0
> Host is up (0.00062s latency).
> PORT      STATE    SERVICE          VERSION
> 7000/tcp filtered snet-sensor-mgmt
>
> And:
>> nmap -A fehg-node-0 -p "any port less than 6000 i.e. 5000"
> Nmap scan report for fehg-node-0
> Host is up (0.00077s latency).
> PORT     STATE  SERVICE VERSION
> 5000/tcp closed upnp
>
> The first result is what I expect, but I do not understand why the second port 
> is not open. I tried adding iptables rules to allow all traffic to and from the 
> VMs in the cluster, but nmap still reports the port as closed, and I do not know 
> why.
>
> Of course, opening all ports from 1 to 6000 is not a good idea in general; I 
> just wanted to narrow down the issue.
>
> PS: Another question is about the ports that Open MPI uses for MPI 
> communications. I tried to limit the port range to 1-6000 using the following 
> MCA parameters:
>
> oob_tcp_port_min_v4 = (My minimum port in the range)
> oob_tcp_port_range_v4 = (My port range)
> btl_tcp_port_min_v4 = (My minimum port in the range)
> btl_tcp_port_range_v4 = (My port range)
>
> I noticed that Open MPI does not seem to honor these parameters, i.e., there is 
> no difference when using them; it still uses TCP ports outside the range.
>
> Regards,
> Karos
>
> ________________________________________
> From: users [users-boun...@open-mpi.org] on behalf of Jeff Squyres (jsquyres) 
> [jsquy...@cisco.com]
> Sent: 29 March 2015 11:56
> To: Open MPI User's List
> Subject: Re: [OMPI users] Connection problem on Linux cluster
>
> My $0.02:
>
> - building under your $HOME is recommended in cases like this, but it's not 
> going to change the functionality of how OMPI works.  I.e., rebuilding under 
> your $HOME will likely not change the result.
>
> - you have 3 MPI implementations telling you that TCP connections between 
> your VMs don't work (OMPI 1.8.x, OMPI 1.6.x, MPICH).  It's therefore likely 
> that there's some kind of TCP blocking going on between your VMs.
>
> - +1 on what Ralph says: you likely need to talk to your sysadmin to get this 
> problem resolved.  iptables and other firewalls may be disabled on your VMs, 
> but OpenStack (or other sysadmin/network admin-controlled entities) *between* 
> your two VMs may be enforcing firewalling policies. This is quite common in 
> cloud environments.
>
> - You can use any TCP ping-pong test to verify TCP connectivity between VMs 
> -- i.e., programs that use random TCP ports to communicate; not the "usual" 
> suspects of ports that OpenStack may leave open by default (22, 80, 443, 
> ...etc.).
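>
> For example (assuming netcat is available on both VMs; some netcat versions want 
> "-l -p <port>" instead of "-l <port>"), something as simple as:
>
>    on fehg-node-0:   nc -l 5678
>    on fehg-node-7:   echo hello | nc fehg-node-0 5678
>
> If "hello" shows up on the listening side, that TCP port gets through.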
>
>
>
>> On Mar 28, 2015, at 7:22 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
>>
>> I'll recompile it in my home directory to see how it works.
>>
>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>> [r...@open-mpi.org]
>> Sent: 28 March 2015 23:13
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>
>> Doug is correct, and we usually suggest you build it under your own home 
>> directory to make it easier to clean up at a later time.
>>
>> Only thing I can suggest is talking to the sys admin some more about TCP 
>> connections between VMs on OpenStack and getting their help. Something is 
>> obviously blocking communications, but it is likely something only they can 
>> identify. Clouds tend to be finicky in that regard.
>>
>> You could also try the standard network diagnostics to see if TCP is capable 
>> of getting thru.
>>
>>
>>> On Mar 28, 2015, at 4:00 PM, Douglas L Reeder <d...@centurylink.net> wrote:
>>>
>>> Building as root is a bad idea. Try building it as a regular user, using 
>>> sudo make install if necessary.
>>>
>>> Doug Reeder
>>> On Mar 28, 2015, at 4:53 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
>>>
>>>> When you said --enable-debug was not activated, I installed it again to make 
>>>> sure. I have only one MPI installation on all the VMs.
>>>>
>>>> FYI: I have just tried MPICH to see how it works. It freezes for a few 
>>>> minutes and then comes back with an error complaining about the firewall. By 
>>>> the way, I already have the firewall disabled and iptables set to allow all 
>>>> connections. I checked with the system admin and there is no other firewall 
>>>> between the nodes.
>>>>
>>>> Here is the output you asked for:
>>>>
>>>> ubuntu@fehg-node-0:~$ which mpirun
>>>> /usr/local/openmpi/bin/mpirun
>>>> ubuntu@fehg-node-0:~$ ompi_info
>>>>                 Package: Open MPI ubuntu@fehg-node-0 Distribution
>>>>                Open MPI: 1.6.5
>>>>   Open MPI SVN revision: r28673
>>>>   Open MPI release date: Jun 26, 2013
>>>>                Open RTE: 1.6.5
>>>>   Open RTE SVN revision: r28673
>>>>   Open RTE release date: Jun 26, 2013
>>>>                    OPAL: 1.6.5
>>>>       OPAL SVN revision: r28673
>>>>       OPAL release date: Jun 26, 2013
>>>>                 MPI API: 2.1
>>>>            Ident string: 1.6.5
>>>>                  Prefix: /usr/local/openmpi
>>>> Configured architecture: i686-pc-linux-gnu
>>>>          Configure host: fehg-node-0
>>>>           Configured by: ubuntu
>>>>           Configured on: Sat Mar 28 20:19:28 UTC 2015
>>>>          Configure host: fehg-node-0
>>>>                Built by: root
>>>>                Built on: Sat Mar 28 20:30:18 UTC 2015
>>>>              Built host: fehg-node-0
>>>>              C bindings: yes
>>>>            C++ bindings: yes
>>>>      Fortran77 bindings: no
>>>>      Fortran90 bindings: no
>>>> Fortran90 bindings size: na
>>>>              C compiler: gcc
>>>>     C compiler absolute: /usr/bin/gcc
>>>>  C compiler family name: GNU
>>>>      C compiler version: 4.6.3
>>>>            C++ compiler: g++
>>>>   C++ compiler absolute: /usr/bin/g++
>>>>      Fortran77 compiler: none
>>>>  Fortran77 compiler abs: none
>>>>      Fortran90 compiler: none
>>>>  Fortran90 compiler abs: none
>>>>             C profiling: yes
>>>>           C++ profiling: yes
>>>>     Fortran77 profiling: no
>>>>     Fortran90 profiling: no
>>>>          C++ exceptions: no
>>>>          Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
>>>>           Sparse Groups: no
>>>>  Internal debug support: yes
>>>>  MPI interface warnings: no
>>>>     MPI parameter check: runtime
>>>> Memory profiling support: no
>>>> Memory debugging support: no
>>>>         libltdl support: yes
>>>>   Heterogeneous support: no
>>>> mpirun default --prefix: no
>>>>         MPI I/O support: yes
>>>>       MPI_WTIME support: gettimeofday
>>>>     Symbol vis. support: yes
>>>>   Host topology support: yes
>>>>          MPI extensions: affinity example
>>>>   FT Checkpoint support: no (checkpoint thread: no)
>>>>     VampirTrace support: yes
>>>>  MPI_MAX_PROCESSOR_NAME: 256
>>>>    MPI_MAX_ERROR_STRING: 256
>>>>     MPI_MAX_OBJECT_NAME: 64
>>>>        MPI_MAX_INFO_KEY: 36
>>>>        MPI_MAX_INFO_VAL: 256
>>>>       MPI_MAX_PORT_NAME: 1024
>>>>  MPI_MAX_DATAREP_STRING: 128
>>>>           MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.6.5)
>>>>              MCA memory: linux (MCA v2.0, API v2.0, Component v1.6.5)
>>>>           MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA carto: file (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA shmem: mmap (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA shmem: posix (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA shmem: sysv (MCA v2.0, API v2.0, Component v1.6.5)
>>>>           MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.6.5)
>>>>           MCA maffinity: hwloc (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA timer: linux (MCA v2.0, API v2.0, Component v1.6.5)
>>>>         MCA installdirs: env (MCA v2.0, API v2.0, Component v1.6.5)
>>>>         MCA installdirs: config (MCA v2.0, API v2.0, Component v1.6.5)
>>>>             MCA sysinfo: linux (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA hwloc: hwloc132 (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.6.5)
>>>>              MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.6.5)
>>>>           MCA allocator: basic (MCA v2.0, API v2.0, Component v1.6.5)
>>>>           MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                MCA coll: basic (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                MCA coll: inter (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                MCA coll: self (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                MCA coll: sm (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                MCA coll: sync (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                MCA coll: tuned (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                  MCA io: romio (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA mpool: fake (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA mpool: sm (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA pml: bfo (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA pml: csum (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA pml: v (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.6.5)
>>>>              MCA rcache: vma (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA btl: self (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA btl: sm (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                MCA topo: unity (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA osc: rdma (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA iof: hnp (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA iof: orted (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA iof: tool (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA oob: tcp (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                MCA odls: default (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA ras: cm (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA rml: oob (MCA v2.0, API v2.0, Component v1.6.5)
>>>>              MCA routed: binomial (MCA v2.0, API v2.0, Component v1.6.5)
>>>>              MCA routed: cm (MCA v2.0, API v2.0, Component v1.6.5)
>>>>              MCA routed: direct (MCA v2.0, API v2.0, Component v1.6.5)
>>>>              MCA routed: linear (MCA v2.0, API v2.0, Component v1.6.5)
>>>>              MCA routed: radix (MCA v2.0, API v2.0, Component v1.6.5)
>>>>              MCA routed: slave (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA plm: rsh (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>>>>               MCA filem: rsh (MCA v2.0, API v2.0, Component v1.6.5)
>>>>              MCA errmgr: default (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA ess: env (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA ess: hnp (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA ess: singleton (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA ess: slave (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.5)
>>>>                 MCA ess: tool (MCA v2.0, API v2.0, Component v1.6.5)
>>>>             MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.6.5)
>>>>             MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.6.5)
>>>>             MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.6.5)
>>>>            MCA notifier: command (MCA v2.0, API v1.0, Component v1.6.5)
>>>>            MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.6.5)
>>>>
>>>>
>>>> Regards,
>>>> Karos
>>>>
>>>>
>>>>
>>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>>>> [r...@open-mpi.org]
>>>> Sent: 28 March 2015 22:04
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>>>
>>>> Something is clearly wrong. Most likely, you are not pointing at the OMPI 
>>>> install that you think you are - or you didn’t really configure it 
>>>> properly. Check the path by running “which mpirun” and ensure you are 
>>>> executing the one you expected. If so, then run “ompi_info” to see how it 
>>>> was configured and send the output to us.
>>>>
>>>>
>>>>> On Mar 28, 2015, at 1:36 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> 
>>>>> wrote:
>>>>>
>>>>> Surprisingly, that is all I get! Nothing else comes after. It is the same 
>>>>> for openmpi-1.6.5.
>>>>>
>>>>>
>>>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>>>>> [r...@open-mpi.org]
>>>>> Sent: 28 March 2015 20:12
>>>>> To: Open MPI Users
>>>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>>>>
>>>>> Did you configure --enable-debug? We aren’t seeing any of the debug 
>>>>> output, so I suspect not.
>>>>>
>>>>>
>>>>>> On Mar 28, 2015, at 12:56 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> 
>>>>>> wrote:
>>>>>>
>>>>>> I have done that, and here are the results:
>>>>>>
>>>>>> ubuntu@fehg-node-0:~$ mpirun -host fehg-node-7 -mca oob_base_verbose 100 
>>>>>> -mca state_base_verbose 10 hostname
>>>>>> [fehg-node-0:30034] mca: base: components_open: Looking for oob 
>>>>>> components
>>>>>> [fehg-node-0:30034] mca: base: components_open: opening oob components
>>>>>> [fehg-node-0:30034] mca: base: components_open: found loaded component 
>>>>>> tcp
>>>>>> [fehg-node-0:30034] mca: base: components_open: component tcp register 
>>>>>> function successful
>>>>>> [fehg-node-0:30034] mca: base: components_open: component tcp open 
>>>>>> function successful
>>>>>> [fehg-node-7:31138] mca: base: components_open: Looking for oob 
>>>>>> components
>>>>>> [fehg-node-7:31138] mca: base: components_open: opening oob components
>>>>>> [fehg-node-7:31138] mca: base: components_open: found loaded component 
>>>>>> tcp
>>>>>> [fehg-node-7:31138] mca: base: components_open: component tcp register 
>>>>>> function successful
>>>>>> [fehg-node-7:31138] mca: base: components_open: component tcp open 
>>>>>> function successful
>>>>>>
>>>>>> ... and then it freezes here.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> From: users [users-boun...@open-mpi.org] on behalf of LOTFIFAR F. 
>>>>>> [foad.lotfi...@durham.ac.uk]
>>>>>> Sent: 28 March 2015 18:49
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>>>>>
>>>>>> fehg_node_1 and fehg-node-7 are the same; it was just a typo.
>>>>>>
>>>>>> Correction: VM names are fehg-node-0 and fehg-node-7.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>>>>>> [r...@open-mpi.org]
>>>>>> Sent: 28 March 2015 18:23
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>>>>>
>>>>>> Just to be clear: do you have two physical nodes? Or just one physical 
>>>>>> node and you are running two VMs on it?
>>>>>>
>>>>>>> On Mar 28, 2015, at 10:51 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> 
>>>>>>> wrote:
>>>>>>>
>>>>>>> I have a floating IP for accessing the nodes from outside the cluster, 
>>>>>>> plus internal IP addresses. I tried running the jobs with both of them, 
>>>>>>> but it makes no difference.
>>>>>>> I have just installed Open MPI 1.6.5 to see how this version works. In 
>>>>>>> this case I get nothing at all and have to press Ctrl+C; no output or 
>>>>>>> error is shown.
>>>>>>>
>>>>>>>
>>>>>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
>>>>>>> [r...@open-mpi.org]
>>>>>>> Sent: 28 March 2015 17:03
>>>>>>> To: Open MPI Users
>>>>>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>>>>>>
>>>>>>> You mentioned running this in a VM - is that IP address correct for 
>>>>>>> getting across the VMs?
>>>>>>>
>>>>>>>
>>>>>>>> On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am wondering how I can solve this problem.
>>>>>>>> System Spec:
>>>>>>>> 1- Linux cluster with two nodes (master and slave) with Ubuntu 12.04 
>>>>>>>> LTS 32bit.
>>>>>>>> 2- openmpi 1.8.4
>>>>>>>>
>>>>>>>> I do a simple test running on fehg_node_0:
>>>>>>>>> mpirun -host fehg_node_0,fehg_node_1 hello_world -mca 
>>>>>>>>> oob_base_verbose 20
>>>>>>>>
>>>>>>>> and I get the following error:
>>>>>>>>
>>>>>>>> A process or daemon was unable to complete a TCP connection
>>>>>>>> to another process:
>>>>>>>>  Local host:    fehg-node-0
>>>>>>>>  Remote host:   10.104.5.40
>>>>>>>> This is usually caused by a firewall on the remote host. Please
>>>>>>>> check that any firewall (e.g., iptables) has been disabled and
>>>>>>>> try again.
>>>>>>>> ------------------------------------------------------------
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> ORTE was unable to reliably start one or more daemons.
>>>>>>>> This usually is caused by:
>>>>>>>>
>>>>>>>> * not finding the required libraries and/or binaries on
>>>>>>>>  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>>>>>>>  settings, or configure OMPI with --enable-orterun-prefix-by-default
>>>>>>>>
>>>>>>>> * lack of authority to execute on one or more specified nodes.
>>>>>>>>  Please verify your allocation and authorities.
>>>>>>>>
>>>>>>>> * the inability to write startup files into /tmp 
>>>>>>>> (--tmpdir/orte_tmpdir_base).
>>>>>>>>  Please check with your sys admin to determine the correct location to 
>>>>>>>> use.
>>>>>>>>
>>>>>>>> *  compilation of the orted with dynamic libraries when static are 
>>>>>>>> required
>>>>>>>>  (e.g., on Cray). Please check your configure cmd line and consider 
>>>>>>>> using
>>>>>>>>  one of the contrib/platform definitions for your system type.
>>>>>>>>
>>>>>>>> * an inability to create a connection back to mpirun due to a
>>>>>>>>  lack of common network interfaces and/or no route found between
>>>>>>>>  them. Please check network connectivity (including firewalls
>>>>>>>>  and network routing requirements).
>>>>>>>>
>>>>>>>> More details:
>>>>>>>> 1- I have full access to the VMs on the cluster and set everything up 
>>>>>>>> myself.
>>>>>>>> 2- The firewall and iptables are disabled on both nodes.
>>>>>>>> 3- The nodes can ssh to each other with no problem.
>>>>>>>> 4- Non-interactive bash calls work fine, i.e. when I run "ssh othernode 
>>>>>>>> env | grep PATH" from both nodes, both PATH and LD_LIBRARY_PATH are set 
>>>>>>>> correctly.
>>>>>>>> 5- I have checked the list archives; a similar problem was reported for 
>>>>>>>> Solaris, but I could not find a clue about mine.
>>>>>>>> 6- Building with --enable-orterun-prefix-by-default does not make any 
>>>>>>>> difference.
>>>>>>>> 7- I can see orted running on the other node when I check the processes, 
>>>>>>>> but nothing happens after that and then the error appears.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Karos
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>

