Surprisingly, that is all I get! Nothing else comes after it. The same happens
with openmpi-1.6.5.

From: users on behalf of Ralph Castain
Sent: 28 March 2015 20:12
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Did you configure --enable-debug? We aren’t seeing any of the debug output, so I
suspect not.
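For reference, here is a minimal sketch of such a debug rebuild. It assumes the commands run from the top of the unpacked openmpi-1.8.4 source tree and that the install prefix (the hypothetical /opt/openmpi-1.8.4-debug) is the same on both VMs. The helper name is illustrative. Its body is just the standard configure, make and make install sequence with --enable-debug added.

```shell
# Hypothetical helper: rebuild Open MPI with --enable-debug so that the
# debug-only output is compiled in.
# Assumes the current directory is the top of the Open MPI source tree.
rebuild_ompi_debug() {
    # Install prefix; /opt/openmpi-1.8.4-debug is only an example path
    # (installing there may require root).
    prefix=${1:-/opt/openmpi-1.8.4-debug}
    ./configure --prefix="$prefix" --enable-debug &&
        make -j4 all &&
        make install
}
```

For example, run "rebuild_ompi_debug /opt/openmpi-1.8.4-debug" on each node. Then put that prefix's bin and lib directories first in PATH and LD_LIBRARY_PATH before repeating the verbose mpirun command.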

On Mar 28, 2015, at 12:56 PM, LOTFIFAR F. wrote:

I have done it, and here are the results:

ubuntu@fehg-node-0:~$ mpirun -host fehg-node-7 -mca oob_base_verbose 100 -mca state_base_verbose 10 hostname
[fehg-node-0:30034] mca: base: components_open: Looking for oob components
[fehg-node-0:30034] mca: base: components_open: opening oob components
[fehg-node-0:30034] mca: base: components_open: found loaded component tcp
[fehg-node-0:30034] mca: base: components_open: component tcp register function 
[fehg-node-0:30034] mca: base: components_open: component tcp open function 
[fehg-node-7:31138] mca: base: components_open: Looking for oob components
[fehg-node-7:31138] mca: base: components_open: opening oob components
[fehg-node-7:31138] mca: base: components_open: found loaded component tcp
[fehg-node-7:31138] mca: base: components_open: component tcp register function 
[fehg-node-7:31138] mca: base: components_open: component tcp open function 

... and then it freezes.


From: users on behalf of LOTFIFAR F.
Sent: 28 March 2015 18:49
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

fehg_node_1 and fehg-node-7 are the same node; it was just a typo.

Correction: VM names are fehg-node-0 and fehg-node-7.


From: users on behalf of Ralph Castain
Sent: 28 March 2015 18:23
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

Just to be clear: do you have two physical nodes? Or just one physical node and 
you are running two VMs on it?

On Mar 28, 2015, at 10:51 AM, LOTFIFAR F. wrote:

I have a floating IP for accessing the nodes from outside the cluster, as well
as internal IP addresses. I tried running the jobs with both (the floating and
the internal addresses), but it makes no difference.
I have also just installed openmpi 1.6.5 to see how that version behaves. With
it I get nothing at all and have to press Ctrl+C; no output or error is shown.
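One way to rule out address selection is to pin Open MPI to the interface that carries the internal addresses. oob_tcp_if_include (out-of-band/daemon traffic) and btl_tcp_if_include (MPI TCP traffic) are Open MPI MCA parameters. The interface name eth0 and the helper name are assumptions for illustration. The sketch also assumes the interface has the same name on both VMs.

```shell
# Hypothetical helper: run mpirun with both the out-of-band (ORTE) channel
# and the TCP BTL restricted to one interface. The interface name is an
# assumption; use the one holding the internal (non-floating) address on
# both VMs (check with "ip addr").
run_on_iface() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_on_iface IFACE MPIRUN_ARGS..." >&2
        return 1
    fi
    iface=$1
    shift
    mpirun -mca oob_tcp_if_include "$iface" \
           -mca btl_tcp_if_include "$iface" \
           "$@"
}
```

Usage, with the host names from this thread: "run_on_iface eth0 -host fehg-node-0,fehg-node-7 hostname".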

From: users on behalf of Ralph Castain
Sent: 28 March 2015 17:03
To: Open MPI Users
Subject: Re: [OMPI users] Connection problem on Linux cluster

You mentioned running this in a VM - is that IP address correct for getting 
across the VMs?

On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. wrote:

Hi,

I am wondering how I can solve this problem.
System specs:
1- Linux cluster with two nodes (master and slave) running Ubuntu 12.04 LTS (32-bit).
2- openmpi 1.8.4

I ran a simple test on fehg_node_0:
> mpirun -host fehg_node_0,fehg_node_1 hello_world -mca oob_base_verbose 20

and I get the following error:

A process or daemon was unable to complete a TCP connection
to another process:
  Local host:    fehg-node-0
  Remote host:
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
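Note that mpirun stops reading its own options at the executable name. Everything after hello_world in the command above is passed to hello_world as program arguments, so "-mca oob_base_verbose 20" never reaches mpirun. Below is a sketch with the options moved in front. It uses the corrected host names fehg-node-0 and fehg-node-7 given in this thread. The helper name is hypothetical, and ./hello_world is assumed to exist at the same path on both nodes.

```shell
# Hypothetical helper: mpirun options (-mca, -host) must come BEFORE the
# executable; anything after it is handed to the program unchanged.
run_hello_verbose() {
    # Host list defaults to the two VMs named in the thread.
    hosts=${1:-fehg-node-0,fehg-node-7}
    mpirun -mca oob_base_verbose 20 \
           -host "$hosts" \
           ./hello_world
}
```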

1- I have full access to the VMs on the cluster and set everything up myself.
2- The firewall and iptables are disabled on all nodes.
3- The nodes can ssh to each other with no problem.
4- Non-interactive bash calls work fine, i.e. when I run "ssh othernode env |
grep PATH" from both nodes, both PATH and LD_LIBRARY_PATH are set correctly.
5- I have checked earlier posts; a similar problem was reported for Solaris, but
I could not find a clue about mine.
6- Running with --enable-orterun-prefix-by-default does not change anything.
7- When I check the processes, I can see the ORTE daemon running on the other
node, but nothing happens after that, and then the error appears.
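The checks in items 4 and 6 can be run together from fehg-node-0. --enable-orterun-prefix-by-default is a configure option. Its runtime counterpart is mpirun's --prefix option, which sets PATH and LD_LIBRARY_PATH on the remote node before the daemon starts. The following is a sketch only. The helper names and the /opt/openmpi-1.8.4 prefix are hypothetical, and a loopback address in the name-resolution output is only a hint worth following up, not a confirmed cause.

```shell
# Hypothetical helper: show what a NON-interactive ssh session on HOST
# sees, which is the environment the ORTE daemon (orted) is started in.
check_remote_env() {
    if [ "$#" -lt 1 ]; then
        echo "usage: check_remote_env HOST [HEAD_NODE]" >&2
        return 1
    fi
    host=$1
    head=${2:-fehg-node-0}   # node running mpirun (name taken from the thread)
    # Single quotes: $PATH etc. are expanded on the remote side, not locally.
    ssh "$host" 'which orted mpirun; echo "PATH=$PATH"; echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'
    # Name resolution as seen from the remote node; a loopback address
    # (127.x.x.x) for either name is worth a closer look.
    ssh "$host" "getent hosts $head $host"
}

# Hypothetical helper: runtime counterpart of configuring with
# --enable-orterun-prefix-by-default. The prefix path is an assumption.
run_with_prefix() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_with_prefix PREFIX MPIRUN_ARGS..." >&2
        return 1
    fi
    prefix=$1
    shift
    mpirun --prefix "$prefix" "$@"
}
```

For example, run "check_remote_env fehg-node-7" first, then "run_with_prefix /opt/openmpi-1.8.4 -host fehg-node-0,fehg-node-7 hostname".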

users mailing list
Link to this post:
