Something is clearly wrong. Most likely, you are not pointing at the OMPI install that you think you are, or you didn't really configure it properly. Check the path by running "which mpirun" and ensure you are executing the one you expected. If so, then run "ompi_info" to see how it was configured, and send the output to us.
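A minimal way to run the two checks suggested above; this is a sketch, and the grep pattern assumes ompi_info's usual "Configure command line" output field:

```shell
# Verify which mpirun the shell actually resolves on PATH.
# "command -v" is used so this also works where mpirun is absent.
MPIRUN_PATH=$(command -v mpirun || true)
echo "mpirun resolves to: ${MPIRUN_PATH:-<not found>}"

# If an install is present, show how it was configured
# (ompi_info reports the original configure command line).
if [ -n "$MPIRUN_PATH" ]; then
    ompi_info | grep -i "configure command line"
fi
```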
> On Mar 28, 2015, at 1:36 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
>
> Surprisingly, that is all I get! Nothing else comes after. This is the same for openmpi-1.6.5.
>
> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
> Sent: 28 March 2015 20:12
> To: Open MPI Users
> Subject: Re: [OMPI users] Connection problem on Linux cluster
>
> Did you configure --enable-debug? We aren't seeing any of the debug output, so I suspect not.
>
>> On Mar 28, 2015, at 12:56 PM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
>>
>> I have done it, and these are the results:
>>
>> ubuntu@fehg-node-0:~$ mpirun -host fehg-node-7 -mca oob_base_verbose 100 -mca state_base_verbose 10 hostname
>> [fehg-node-0:30034] mca: base: components_open: Looking for oob components
>> [fehg-node-0:30034] mca: base: components_open: opening oob components
>> [fehg-node-0:30034] mca: base: components_open: found loaded component tcp
>> [fehg-node-0:30034] mca: base: components_open: component tcp register function successful
>> [fehg-node-0:30034] mca: base: components_open: component tcp open function successful
>> [fehg-node-7:31138] mca: base: components_open: Looking for oob components
>> [fehg-node-7:31138] mca: base: components_open: opening oob components
>> [fehg-node-7:31138] mca: base: components_open: found loaded component tcp
>> [fehg-node-7:31138] mca: base: components_open: component tcp register function successful
>> [fehg-node-7:31138] mca: base: components_open: component tcp open function successful
>>
>> ... then it freezes.
>>
>> Regards
>>
>> From: users [users-boun...@open-mpi.org] on behalf of LOTFIFAR F. [foad.lotfi...@durham.ac.uk]
>> Sent: 28 March 2015 18:49
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>
>> fehg_node_1 and fehg-node-7 are the same;
>> it is just a typo.
>>
>> Correction: the VM names are fehg-node-0 and fehg-node-7.
>>
>> Regards,
>>
>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
>> Sent: 28 March 2015 18:23
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>
>> Just to be clear: do you have two physical nodes, or just one physical node with two VMs running on it?
>>
>>> On Mar 28, 2015, at 10:51 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
>>>
>>> I have a floating IP for accessing nodes from outside of the cluster, plus internal IP addresses. I tried to run the jobs with both of them (both IP addresses), but it makes no difference.
>>> I have just installed openmpi-1.6.5 to see how that version works. In that case I get nothing and I have to press Ctrl+C; no output or error is shown.
>>>
>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
>>> Sent: 28 March 2015 17:03
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] Connection problem on Linux cluster
>>>
>>> You mentioned running this in a VM - is that IP address correct for getting across the VMs?
>>>
>>>> On Mar 28, 2015, at 8:38 AM, LOTFIFAR F. <foad.lotfi...@durham.ac.uk> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am wondering how I can solve this problem.
>>>>
>>>> System spec:
>>>> 1- Linux cluster with two nodes (master and slave) with Ubuntu 12.04 LTS 32-bit.
>>>> 2- openmpi 1.8.4
>>>>
>>>> I do a simple test running on fehg_node_0:
>>>>
>>>> > mpirun -host fehg_node_0,fehg_node_1 hello_world -mca oob_base_verbose 20
>>>>
>>>> and I get the following error:
>>>>
>>>> A process or daemon was unable to complete a TCP connection
>>>> to another process:
>>>>   Local host:  fehg-node-0
>>>>   Remote host: 10.104.5.40
>>>> This is usually caused by a firewall on the remote host. Please
>>>> check that any firewall (e.g., iptables) has been disabled and
>>>> try again.
>>>> --------------------------------------------------------------------------
>>>> ORTE was unable to reliably start one or more daemons.
>>>> This usually is caused by:
>>>>
>>>> * not finding the required libraries and/or binaries on
>>>>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>>>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>>>>
>>>> * lack of authority to execute on one or more specified nodes.
>>>>   Please verify your allocation and authorities.
>>>>
>>>> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>>>>   Please check with your sys admin to determine the correct location to use.
>>>>
>>>> * compilation of the orted with dynamic libraries when static are required
>>>>   (e.g., on Cray). Please check your configure cmd line and consider using
>>>>   one of the contrib/platform definitions for your system type.
>>>>
>>>> * an inability to create a connection back to mpirun due to a
>>>>   lack of common network interfaces and/or no route found between
>>>>   them. Please check network connectivity (including firewalls
>>>>   and network routing requirements).
>>>>
>>>> Verbose:
>>>> 1- I have full access to the VMs on the cluster and set up everything myself.
>>>> 2- Firewall and iptables are all disabled on the nodes.
>>>> 3- Nodes can ssh to each other with no problem.
>>>> 4- Non-interactive bash calls work fine, i.e.
>>>>    when I run "ssh othernode env | grep PATH" from both nodes, both PATH and LD_LIBRARY_PATH are set correctly.
>>>> 5- I have checked the posts; a similar problem was reported for Solaris, but I could not find a clue about mine.
>>>> 6- Running with --enable-orterun-prefix-by-default does not make any difference.
>>>> 7- I see the orte daemon running on the other node when I check processes, but nothing happens after that and the error appears.
>>>>
>>>> Regards,
>>>> Karos
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26555.php
>>> Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26557.php
>> Link to this post: http://www.open-mpi.org/community/lists/users/2015/03/26562.php
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/03/26564.php
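The TCP-connection failure reported earlier in the thread is typically narrowed down by checking the firewall state and the interfaces visible on each VM. A sketch of those checks; the node names come from this thread, and "eth0" is only an assumed interface name:

```shell
# 1. Check for active iptables rules (needs root; ignore errors otherwise).
iptables -L -n 2>/dev/null || echo "iptables listing unavailable (not root?)"

# 2. List the IPv4 interfaces - the two VMs must share a routable subnet.
ip -4 addr show 2>/dev/null || true

# 3. If the VMs have several interfaces (e.g. a floating IP plus an
#    internal one), restrict Open MPI to the shared one. oob_tcp_if_include
#    and btl_tcp_if_include are standard MCA parameters; "eth0" is a guess
#    for this cluster:
# mpirun -host fehg-node-0,fehg-node-7 \
#        -mca oob_tcp_if_include eth0 -mca btl_tcp_if_include eth0 hostname
```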