On 04/08/2014 06:37 AM, Jeff Squyres (jsquyres) wrote:
You should ping the Rocks maintainers and ask them to upgrade.
Open MPI 1.4.3 was released in September of 2010.


On Rocks, you can install Open MPI from source (and any other software
application, by the way) in the standard NFS-shared directory
named /share/apps.
Typically, you would make a subdirectory such as
/share/apps/openmpi/1.8/gnu

If you don't have permission to write on /share/apps,
then you could use your home directory, say /home/nisha/openmpi/1.8/gnu,
which is also NFS shared.

See this FAQ for general guidelines:
http://www.open-mpi.org/faq/?category=building#where-to-install
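
A quick way to check whether you can actually write there is to try creating the directory up front (the path is just the example prefix above):

mkdir -p /share/apps/openmpi/1.8/gnu

If that fails with "Permission denied", fall back to the home-directory prefix mentioned above.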

Download the OMPI tarball somewhere in your home directory
(say /home/nisha/Download/ompi-1.8),
and configure it there with the install prefix above.
Use different directories for the download/build and for the installation (--prefix below).
You can use the GNU compilers, or other compilers if you prefer:

http://www.open-mpi.org/faq/?category=building#build-compilers
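
For example, assuming you saved the tarball as openmpi-1.8.tar.gz in that download directory (adjust the file name to match what you actually downloaded):

cd /home/nisha/Download/ompi-1.8
tar xzf openmpi-1.8.tar.gz
cd openmpi-1.8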

Then do:

./configure --prefix=/share/apps/openmpi/1.8/gnu CC=gcc CXX=g++ F77=gfortran FC=gfortran
[Unless something changed with the Fortran interface on OMPI 1.8.]

make

make install

Then set your PATH and LD_LIBRARY_PATH as recommended in the OMPI FAQ:

http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
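
For instance, with the example prefix above you could append these two lines to your ~/.bashrc (adjust the paths if you installed somewhere else):

export PATH=/share/apps/openmpi/1.8/gnu/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/openmpi/1.8/gnu/lib:$LD_LIBRARY_PATH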

Compile your application with the OMPI version above (you may need
to adjust the Makefile, etc).
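
A quick sanity check that your build picks up the new installation rather than the old Rocks one:

which mpicc mpirun
mpicc --showme

Both should resolve to the /share/apps/openmpi/1.8/gnu/bin directory, and --showme prints the underlying command line the wrapper compiler would use.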

Reading the OMPI FAQ is a big time and headache saver:

http://www.open-mpi.org/faq/


On Apr 8, 2014, at 5:37 AM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:

The latest Rocks, 6.2, carries this version only.


On Tue, Apr 8, 2014 at 3:49 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
wrote:
Open MPI 1.4.3 is *ancient*.  Please upgrade -- we just released Open MPI 1.8 
last week.

Also, please look at this FAQ entry -- it steps you through a lot of basic 
troubleshooting steps about getting basic MPI programs working.

http://www.open-mpi.org/faq/?category=running#diagnose-multi-host-problems

Once you get basic MPI programs working, then try with MPI Blast.
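
For example, something as small as this (the host names below are just placeholders for two of your compute nodes) already exercises the same TCP connection path that mpiBlast uses:

mpirun -np 2 -host compute-0-0,compute-0-1 hostname

If that hangs or fails with error 113, the problem is in the network/VM setup rather than in mpiBlast.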



On Apr 5, 2014, at 3:11 AM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:

mpirun --mca btl ^openib --mca btl_tcp_if_include eth0 -np 16 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o out.txt

was the command I executed on the cluster.



On Sat, Apr 5, 2014 at 12:34 PM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:
Sorry Ralph, my mistake, it's not "names"; it should read "it does not happen on the same nodes."


On Sat, Apr 5, 2014 at 12:33 PM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:
The same VM setup (virt-manager) is on all machines.


On Sat, Apr 5, 2014 at 12:32 PM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:
Open MPI version 1.4.3


On Fri, Apr 4, 2014 at 8:13 PM, Ralph Castain <r...@open-mpi.org> wrote:
Okay, so if you run mpiBlast on all the non-name nodes, everything is okay? What do you 
mean by "names nodes"?


On Apr 4, 2014, at 7:32 AM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:

no it does not happen on names nodes


On Fri, Apr 4, 2014 at 7:51 PM, Ralph Castain <r...@open-mpi.org> wrote:
Hi Nisha

I'm sorry if my questions appear abrasive - I'm just a little frustrated at the 
communication bottleneck as I can't seem to get a clear picture of your situation. So you 
really don't need to keep calling me "sir" :-)

The error you are hitting is very unusual - it means that the processes are 
able to make a connection, but are failing to correctly complete a simple 
handshake exchange of their process identifications. There are only a few ways 
that can happen, and I'm trying to get you to test for them.

So let's try and see if we can narrow this down. You mention that it works on 
some machines, but not all. Is this consistent - i.e., is it always the same 
machines that work, and the same ones that generate the error? If you exclude 
the ones that show the error, does it work? If so, what is different about 
those nodes? Are they a different architecture?


On Apr 3, 2014, at 11:09 PM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:

Sir,
the same virt-manager is being used by all PCs. No, I didn't enable
openmpi-hetero. Yes, the Open MPI version is the same on all of them, through the same kickstart file.
OK... actually, sir... Rocks itself installed and configured Open MPI and MPICH on its
own through the HPC roll.


On Fri, Apr 4, 2014 at 9:25 AM, Ralph Castain <r...@open-mpi.org> wrote:

On Apr 3, 2014, at 8:03 PM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:

Thank you, Ralph.
Yes, the cluster is heterogeneous...

And did you configure OMPI with --enable-heterogeneous? And are you running it with
--hetero-nodes? What version of OMPI are you using anyway?

Note that we don't care if the host pc's are hetero - what we care about is the 
VM. If all the VMs are the same, then it shouldn't matter. However, most VM 
technologies don't handle hetero hardware very well - i.e., you can't emulate 
an x86 architecture on top of a Sparc or Power chip or vice versa.


And I haven't made compute nodes directly on the physical nodes (PCs) because in
college it is not possible to take the whole lab of 32 PCs for your own work, so I ran
on VMs.

Yes, but at least it would let you test the setup to run MPI across even a 
couple of pc's - this is simple debugging practice.

In a Rocks cluster, the frontend gives the same kickstart to all the PCs, so the Open MPI
version should be the same, I guess.

Guess? or know? Makes a difference - might be worth testing.
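
If you want to verify rather than guess, something like this (the compute node names are just examples from the usual Rocks naming scheme) shows the path and version each node would actually use:

for n in compute-0-0 compute-0-1; do ssh $n 'which mpirun; mpirun --version'; done

Any difference in path or version between nodes can be enough to cause exactly this kind of handshake failure.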

Sir,
mpiformatdb is a command to distribute database fragments to the different compute
nodes after partitioning of the database.
And sir, have you run mpiBlast?

Nope - but that isn't the issue, is it? The issue is with the MPI setup.



On Fri, Apr 4, 2014 at 4:48 AM, Ralph Castain <r...@open-mpi.org> wrote:
What is "mpiformatdb"? We don't have an MPI database in our system, and I have 
no idea what that command means

As for that error - it means that the identifier we exchange between processes 
is failing to be recognized. This could mean a couple of things:

1. the OMPI version on the two ends is different - could be you aren't getting 
the right paths set on the various machines

2. the cluster is heterogeneous

You say you have "virtual nodes" running on various PC's? That would be an 
unusual setup - VM's can be problematic given the way they handle TCP connections, so 
that might be another source of the problem if my understanding of your setup is correct. 
Have you tried running this across the PCs directly - i.e., without any VMs?


On Apr 3, 2014, at 10:13 AM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:

I first formatted my database with the mpiformatdb command, then I ran:

mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt

It then gave this error 113 from some hosts and continued to run on the others, but with
no results even after 2 hours had elapsed, on a Rocks 6.0 cluster with 12
virtual nodes on PCs (2 on each, using virt-manager, 1 GB RAM each).


On Thu, Apr 3, 2014 at 10:41 PM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:
i also made machine file which contain ip adresses of all compute nodes + 
.ncbirc file for path to mpiblast and shared ,local storage path....
Sir
I ran the same command of mpirun on my college supercomputer 8 nodes each 
having 24 processors but it just running....gave no result uptill 3 hours...


On Thu, Apr 3, 2014 at 10:39 PM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:
I first formatted my database with the mpiformatdb command, then I ran:

mpirun -np 64 -machinefile mf mpiblast -d all.fas -p blastn -i query.fas -o output.txt

It then gave this error 113 from some hosts and continued to run on the others, but with
no results even after 2 hours had elapsed, on a Rocks 6.0 cluster with 12
virtual nodes on PCs (2 on each, using virt-manager, 1 GB RAM each).



On Thu, Apr 3, 2014 at 8:37 PM, Ralph Castain <r...@open-mpi.org> wrote:
I'm having trouble understanding your note, so perhaps I am getting this wrong. 
Let's see if I can figure out what you said:

* your perl command fails with "no route to host" - but I don't see any host in 
your cmd. Maybe I'm just missing something.

* you tried running a couple of "mpirun", but the mpirun command wasn't 
recognized? Is that correct?

* you then ran mpiblast and it sounds like it successfully started the 
processes, but then one aborted? Was there an error message beyond just the -1 
return status?


On Apr 2, 2014, at 11:17 PM, Nisha Dhankher -M.Tech(CSE) 
<nishadhankher-coaese...@pau.edu> wrote:

error btl_tcp_endpoint.c:638 connection failed due to error 113

In Open MPI, this error came when I ran my mpiBlast program on the Rocks cluster. Connecting to
the hosts at 10.1.255.236 and 10.1.255.244 failed. And when I run the following command

linux_shell$ perl -e 'die$!=113'

this message comes back: "No route to host at -e line 1."

shell$ mpirun --mca btl ^tcp
shell$ mpirun --mca btl_tcp_if_include eth1,eth2
shell$ mpirun --mca btl_tcp_if_include 10.1.255.244

were also executed, but it did not recognize these commands and aborted. What should I do? When I ran my mpiBlast
program for the first time, it gave an MPI_Abort error, bailing out of signal -1 on the rank
2 processor. Then I removed my public ethernet cable, and after that it gave the btl_tcp_endpoint
error 113.

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

