Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Jeff Squyres
I think Brian is referring to: https://svn.open-mpi.org/trac/ompi/changeset/12852 On Feb 14, 2007, at 1:02 PM, Brian W. Barrett wrote: On Feb 14, 2007, at 10:50 AM, Jeff Squyres wrote: On Feb 14, 2007, at 12:43 PM, Adrian Knoth wrote: Blah. We definitely need to work on our error me

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Mark Kosmowski
FWIW, what did MPICH say for the error? I followed the install.pdf that comes with mpich. They have you start up the daemon ring then run mpdtrace. This command tells each daemon instance to report the hostname. I don't remember the exact error message, but it was very clear that NODENAME was

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Brian W. Barrett
On Feb 14, 2007, at 10:50 AM, Jeff Squyres wrote: On Feb 14, 2007, at 12:43 PM, Adrian Knoth wrote: Blah. We definitely need to work on our error messages. I think we could use gai_strerror(3) for this. If we could agree to get rid of SUSv2 and rely on RFC 3493 ;) It would not be too diff

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Jeff Squyres
On Feb 14, 2007, at 12:43 PM, Adrian Knoth wrote: Blah. We definitely need to work on our error messages. I think we could use gai_strerror(3) for this. If we could agree to get rid of SUSv2 and rely on RFC 3493 ;) It would not be too difficult to add gai_strerror() checking into confi

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Adrian Knoth
On Wed, Feb 14, 2007 at 12:32:46PM -0500, Jeff Squyres wrote:
> > ... hostname worked, but my application hung and gave a connect()
> > errno 110.
adi@drcomp:~$ perl -e 'die$!=110'
Connection timed out at -e line 1.
> Blah. We definitely need to work on our error messages.
I think we could use

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Jeff Squyres
On Feb 14, 2007, at 12:28 PM, Mark Kosmowski wrote: Everything is working properly now. I needed to reinstall Linux on one of my nodes after a botched attempt at a network install - mpirun ... hostname worked, but my application hung and gave a connect() errno 110. At this point I decided to g

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-14 Thread Mark Kosmowski
Everything is working properly now. I needed to reinstall Linux on one of my nodes after a botched attempt at a network install - mpirun ... hostname worked, but my application hung and gave a connect() errno 110. At this point I decided to give up and try mpich instead. During the mpich sanity

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-08 Thread Alex Tumanov
mpirun --prefix /opt/openmpi -mca oob_tcp_include eth0 -mca btl_tcp_if_include eth0 --hostfile ~/work/openmpi_hostfile -np 4 hostname Could a section be added to the FAQ mentioning that the firewall service should be shut down over the MPI interface and that the two -mca switches should be used?

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-08 Thread Alex Tumanov
I have added the following line to my .bashrc: export OMPIFLAGS="-mca oob_tcp_include eth0 -mca btl_tcp_if_include eth0 --hostfile ~/work/openmpi_hostfile" and have verified that mpirun $OMPIFLAGS -np 4 hostname works. Is there a better way of accomplishing this, or is this a matter of there be

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-08 Thread Brian W. Barrett
For things like these, I usually use the "dot file" MCA parameter file in my home directory: http://www.open-mpi.org/faq/?category=tuning#setting-mca-params That way, I don't accidentally forget to set the parameters on a given run ;). Brian On Feb 8, 2007, at 6:15 PM, Mark Kosmowski wr
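As a rough sketch of the "dot file" approach Brian mentions, the two interface settings from the mpirun command line in this thread could be written once to a per-user parameter file. This assumes Open MPI reads $HOME/.openmpi/mca-params.conf, as the linked FAQ entry describes. The write_mca_params helper and its optional directory argument are hypothetical names used only for illustration, and the parameter names and eth0 value are simply the ones quoted in the thread.

```shell
# Hypothetical helper: write the thread's two MCA settings to a per-user file.
# Assumption: Open MPI picks up $HOME/.openmpi/mca-params.conf at startup.
# Note: this overwrites any existing mca-params.conf in the target directory.
write_mca_params() {
    conf_dir="${1:-$HOME/.openmpi}"   # optional override, handy for a dry run
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/mca-params.conf" <<'EOF'
# One "name = value" pair per line; lines starting with # are comments.
oob_tcp_include = eth0
btl_tcp_if_include = eth0
EOF
}
```

Calling write_mca_params with no argument writes to the default location, after which the two -mca switches should no longer be needed on each mpirun command line. The hostfile is left as a command-line option in this sketch.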

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-08 Thread Mark Kosmowski
I have a style question related to this issue that I think is resolved. I have added the following line to my .bashrc: export OMPIFLAGS="-mca oob_tcp_include eth0 -mca btl_tcp_if_include eth0 --hostfile ~/work/openmpi_hostfile" and have verified that mpirun $OMPIFLAGS -np 4 hostname works. Is
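A minimal sketch of the .bashrc line above, assuming bash. The only change is $HOME in place of ~: bash does not expand ~ inside double quotes, so the original line passes a literal ~ to mpirun and relies on mpirun to interpret it.

```shell
# Sketch for ~/.bashrc: same options as in the thread, but with $HOME in
# place of ~, because bash does not expand ~ inside double quotes.
export OMPIFLAGS="-mca oob_tcp_include eth0 -mca btl_tcp_if_include eth0 --hostfile $HOME/work/openmpi_hostfile"

# Usage (left unquoted on purpose so the shell splits it into words):
#   mpirun $OMPIFLAGS -np 4 hostname
```

Because the variable is expanded unquoted, this breaks if the home directory path contains spaces.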

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-08 Thread Mark Kosmowski
I think I fixed the problem. I at least have mpirun ... hostname working over the cluster. The first thing I needed to do was to make the gigabit network an internal zone in Yast ... firewall (which essentially turns off the firewall over this interface). Next I needed to add the -mca options a

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-08 Thread Mark Kosmowski
Please find attached a tarball containing the stderr of mpirun ... hostname across my cluster as well as the output from ompi_info. Apologies for not including these earlier. Thank you for any and all assistance, Mark Kosmowski ompi-output.tar.gz Description: GNU Zip compressed data

Re: [OMPI users] first time user - can run mpi job SMP but not over cluster

2007-02-07 Thread Alex Tumanov
Hello, mpirun -np 2 myprogram inputfile >outputfile There can be a whole host of issues with the way you run your executable and/or the way you have the environment set up. First of all, when you ssh into the node, does the environment automatically get updated with correct Open MPI paths? I.e.