Dear Bart,

I don't think OpenMPI needs to be installed on every machine, because the compute nodes share it with the master node over NFS. I don't know how to check the output of "which orted"; it is running only on the master node. I have another application that runs the same way, but I am having this problem with WRF.
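For reference, the "which orted" check can be run on every node listed in the hostfile rather than only on the master. What follows is a minimal sketch, not a definitive tool: it assumes passwordless ssh from the master to each node and a hostfile with one host per line (optionally followed by fields such as "slots=4"). The function names are made up for illustration. It asks a non-interactive remote shell, which is the kind of shell that printed "bash: orted: command not found", so it may show a different PATH than an interactive login on the same node.

```python
import subprocess


def parse_hostfile(text):
    """Return the unique host names from hostfile text, in order."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        host = line.split()[0]  # ignore trailing fields such as "slots=4"
        if host not in hosts:
            hosts.append(host)
    return hosts


def orted_check_command(host):
    """Build an ssh command that looks for orted in a non-interactive shell."""
    # "which orted" runs last so that its exit status is the one ssh returns.
    remote = "echo PATH=$PATH; echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH; which orted"
    return ["ssh", "-o", "BatchMode=yes", host, remote]


def check_hosts(hostfile_path):
    """Print what each host in the hostfile reports for orted."""
    with open(hostfile_path) as fh:
        hosts = parse_hostfile(fh.read())
    for host in hosts:
        result = subprocess.run(
            orted_check_command(host), capture_output=True, text=True
        )
        print(f"--- {host} (exit status {result.returncode})")
        print(result.stdout.strip())
        if result.stderr.strip():
            print(result.stderr.strip())
```

Calling check_hosts("/home/pmdtest/hostlist") with the hostfile from the mpirun command would print one block per node. A node that reports a non-zero exit status and no path for orted is one where the OpenMPI bin directory is missing from the PATH of non-interactive shells, even if the files themselves are visible over NFS.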

On Tue, May 3, 2011 at 9:06 PM, Bart Brashers <bbrash...@environcorp.com> wrote:

> It looks like OpenMPI is not installed on all your execution machines. You
> need to install at least the libs on all machines, or on an NFS-shared
> location. Check the output of "which orted" on the machine that works.
>
> Bart
>
> *From:* wrf-users-boun...@ucar.edu [mailto:wrf-users-boun...@ucar.edu] *On Behalf Of* Ahsan Ali
> *Sent:* Tuesday, May 03, 2011 1:04 AM
> *To:* us...@open-mpi.org
> *Subject:* [Wrf-users] WRF Problem running in Parallel on multiple nodes(cluster)
>
> Hello,
>
> I am able to run WRFV3.2.1 using mpirun on multiple cores of single
> machine, but when I want to run it across multiple nodes in cluster using
> hostlist then I get error, The compute nodes are mounted with the master
> node during boot using NFS. I get following error. Please help.
>
> [root@pmd02 em_real]# mpirun -np 10 -hostfile /home/pmdtest/hostlist ./real.exe
> bash: orted: command not found
> bash: orted: command not found
> --------------------------------------------------------------------------
> A daemon (pid 22006) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
> --
> Syed Ahsan Ali Bokhari
> Electronic Engineer (EE)
>
> Research & Development Division
> Pakistan Meteorological Department H-8/4, Islamabad.
> Phone # off +92518358714
> Cell # +923155145014

--
Syed Ahsan Ali Bokhari
Electronic Engineer (EE)

Research & Development Division
Pakistan Meteorological Department H-8/4, Islamabad.
Phone # off +92518358714
Cell # +923155145014