Dear Sir/Madam,

I'm having a problem running an example program. Could you kindly help? I've been fooling with it for days and am kind of getting lost.
---------------------------------------------------------------------------------
MPIRUN fails on example hello program: unable to launch the specified application on client node
---------------------------------------------------------------------------------

1) I'm trying to run Open MPI with the following setup: 1 PC (as master node) and 1 notebook (as client node), both connected to an Ethernet router by Ethernet cable and both running Ubuntu 8.10. There are no other connections.
- Is this setup OK for running Open MPI?

2) Prerequisites
SSH has been set up so that the master node can log in to the client node with passwordless ssh. I do notice that it takes 10 to 15 seconds between entering 'ssh <slave ip address>' and getting onto the client node.
- Could this be too slow for Open MPI to run properly?
I do not have programs such as a network file system, network time protocol, resource manager, or scheduler installed.
- Does Open MPI need any prerequisites other than passwordless ssh?

3) Open MPI is installed on both nodes: downloaded from open-mpi.org and built with configure/make all using the default settings.

4) PATH and LD_LIBRARY_PATH
On both nodes, PATH is /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games, which is the Ubuntu default. LD_LIBRARY_PATH is set in ~/.bashrc; I added one line at the end of the file:

'export LD_LIBRARY_PATH=usr/local/lib:usr/lib'
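For reference, the value set by that line can be checked with a small C sketch like the one below. The function names are hypothetical and not part of Open MPI. It counts the colon-separated entries that do not begin with '/'; an entry such as usr/local/lib is relative, so it is resolved against whatever the current directory happens to be rather than meaning /usr/local/lib. Calling report_ld_library_path() from a tiny test program, run once locally and once through ssh, would show which value each kind of session actually sees.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count the non-empty entries in a colon-separated search path (for
 * example the value of LD_LIBRARY_PATH) that do not start with '/'.
 * Such entries are relative to the current directory.
 * Empty entries (as in "a::b") are simply skipped in this sketch. */
int count_relative_entries(const char *list)
{
    int relative = 0;
    const char *p = list;

    if (list == NULL)
        return 0;

    while (*p != '\0') {
        const char *end = p;
        while (*end != '\0' && *end != ':')
            end++;                      /* find the end of this entry */
        if (end > p && *p != '/')
            relative++;                 /* non-empty entry without leading '/' */
        p = (*end == ':') ? end + 1 : end;
    }
    return relative;
}

/* Print what this process sees for LD_LIBRARY_PATH (hypothetical helper). */
void report_ld_library_path(void)
{
    const char *value = getenv("LD_LIBRARY_PATH");

    if (value == NULL || *value == '\0') {
        printf("LD_LIBRARY_PATH is empty or not set in this session\n");
        return;
    }
    printf("LD_LIBRARY_PATH=%s\n", value);
    printf("relative entries: %d\n", count_relative_entries(value));
}
```

With the line above, count_relative_entries("usr/local/lib:usr/lib") counts both entries as relative, while "/usr/local/lib:/usr/lib" has none.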

So when I echo them on both nodes, I get:

>echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
>echo $LD_LIBRARY_PATH
usr/local/lib:usr/lib

But if I run

>ssh <client_ip> 'echo $LD_LIBRARY_PATH'

nothing comes back, while

>ssh <client_ip> 'echo $PATH'

comes back with the right path. Is that a problem?

5) Problem
I compiled the example hello_c using

>mpicc hello_c.c -o hello_c.out

and ran it locally on each node; everything works fine. But when I tried to run it on 2 nodes (-np 2) with

>mpirun -machinefile machine.linux -np 2 $(pwd)/hello_c.out

I got the following error:

----------------------------------------------------------------------------
gordon@gordon-desktop:~/Desktop/openmpi-1.3.3/examples$ mpirun --machinefile machine.linux -np 2 $(pwd)/hello_c.out
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access or execute an executable:

Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Node: 192.168.0.194

while attempting to start process rank 1.
--------------------------------------------------------------------------

Sometimes I also get this error message after that:

--------------------------------------------------------------------------
[gordon-desktop:30748] [[25975,0],0]-[[25975,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
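Because there is no network file system, the absolute path handed to mpirun (here $(pwd)/hello_c.out, expanded on the master node) must also exist and be executable on 192.168.0.194 itself. A minimal C sketch for checking that on the client node uses the POSIX access() call; check_executable is a hypothetical helper, and the path passed to it is assumed to be the one from the error message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return 0 if `path` exists and is executable by the current user.
 * Otherwise print the reason to stderr and return -1.
 * Intended to be run on the client node with the same absolute path
 * that mpirun was given on the master node. */
int check_executable(const char *path)
{
    if (access(path, F_OK) != 0) {
        fprintf(stderr, "%s: not found (%s)\n", path, strerror(errno));
        return -1;
    }
    if (access(path, X_OK) != 0) {
        fprintf(stderr, "%s: exists but is not executable (%s)\n",
                path, strerror(errno));
        return -1;
    }
    return 0;
}
```

For example, check_executable("/home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out") returning -1 on the notebook would point to a missing or non-executable copy at that path there.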
------------------------------------------------------------------------------

6) Information attached:
ifconfig_masternode - output of ifconfig on the master node
ifconfig_slavenode - output of ifconfig on the slave node
ompi_info.txt - output of ompi_info -all
config.log - Open MPI configure log file
machine.linux - the machinefile used in the mpirun command

--
Sincerely,
Qing Pang
(601) 979 0270
mpirun_info.tar.gz