Dear Sir/Madam,

I'm having a problem running the example program. Please help; I've been struggling with it for days and am getting lost.

---------------------------------------------------------------------------------
mpirun fails on the example hello program
- unable to launch the specified application on the client node
---------------------------------------------------------------------------------

1) I'm trying to run OpenMPI with the following setup:

One PC (as master node) and one notebook (as client node), each connected to an Ethernet router by Ethernet cable. Both run Ubuntu 8.10. There are no other connections. --- Is this setup OK for running OpenMPI?

2) Prerequisites

SSH has been set up so that the master node can access the client node through passwordless ssh. I do notice that it takes 10~15 seconds between entering the '>ssh <slave ip address>' command and getting onto the client node.
--- Could this be too slow for OpenMPI to run properly?

I do not have a network file system, network time protocol, resource manager, scheduler, etc. installed.
--- Does OpenMPI need any prerequisites other than passwordless ssh?

3) OpenMPI is installed on both nodes. It was downloaded from open-mpi.org and built with configure/make all using the default settings.

4) PATH and LD_LIBRARY_PATH
On both nodes,
PATH is /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games, which is the default setting in Ubuntu.
LD_LIBRARY_PATH is set in ~/.bashrc: I added one line at the end of the file,
'export LD_LIBRARY_PATH=usr/local/lib:usr/lib'
So when I echo them on both nodes, I get:
>echo $PATH
>/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
>echo $LD_LIBRARY_PATH
>usr/local/lib:usr/lib

But, if I do
>ssh <client_ip> 'echo $LD_LIBRARY_PATH'
nothing comes back.

while
>ssh <client_ip> 'echo $PATH'
comes back with the right path.

Is that a problem?
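
For reference, here is a minimal sketch of a check that could be run on each node. It prints every entry of a colon-separated path list that does not begin with "/", since such entries are resolved relative to the current directory. The function name relative_entries is made up for illustration and is not part of OpenMPI.

```shell
# Hypothetical helper: print each entry of a colon-separated path list
# that is not an absolute path (does not begin with "/").
relative_entries() {
    # Put one entry per line, then keep only lines not starting with "/".
    printf '%s\n' "$1" | tr ':' '\n' | grep -v '^/'
}
```

Calling it as relative_entries "$LD_LIBRARY_PATH" with the value shown above would print both usr/local/lib and usr/lib; calling it with the PATH shown above would print nothing.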

4) Problem:
I compiled the example hello_c using
>mpicc hello_c.c -o hello_c.out
and ran it locally on each node; everything works fine.

But when I tried to run it on 2 nodes (-np 2)
>mpirun -machinefile machine.linux -np 2 $(pwd)/hello_c.out
I got the following error:

----------------------------------------------------------------------------
gordon@gordon-desktop:~/Desktop/openmpi-1.3.3/examples$ mpirun --machinefile machine.linux -np 2 $(pwd)/hello_c.out
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out
Node: 192.168.0.194

while attempting to start process rank 1.
--------------------------------------------------------------------------

Sometimes I get another error message after that:
--------------------------------------------------------------------------
[gordon-desktop:30748] [[25975,0],0]-[[25975,1],0] mca_oob_tcp_msg_recv: readv failed: Connection reset by peer (104)
------------------------------------------------------------------------------
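
The first message names the executable path and the node where it could not be accessed or executed. Since no network file system is installed, one thing worth checking is whether the file exists, and is executable, at that same path on the client node. Below is a minimal sketch of such a check; the function name check_executable is hypothetical.

```shell
# Hypothetical helper: report whether a regular file exists at the given
# path and has execute permission for the current user.
check_executable() {
    if [ ! -f "$1" ]; then
        echo "missing: $1"
        return 1
    elif [ ! -x "$1" ]; then
        echo "not executable: $1"
        return 1
    fi
    echo "ok: $1"
}
```

The same test can be run remotely in one line, for example:
ssh <client_ip> 'test -x /home/gordon/Desktop/openmpi-1.3.3/examples/hello_c.out && echo ok || echo "missing or not executable"'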

5) Information attached:
ifconfig_masternode - output of ifconfig on masternode
ifconfig_slavenode - output of ifconfig on slavenode
ompi_info.txt - output of ompi_info -all
config.log - log file produced by OpenMPI's configure
machine.linux - the machinefile used in mpirun command

--
Sincerely,
Qing Pang
(601) 979 0270

Attachment: mpirun_info.tar.gz
Description: application/gzip
