What happens if you try to mpirun a non-MPI program like "date" or "hostname"?
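For example:

    mpirun -np 3 -machinefile /home/radic/mfile hostname

If each node prints its hostname, the launcher and your machinefile are fine, and the problem is inside the MPI communication itself. In that case the MPI_ERR_TRUNCATE below is the interesting part; see the receive sketch at the end of this message.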
On Feb 11, 2011, at 6:14 AM, Marcela Castro León wrote:

> Excuse me. I forgot the attachment.
>
> 2011/2/11 Marcela Castro León <mcast...@gmail.com>
>
> Hello:
>
> I have the same version of Ubuntu, 10.04, on both. The original version
> was Ubuntu Server 9.1 (64-bit) and I upgraded both of them to 10.04.
> Yesterday I updated and upgraded to the same level again, but I got the
> same error after that.
> The machines are exactly the same: HP Compaq with Intel Core i5.
>
> Anyway, I've compared the versions of Open MPI and gcc, and they are the
> same too: 1.4.1-2 and 4.4.4.3, respectively. I'm attaching the output of
> dpkg -l on the two systems.
>
> I would appreciate a lot any help to solve it.
> Thank you.
>
> Marcela.
>
> 2011/2/10 Jeff Squyres <jsquy...@cisco.com>
>
> I typically see these kinds of errors when there's an Open MPI version
> mismatch between the nodes, and/or if there are slightly different
> flavors of Linux installed on each node (i.e., you're technically in a
> heterogeneous situation, but you're trying to run a single application
> binary). Can you verify:
>
> 1. that you have exactly the same version of Open MPI installed on all
> nodes? (and that your application was compiled against that exact
> version)
>
> 2. that you have exactly the same OS/update level installed on all
> nodes (e.g., same versions of glibc, etc.)
>
> On Feb 10, 2011, at 3:13 AM, Marcela Castro León wrote:
>
> > Hello
> >
> > I have a program that always works fine, but I'm trying it on a new
> > cluster and it fails when I execute it on more than one machine.
> > I mean, if I execute it alone on each host, everything works fine:
> >
> > radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 ../test parcorto.txt
> >
> > But when I execute
> >
> > radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 -machinefile /home/radic/mfile ../test parcorto.txt
> >
> > I get this error:
> >
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 0 with PID 2132 on
> > node santacruz exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> > --------------------------------------------------------------------------
> >
> > The program failed even when the machinefile (mfile) had only one
> > machine in it. This is its current content:
> >
> > radic@santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
> > santacruz
> > chubut
> >
> > I've debugged the program, and the error occurs after proc0 does an
> >
> > MPI_Recv(&nomproc,lennomproc,MPI_CHAR,i,tag,MPI_COMM_WORLD,&Stat);
> >
> > from the remote process.
> >
> > I've done several tests, which I'll mention:
> >
> > 1) Change the order in the machinefile:
> >
> > radic@santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
> > chubut
> > santacruz
> >
> > In that case, I get this error:
> >
> > [chubut:2194] *** An error occurred in MPI_Recv
> > [chubut:2194] *** on communicator MPI_COMM_WORLD
> > [chubut:2194] *** MPI_ERR_TRUNCATE: message truncated
> > [chubut:2194] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> >
> > and then
> >
> > --------------------------------------------------------------------------
> > mpirun has exited due to process rank 0 with PID 2194 on
> > node chubut exiting without calling "finalize". This may
> > have caused other processes in the application to be
> > terminated by signals sent by mpirun (as reported here).
> > --------------------------------------------------------------------------
> >
> > 2) I get the same error executing on host chubut instead of santacruz.
> >
> > 3) Simple MPI programs like MPI_Hello world work fine, but I suppose
> > those are very simple programs:
> >
> > radic@santacruz:~/gaps$ mpirun -np 3 -machinefile /home/radic/mfile MPI_Hello
> > Hola Mundo Hola Marce 1
> > Hola Mundo Hola Marce 0
> > Hola Mundo Hola Marce 2
> >
> > This is the information you asked for about the runtime problem:
> >
> > a) radic@santacruz:~$ mpirun -version
> > mpirun (Open MPI) 1.4.1
> >
> > b) I'm using Ubuntu 10.04. I installed the packages using apt-get
> > install, so I don't have a config.log.
> >
> > c) The output of ompi_info --all is in the file ompi_info.zip.
> >
> > d) These are PATH and LD_LIBRARY_PATH:
> >
> > radic@santacruz:~$ echo $PATH
> > /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
> > radic@santacruz:~$ echo $LD_LIBRARY_PATH
> >
> > Thank you very much.
> >
> > Marcela.
>
> <scgcc><scompi><chgcc><chompi>

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
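On the MPI_ERR_TRUNCATE from your quoted MPI_Recv: that error means the incoming message contains more MPI_CHAR elements than the lennomproc you posted for the receive. If the two builds compute that length differently (which a version or build mismatch can cause), you would see exactly this failure. One way to make the receive robust is to probe the message first and size the buffer from MPI_Get_count. This is only a minimal sketch, not your code: the loop, the tag, and sending the processor name are assumptions; the point is the probe-then-receive pattern.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i, tag = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        for (i = 1; i < size; ++i) {
            MPI_Status stat;
            int count;
            char *nomproc;

            /* Probe first, so the buffer is sized from the actual
               incoming message instead of a guessed lennomproc. */
            MPI_Probe(i, tag, MPI_COMM_WORLD, &stat);
            MPI_Get_count(&stat, MPI_CHAR, &count);
            nomproc = malloc(count);

            /* Receiving exactly 'count' elements cannot truncate. */
            MPI_Recv(nomproc, count, MPI_CHAR, i, tag,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("proc0 got %d chars from rank %d: %s\n",
                   count, i, nomproc);
            free(nomproc);
        }
    } else {
        char name[MPI_MAX_PROCESSOR_NAME];
        int len;

        MPI_Get_processor_name(name, &len);
        /* len + 1 sends the trailing '\0' so rank 0 can print it. */
        MPI_Send(name, len + 1, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Compile it with mpicc and run it across both nodes with your machinefile; if the probed counts differ depending on whether santacruz or chubut is first, that is a strong hint the two sides disagree about the message length.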