Sorry, I forgot the attachments.

2011/2/11 Marcela Castro León <mcast...@gmail.com>
> Hello,
>
> I have the same version of Ubuntu, 10.04, on both machines. The original
> version was Ubuntu Server 9.1 (64-bit), and I upgraded both of them to 10.04.
> Yesterday I updated and upgraded them to the same level again, but I still
> get the same error.
> The machines are exactly the same: HP Compaq with Intel Core i5.
>
> Anyway, I've compared the versions of Open MPI and gcc, and they are the
> same too: 1.4.1-2 and 4.4.4.3 respectively. I'm attaching the output of
> dpkg -l on the two systems.
>
> I would greatly appreciate any help to solve this.
> Thank you.
>
> Marcela.
> 2011/2/10 Jeff Squyres <jsquy...@cisco.com>
>
>> I typically see these kinds of errors when there's an Open MPI version
>> mismatch between the nodes, and/or if there are slightly different flavors
>> of Linux installed on each node (i.e., you're technically in a heterogeneous
>> situation, but you're trying to run a single application binary). Can you
>> verify:
>>
>> 1. that you have exactly the same version of Open MPI installed on all
>> nodes? (and that your application was compiled against that exact version)
>>
>> 2. that you have exactly the same OS/update level installed on all nodes
>> (e.g., same versions of glibc, etc.)
>>
>>
>> On Feb 10, 2011, at 3:13 AM, Marcela Castro León wrote:
>>
>> > Hello,
>> > I have a program that always works fine, but I'm trying it on a new
>> > cluster and it fails when I execute it on more than one machine.
>> > I mean, if I execute it alone on each host, everything works fine:
>> > radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 ../test parcorto.txt
>> >
>> > But when I execute
>> > radic@santacruz:~/gaps/caso3-i1$ mpirun -np 3 -machinefile /home/radic/mfile ../test parcorto.txt
>> >
>> > I get this error:
>> >
>> > mpirun has exited due to process rank 0 with PID 2132 on
>> > node santacruz exiting without calling "finalize". This may
>> > have caused other processes in the application to be
>> > terminated by signals sent by mpirun (as reported here).
>> > --------------------------------------------------------------------------
>> >
>> > Even if the machinefile (mfile) has only one machine, the program fails.
>> > This is the current content:
>> >
>> > radic@santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
>> > santacruz
>> > chubut
>> >
>> > I've debugged the program, and the error occurs after proc0 does an
>> > MPI_Recv(&nomproc,lennomproc,MPI_CHAR,i,tag,MPI_COMM_WORLD,&Stat);
>> > from the remote process.
>> >
>> > I've done several tests:
>> >
>> > 1) Changing the order of the machinefile:
>> > radic@santacruz:~/gaps/caso3-i1$ cat /home/radic/mfile
>> > chubut
>> > santacruz
>> >
>> > In that case, I get this error:
>> > [chubut:2194] *** An error occurred in MPI_Recv
>> > [chubut:2194] *** on communicator MPI_COMM_WORLD
>> > [chubut:2194] *** MPI_ERR_TRUNCATE: message truncated
>> > [chubut:2194] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> > and then
>> > --------------------------------------------------------------------------
>> > mpirun has exited due to process rank 0 with PID 2194 on
>> > node chubut exiting without calling "finalize". This may
>> > have caused other processes in the application to be
>> > terminated by signals sent by mpirun (as reported here).
>> > --------------------------------------------------------------------------
>> >
>> > 2) I get the same error when executing on host chubut instead of santacruz.
>> > 3) Simple MPI programs like MPI_Hello work fine, but I suppose they are
>> > very simple programs.
>> >
>> > radic@santacruz:~/gaps$ mpirun -np 3 -machinefile /home/radic/mfile MPI_Hello
>> > Hola Mundo Hola Marce 1
>> > Hola Mundo Hola Marce 0
>> > Hola Mundo Hola Marce 2
>> >
>> >
>> > This is the information you ask for with runtime problems:
>> > a) radic@santacruz:~$ mpirun -version
>> > mpirun (Open MPI) 1.4.1
>> > b) I'm using Ubuntu 10.04. I'm installing the packages using apt-get
>> > install, so I don't have a config.log.
>> > c) The ompi_info --all output is in the file ompi_info.zip
>> > d) These are PATH and LD_LIBRARY_PATH:
>> > radic@santacruz:~$ echo $PATH
>> > /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
>> > radic@santacruz:~$ echo $LD_LIBRARY_PATH
>> >
>> >
>> > Thank you very much.
>> >
>> > Marcela.
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>
Attachments (binary data): scgcc, scompi, chgcc, chompi
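The MPI_ERR_TRUNCATE report in test 1 means that the receive posted on rank 0 had a smaller count than the message that actually arrived. The thread does not show how lennomproc is computed, so what follows is only an assumption for illustration: if it were derived from the receiving host's own name, rank 0 on chubut (6 characters) could not hold the name sent from santacruz (9 characters), while the reverse order would fit. The plain-C sketch below does not call MPI at all; model_recv, send_len, NAME_CAP and the MODEL_* codes are hypothetical names that model only this counting rule.

```c
/*
 * Plain-C model of the MPI receive-count rule (no MPI calls).
 * ASSUMPTION: the receive count is taken from the receiving host's own
 * name length; the original program's lennomproc is not shown.
 */
#include <assert.h>
#include <string.h>

#define MODEL_OK 0
#define MODEL_ERR_TRUNCATE 1 /* stands in for MPI_ERR_TRUNCATE */

/* Hypothetical fixed capacity, like a buffer of MPI_MAX_PROCESSOR_NAME chars. */
#define NAME_CAP 256

/* Characters a sender ships: the host name plus its terminating '\0'. */
static int send_len(const char *host)
{
    return (int)strlen(host) + 1;
}

/*
 * Hypothetical stand-in for MPI_Recv's count check: a message of msg_count
 * chars is accepted only if it fits in recv_count chars. This model copies
 * nothing when the message is too long.
 */
static int model_recv(char *dst, int recv_count, const char *msg, int msg_count)
{
    assert(dst != NULL && msg != NULL);
    if (msg_count > recv_count)
        return MODEL_ERR_TRUNCATE;
    memcpy(dst, msg, (size_t)msg_count);
    return MODEL_OK;
}
```

Under that assumption, sizing the buffer from the local name truncates in one direction only, which would match the order dependence reported in test 1. In real MPI code, two common ways to avoid guessing the length are to call MPI_Probe and then MPI_Get_count on the returned status before MPI_Recv, or to receive into a buffer of MPI_MAX_PROCESSOR_NAME characters when each rank sends the output of MPI_Get_processor_name. Separately, if nomproc is declared as a char pointer rather than a char array, &nomproc is the address of the pointer variable itself, not of the characters, so the call as written would overwrite the pointer.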