How did you attempt to start your job, and what does your configure line look like?
Sent from my iPad

On Mar 5, 2012, at 2:11 PM, bin Wang <bighead...@gmail.com> wrote:

> Hello All,
>
> I'm trying to run the latest Open MPI code on Jaguar (cloned from the
> Open MPI Mercurial mirror of the Subversion repository). Open MPI
> configured and compiled fine, and my benchmark also compiled
> successfully. I tried to launch my program using mpirun within an
> interactive job, but it failed immediately.
>
> The core dump gave me the following information.
> ====================Error Msg=========================
> [jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a daemon on the local
> node in file ess_singleton_module.c at line 220
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   ompi_mpi_init: orte_init failed
>   --> Returned value Unable to start a daemon on the local node (-127)
>       instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or
> environment problems.
> This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   ompi_mpi_init: orte_init failed
>   --> Returned "Unable to start a daemon on the local node" (-127)
>       instead of "Success" (0)
> --------------------------------------------------------------------------
> [jaguarpf-login2:15370] *** An error occurred in MPI_Init
> [jaguarpf-login2:15370] *** reported by process [4294967295,4294967295]
> [jaguarpf-login2:15370] *** on a NULL communicator
> [jaguarpf-login2:15370] *** Unknown error
> [jaguarpf-login2:15370] *** MPI_ERRORS_ARE_FATAL (processes in this
> communicator will now abort,
> [jaguarpf-login2:15370] *** and potentially your MPI job)
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly. You should
> double check that everything has shut down cleanly.
>
>   Reason:     Before MPI_INIT completed
>   Local host: jaguarpf-login2
>   PID:        15370
> --------------------------------------------------------------------------
> Program exited with code 01.
> ====================Error Msg Over=====================
>
> There are several components under ess, but I don't know why or how the
> singleton component was chosen.
>
> I hope someone can help me compile and run Open MPI successfully on
> Jaguar.
>
> Any comments and suggestions would be appreciated.
>
> Thanks,
>
> --Bin
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
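On the ess question: roughly speaking, the singleton component is what ORTE falls back to when it finds no environment from a recognized launcher, so seeing it selected under mpirun usually means the launcher-specific component (e.g. the ALPS one on a Cray) was missing or rejected. A diagnostic sketch (the binary name `./my_benchmark` is a placeholder; verbose levels and component names can vary between Open MPI versions):

```shell
# List the ess components that were actually built into this install
# (you'd hope to see alps here on Jaguar, not just singleton/env/hnp).
ompi_info | grep ess

# Have the ess framework log its component selection at launch time;
# verbose MCA parameters follow the <framework>_base_verbose convention.
mpirun --mca ess_base_verbose 10 -np 2 ./my_benchmark
```

If the ALPS component doesn't appear in the `ompi_info` output, that would point back at the configure line, which is why it's worth posting.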