How did you attempt to start your job, and what does your configure line look 
like?

On Mar 5, 2012, at 2:11 PM, bin Wang <bighead...@gmail.com> wrote:

> Hello All,
> 
> I'm trying to run the latest Open MPI code on Jaguar
> (cloned from the Open MPI Mercurial mirror of the Subversion repository).
> Open MPI configured and compiled without problems, and the benchmark
> also compiled successfully. I tried to launch my program using mpirun
> within an interactive job, but it failed immediately.
> 
> The core dump file gave me the following information:
> ====================Error Msg=========================
> [jaguarpf-login2:15370] [[INVALID],INVALID] ORTE_ERROR_LOG: Unable to start a 
> daemon on the local
> node in file ess_singleton_module.c at line 220
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> ompi_mpi_init: orte_init failed
> --> Returned value Unable to start a daemon on the local node (-127) instead 
> of ORTE_SUCCESS
> 
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> ompi_mpi_init: orte_init failed
> --> Returned "Unable to start a daemon on the local node" (-127) instead of 
> "Success" (0)
> --------------------------------------------------------------------------
> [jaguarpf-login2:15370] *** An error occurred in MPI_Init
> [jaguarpf-login2:15370] *** reported by process [4294967295,42949
> [jaguarpf-login2:15370] *** on a NULL communicator
> [jaguarpf-login2:15370] *** Unknown error
> [jaguarpf-login2:15370] *** MPI_ERRORS_ARE_FATAL (processes in this 
> communicator will now abort,
> [jaguarpf-login2:15370] *** and potentially your MPI job)
> --------------------------------------------------------------------------
> An MPI process is aborting at a time when it cannot guarantee that all
> of its peer processes in the job will be killed properly.  You should
> double check that everything has shut down cleanly.
> Reason:     Before MPI_INIT completed
> Local host: jaguarpf-login2
> PID:        15370
> --------------------------------------------------------------------------
> Program exited with code 01.
> ====================Error Msg Over=====================
> 
> There are several components under ess, but I don't know why or how the
> singleton component was chosen.
> 
> I hope someone can help me compile and run Open MPI successfully on 
> Jaguar.
> 
> Any comments and suggestions will be appreciated.
> 
> Thanks,
> 
> --Bin
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
