[OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in OMPI)

2011-02-10 Thread Nikolay Piskun
Hi, I am trying to use TotalView with srun and hit an interesting problem. It looks like if OMPI is started by "srun -mpi=ompi ", the MPI job hangs in the ompi_wait_for_debugger() subroutine. What happens, I think, is that OMPI was compiled without ORTE_DISABLE_FULL_SUPPORT and as a result rank 0 is waiting fo

Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in OMPI)

2011-02-10 Thread Ralph Castain
If you srun a job, then there is no "mpirun" to provide a proc_table. So running a job directly via srun means you cannot run TV on it.
On Feb 10, 2011, at 8:34 AM, Nikolay Piskun wrote:
>
> Hi,
> I am trying to use Totalview with srun and hit interesting problem. Looks
> like if OMPI is

Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in OMPI)

2011-02-10 Thread Jeff Squyres
FWIW: HNP = head node process = mpirun.
On Feb 10, 2011, at 7:46 AM, Ralph Castain wrote:
> If you srun a job, then there is no "mpirun" to provide a proc_table. So
> running a job directly via srun means you cannot run TV on it.
>
>
> On Feb 10, 2011, at 8:34 AM, Nikolay Piskun wrote:
>
>>

Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in OMPI)

2011-02-10 Thread Nikolay Piskun
Actually, in SLURM 2.2.0, which I am using now, there is support for parallel debuggers: srun does provide the needed info, fills proc_table, and sets up all debug variables correctly. The only problem that I see so far is the one that I described. Maybe the solution would be to check if job was s

Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in OMPI)

2011-02-10 Thread Jeff Squyres
I think what Ralph was trying to say is that Open MPI doesn't (currently) support running parallel debuggers when only srun is used (and mpirun is not). We'd certainly be open to someone submitting a patch to enable this functionality, though!
On Feb 10, 2011, at 8:02 AM, Nikolay Piskun wrote:

Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in OMPI)

2011-02-10 Thread Ralph Castain
FWIW: there already is a flag in ORTE that gets set when procs are launched by a non-orterun entity: orte_standalone_operation. So all you would have to do is add an appropriate check for that flag to be true.
On Feb 10, 2011, at 9:18 AM, Jeff Squyres wrote:
> I think what Ralph was trying to

Re: [OMPI devel] Debugger problem with srun and openmpi 1.5 (hang in OMPI)

2011-02-10 Thread Nikolay Piskun
Thanks much, looks like this should work. The patch is one line:
--
diff -c ompi_debuggers.c ompi_debuggers.c.old
*** ompi_debuggers.c	Thu Feb 10 15:13:07 2011
--- ompi_debuggers.c.old	Fri Jan 22 09:21:23 2010
***