Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Ralph Castain
On Feb 22, 2014, at 10:14 AM, Suraj Prabhakaran wrote: >> Yeah, we added those capabilities specifically for this purpose. Indeed, >> another researcher added this to Torque a couple of years ago, though it >> didn't get pushed upstream. It was also added to Slurm. > > Thanks for your help. By

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Suraj Prabhakaran
> Yeah, we added those capabilities specifically for this purpose. Indeed, > another researcher added this to Torque a couple of years ago, though it > didn't get pushed upstream. It was also added to Slurm. Thanks for your help. By any chance, do you have more info on that one? Or a faint idea where

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Ralph Castain
On Feb 22, 2014, at 9:30 AM, Suraj Prabhakaran wrote: > Thanks Ralph. > > I cannot get rid of Torque since I am actually working on dynamic allocation > of nodes for a running job on Torque. What I actually want to do is spawn > processes on the dynamically assigned nodes since that is the m

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Suraj Prabhakaran
Thanks Ralph. I cannot get rid of Torque since I am actually working on dynamic allocation of nodes for a running job on Torque. What I actually want to do is spawn processes on the dynamically assigned nodes since that is the easiest way to expand MPI processes when a resource allocation

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Ralph Castain
On Feb 21, 2014, at 5:55 PM, Suraj Prabhakaran wrote: > Hmm... but in fact the MPI_Comm_spawn of the parents and the MPI_Init of the children > never returned! Understood - my point was that the output shows no errors or issues. For some reason, the progress thread appears to just stop. This usually in

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Suraj Prabhakaran
Hmm... but in fact the MPI_Comm_spawn of the parents and the MPI_Init of the children never returned! I configured MPI with ./configure --prefix=/dir/ --enable-debug --with-tm=/usr/local/ On Feb 22, 2014, at 12:53 AM, Ralph Castain wrote: > Strange - it all looks just fine. How was OMPI configured? >

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Ralph Castain
Strange - it all looks just fine. How was OMPI configured? On Feb 21, 2014, at 3:31 PM, Suraj Prabhakaran wrote: > Ok, I figured out that it was not a problem with the node grsacc04 because I > have now run the same test on a totally different set of nodes. > > I must really say that with --bind-t

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Suraj Prabhakaran
Ok, I figured out that it was not a problem with the node grsacc04 because I have now run the same test on a totally different set of nodes. I must really say that with the --bind-to none option, the program completed "many" more times than earlier, but still "sometimes" it hangs! Attaching now the out

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Ralph Castain
Well, that all looks fine. However, I note that the procs on grsacc04 all stopped making progress at the same point, which is why the job hung. All the procs on the other nodes were just fine. So let's try a couple of things: 1. add "--bind-to none" to your cmd line so we avoid any contention i

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Suraj Prabhakaran
Right, so I have the output here. Same case, mpiexec -mca plm_base_verbose 5 -mca ess_base_verbose 5 -mca grpcomm_base_verbose 5 -np 3 ./simple_spawn Output attached. Best, Suraj output Description: Binary data On Feb 21, 2014, at 5:30 AM, Ralph Castain wrote: > > On Feb 20, 2014, at

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-20 Thread Ralph Castain
On Feb 20, 2014, at 7:05 PM, Suraj Prabhakaran wrote: > Thanks Ralph! > > I should have mentioned, though: without the Torque environment, spawning with > ssh works ok. But under the Torque environment, it does not. Ah, no - you forgot to mention that point. > > I started the simple_spawn with 3 pro

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-20 Thread Suraj Prabhakaran
Thanks Ralph! I should have mentioned, though: without the Torque environment, spawning with ssh works ok. But under the Torque environment, it does not. I started the simple_spawn with 3 processes and spawned 9 processes (3 per node on 3 nodes). There is no problem with the Torque environment because

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-20 Thread Ralph Castain
Hmmm...I don't see anything immediately glaring. What do you mean by "doesn't work"? Is there some specific behavior you see? You might try the attached program. It's a simple spawn test we use - 1.7.4 seems happy with it. simple_spawn.c Description: Binary data On Feb 20, 2014, at 10:14 AM

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-20 Thread Suraj Prabhakaran
I am using 1.7.4! On Feb 20, 2014, at 7:00 PM, Ralph Castain wrote: > What OMPI version are you using? > > On Feb 20, 2014, at 7:56 AM, Suraj Prabhakaran > wrote: > >> Hello! >> >> I am having a problem using MPI_Comm_spawn under Torque. It doesn't work when >> spawning more than 12 processe

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-20 Thread Ralph Castain
What OMPI version are you using? On Feb 20, 2014, at 7:56 AM, Suraj Prabhakaran wrote: > Hello! > > I am having a problem using MPI_Comm_spawn under Torque. It doesn't work when > spawning more than 12 processes on various nodes. To be more precise, > "sometimes" it works, and "sometimes" it d

[OMPI devel] MPI_Comm_spawn under Torque

2014-02-20 Thread Suraj Prabhakaran
Hello! I am having a problem using MPI_Comm_spawn under Torque. It doesn't work when spawning more than 12 processes on various nodes. To be more precise, "sometimes" it works, and "sometimes" it doesn't! Here is my case. I obtain 5 nodes, 3 cores per node and my $PBS_NODEFILE looks like below.
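
(The nodefile listing itself is cut off in this preview and is not reproduced here. As a rough illustration only: Torque writes one hostname per line for each allocated core into the file named by $PBS_NODEFILE, so an allocation of 5 nodes with 3 cores each would give 15 lines. The sketch below counts slots per host from such a file. The hostnames node01..node05 and the helper name slots_per_host are hypothetical placeholders, not taken from the original message.)

```python
from collections import Counter


def slots_per_host(nodefile_text):
    """Count how many slots (lines) each host has in a PBS-style nodefile.

    Assumes the usual Torque layout: one hostname per line, repeated once
    for every core allocated on that host. Blank lines are ignored.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


# Hypothetical stand-in for the contents of $PBS_NODEFILE:
# 5 nodes, each listed 3 times (3 cores per node).
HYPOTHETICAL_HOSTS = ["node01", "node02", "node03", "node04", "node05"]
SAMPLE_NODEFILE = "\n".join(
    host for host in HYPOTHETICAL_HOSTS for _ in range(3)
)

if __name__ == "__main__":
    for host, slots in sorted(slots_per_host(SAMPLE_NODEFILE).items()):
        print(f"{host}: {slots} slots")
```

(Inside a real job one would read the file with open(os.environ["PBS_NODEFILE"]) instead of the sample string; the total slot count is the upper bound on how many processes can be placed without oversubscribing.)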