On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:

>>> 
>>> I am not using that computer. A scenario that I have come across is
>>> that when an msub job is killed because it has exceeded its walltime,
>>> mpi tasks spawned by ssh may not be terminated because (so I am told)
>>> Torque does not know about them.
>> 
>> Not true with OMPI. Torque will kill mpirun, which will in turn cause all 
>> MPI procs to die. Yes, it's true that Torque won't know about the MPI procs 
>> itself. However, OMPI is designed such that termination of mpirun by the 
>> resource manager will cause all apps to die.
> 
> How does Torque on NodeA know that an mpi launched on NodeB by ssh
> should be killed?

Torque works at the job level. So if you get an interactive Torque session, 
Torque can only kill your session - which means it automatically kills 
everything started within that session, regardless of where it resides.

Perhaps you don't fully understand how Torque works? As a brief recap, Torque 
allocates the requested number of nodes. On one of the nodes, it starts a 
"sister mom" that is responsible for that job. It also wires Torque daemons on 
each of the other nodes to the "sister mom" to create, in effect, a virtual 
machine.

When the Torque session is completed, the "sister mom" notifies all the other 
Torque daemons in the VM that the session shall be terminated. At that time, 
all local procs belonging to that session are terminated. It doesn't matter how 
those procs got there - by ssh, mpirun, whatever. They -all- are killed.
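The session-level teardown can be mimicked with ordinary Unix process groups. Here is a minimal local stand-in (purely illustrative - this is not Torque's actual code): start a "job" in its own session, then signal the entire process group, and everything launched inside it dies, however it got there.

```shell
# Local stand-in for Torque's session teardown (illustrative only).
# Requires util-linux's setsid; run non-interactively.
marker=$(mktemp)
# The "job": a new session whose shell records its own demise via a trap.
setsid sh -c '
  trap "echo child-terminated >> '"$marker"'; exit" TERM
  sleep 300 &
  wait
' &
leader=$!
sleep 1                       # let the session get going
kill -s TERM -- -"$leader"    # negative PID: signal the whole group
wait "$leader" 2>/dev/null    # session leader exits via its trap
cat "$marker"                 # -> child-terminated
```

The key point is the negative PID: the signal goes to every process in the group at once, which is why it makes no difference whether a given proc was started by ssh, mpirun, or anything else.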

What Torque cannot do is kill the actual mpi processes started by mpirun. See 
below.

> OMPI is designed (from what I can see) for all
> mpirun processes to be started from the same node, not for distributed
> mpi jobs launched independently from multiple nodes.

Remember, mpirun launches its own set of daemons on each node. Each daemon then 
locally spawns its set of mpi processes. So mpirun knows where everything is 
and can kill it.

To further ensure cleanup, each daemon monitors mpirun's existence. Torque 
only knows about mpirun, and kills it when (e.g.) the walltime limit is 
reached. OMPI's daemons see that mpirun has died and terminate their local 
processes before terminating themselves.
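The daemon-side failsafe boils down to "watch the launcher, clean up when it vanishes." A rough sketch of that pattern in shell (illustrative only - OMPI's daemons implement this internally in C, not like this):

```shell
# Illustrative sketch of the daemon failsafe: watch a launcher PID and
# kill the local procs once it disappears. (Not OMPI's actual code.)
sleep 300 &                   # stand-in for mpirun
launcher=$!
sleep 600 &                   # stand-in for a locally spawned MPI proc
proc=$!
(                             # stand-in for the local daemon
  while kill -0 "$launcher" 2>/dev/null; do
    sleep 1                   # poll: is the launcher still alive?
  done
  kill "$proc" 2>/dev/null    # launcher gone: terminate local procs
) &
daemon=$!
kill "$launcher"              # simulate Torque killing mpirun
wait "$launcher" 2>/dev/null  # reap it so kill -0 stops succeeding
wait "$daemon"                # "daemon" notices and cleans up
wait "$proc" 2>/dev/null      # reap the killed proc as well
kill -0 "$proc" 2>/dev/null || echo "local proc cleaned up"
```

Note that nothing here requires Torque to know the MPI procs exist - the cleanup is driven entirely by the launcher's disappearance.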

Torque cannot directly kill the mpi processes because it has no knowledge of 
their existence or their relationship to the job session. Instead, since 
Torque knows about the ssh that started mpirun (you executed it 
interactively), it kills the ssh - which causes mpirun to die, which in turn 
causes the mpi apps to die.


> I am not certain that killing the
> ssh on NodeA will in fact terminate a mpi launched on NodeB (i.e. by
> ssh NodeB mpirun AAA...) with OMPI.
> 

It most certainly will! That mpirun on nodeB is executing under the ssh from 
nodeA, so when that ssh session is killed, it automatically kills everything 
running underneath it. And when mpirun dies, so does the job it was running, 
as described above.

You can prove this to yourself rather easily. Just ssh to a remote node and 
execute any command that lingers for a while - say something simple like 
"sleep". Then kill the ssh and do a "ps" on the remote node. I guarantee that 
the command will have died.
