Or perhaps cloned, renamed to orte-kill, and modified to kill a single (or 
multiple) specific job(s).  That would be POSIX-like ("kill" vs. "clean").
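A rough sketch of what such an orte-kill could look like (the name, and the function itself, are hypothetical; this is not an existing Open MPI tool), assuming the target job's orterun/mpirun PID is known:

```shell
#!/bin/sh
# Hypothetical "orte-kill" sketch: terminate ONE specific orterun/mpirun
# job by PID, instead of sweeping every job in the session the way
# orte-clean does.  Name and behavior are assumptions for illustration.
orte_kill() {
    pid="$1"
    # Ask politely first so the job has a chance to clean up after itself.
    kill -TERM "$pid" 2>/dev/null || return 1
    sleep 1
    # Escalate only if the process is still alive.
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null
    fi
    return 0
}
```

Invoked as `orte_kill 2399`, this would remove only the job whose orterun has PID 2399, leaving a second mpirun on the same frontend untouched.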


On Oct 24, 2012, at 1:32 PM, Rolf vandeVaart wrote:

> And just to give a little context, ompi-clean was created initially to 
> "clean" up a node, not for cleaning up a specific job.  It was for the case 
> where MPI jobs would leave some files behind or leave some processes running. 
>  (I do not believe this happens much at all anymore.)  But, as was said, no 
> reason it could not be modified.
> 
>> -----Original Message-----
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>> On Behalf Of Jeff Squyres
>> Sent: Wednesday, October 24, 2012 12:56 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] ompi-clean on single executable
>> 
>> ...but patches would be greatly appreciated.  :-)
>> 
>> On Oct 24, 2012, at 12:24 PM, Ralph Castain wrote:
>> 
>>> All things are possible, including what you describe. Not sure when we
>> would get to it, though.
>>> 
>>> 
>>> On Oct 24, 2012, at 4:01 AM, Nicolas Deladerriere
>> <nicolas.deladerri...@gmail.com> wrote:
>>> 
>>>> Reuti,
>>>> 
>>>> The problem I am facing is a small part of our production system,
>>>> and I cannot modify our mpirun submission system. This is why I am
>>>> looking for a solution using only ompi-clean or the mpirun command
>>>> specification.
>>>> 
>>>> Thanks,
>>>> Nicolas
>>>> 
>>>> 2012/10/24, Reuti <re...@staff.uni-marburg.de>:
>>>>> Am 24.10.2012 um 11:33 schrieb Nicolas Deladerriere:
>>>>> 
>>>>>> Reuti,
>>>>>> 
>>>>>> Thanks for your comments,
>>>>>> 
>>>>>> In our case, we are currently running different mpirun commands on
>>>>>> clusters sharing the same frontend. Basically we use a wrapper to
>>>>>> run the mpirun command and to run an ompi-clean command to clean up
>>>>>> the MPI job if required.
>>>>>> Using ompi-clean like this just kills all the other MPI jobs
>>>>>> running on the same frontend. I cannot use a queuing system
>>>>> 
>>>>> Why? Using it on a single machine was only one possible setup. Its
>>>>> purpose is to distribute jobs to slave hosts. If you already have
>>>>> one frontend as a login machine it fits perfectly: the qmaster (in
>>>>> the case of SGE) can run there and the execd on the nodes.
>>>>> 
>>>>> -- Reuti
>>>>> 
>>>>> 
>>>>>> as you have suggested. This is why I was wondering about an option
>>>>>> or other solution for the ompi-clean command that would avoid this
>>>>>> blanket cleaning of all MPI jobs.
>>>>>> 
>>>>>> Cheers
>>>>>> Nicolas
>>>>>> 
>>>>>> 2012/10/24, Reuti <re...@staff.uni-marburg.de>:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Am 24.10.2012 um 09:36 schrieb Nicolas Deladerriere:
>>>>>>> 
>>>>>>>> I am having an issue running ompi-clean, which cleans up the
>>>>>>>> session associated with a user (this is normal), which means it
>>>>>>>> kills all running jobs associated with this session (this is
>>>>>>>> also normal). But I would like to be able to clean up the
>>>>>>>> session associated with a job (not a user).
>>>>>>>> 
>>>>>>>> Here is my point:
>>>>>>>> 
>>>>>>>> I am running two executables:
>>>>>>>> 
>>>>>>>> % mpirun -np 2 myexec1
>>>>>>>>    --> run with PID 2399 ...
>>>>>>>> % mpirun -np 2 myexec2
>>>>>>>>    --> run with PID 2402 ...
>>>>>>>> 
>>>>>>>> When I run orte-clean, I get this result:
>>>>>>>> % orte-clean -v
>>>>>>>> orte-clean: cleaning session dir tree
>>>>>>>> openmpi-sessions-ndelader@myhost_0
>>>>>>>> orte-clean: killing any lingering procs
>>>>>>>> orte-clean: found potential rogue orterun process
>>>>>>>> (pid=2399,user=ndelader), sending SIGKILL...
>>>>>>>> orte-clean: found potential rogue orterun process
>>>>>>>> (pid=2402,user=ndelader), sending SIGKILL...
>>>>>>>> 
>>>>>>>> Which means that both jobs have been killed :-( Basically I
>>>>>>>> would like to run orte-clean with an executable name, PID, or
>>>>>>>> whatever identifies which job I want to stop and clean. It seems
>>>>>>>> I would need to create an Open MPI session per job. Does that
>>>>>>>> make sense? And I would like to be able to run something like
>>>>>>>> the following command and get the following result:
>>>>>>>> 
>>>>>>>> % orte-clean -v myexec1
>>>>>>>> orte-clean: cleaning session dir tree
>>>>>>>> openmpi-sessions-ndelader@myhost_0
>>>>>>>> orte-clean: killing any lingering procs
>>>>>>>> orte-clean: found potential rogue orterun process
>>>>>>>> (pid=2399,user=ndelader), sending SIGKILL...
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Does this make sense? Is there a way to perform this kind of
>>>>>>>> selection in the cleaning process?
>>>>>>> 
>>>>>>> How many jobs are you starting on how many nodes at a time? This
>>>>>>> requirement could be a reason to start using a queuing system,
>>>>>>> where you can remove jobs individually and also serialize your
>>>>>>> workflow. In fact, we also use GridEngine locally on workstations
>>>>>>> for this purpose.
>>>>>>> 
>>>>>>> -- Reuti
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

