Or perhaps cloned, renamed to orte-kill, and modified to kill a single (or multiple) specific job(s). That would be POSIX-like ("kill" vs. "clean").
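Until such an orte-kill exists, a rough per-job kill can be scripted in the shell. The sketch below is only an illustration, not an Open MPI tool: the `pick_orterun_pids` helper is invented, and its "PID command" input format stands in for something like `ps -eo pid,args` output. It selects only the PID of the mpirun/orterun whose command line mentions the target executable, so just that one job can be signalled rather than everything orte-clean would kill.

```shell
# pick_orterun_pids TARGET: read "PID COMMAND..." lines on stdin and print
# only the PIDs of mpirun/orterun launchers whose command line mentions
# TARGET. Hypothetical helper -- not part of any Open MPI command.
pick_orterun_pids() {
    awk -v t="$1" '$2 ~ /(orterun|mpirun)$/ && index($0, t) { print $1 }'
}

# Demo using the two PIDs from the report in this thread; in real use the
# input would come from something like `ps -eo pid,args`:
printf '%s\n' \
    '2399 mpirun -np 2 myexec1' \
    '2402 mpirun -np 2 myexec2' | pick_orterun_pids myexec1
# prints 2399
```

The selected PID could then be signalled individually, e.g. `kill -TERM 2399`, leaving the other job running.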
On Oct 24, 2012, at 1:32 PM, Rolf vandeVaart wrote:

> And just to give a little context: ompi-clean was created initially to
> "clean" up a node, not to clean up a specific job. It was for the case
> where MPI jobs would leave some files behind or leave some processes
> running. (I do not believe this happens much at all anymore.) But, as was
> said, there is no reason it could not be modified.
>
>> -----Original Message-----
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
>> On Behalf Of Jeff Squyres
>> Sent: Wednesday, October 24, 2012 12:56 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] ompi-clean on single executable
>>
>> ...but patches would be greatly appreciated. :-)
>>
>> On Oct 24, 2012, at 12:24 PM, Ralph Castain wrote:
>>
>>> All things are possible, including what you describe. Not sure when we
>>> would get to it, though.
>>>
>>> On Oct 24, 2012, at 4:01 AM, Nicolas Deladerriere
>>> <nicolas.deladerri...@gmail.com> wrote:
>>>
>>>> Reuti,
>>>>
>>>> The problem I am facing is a small part of our production system, and
>>>> I cannot modify our mpirun submission system. This is why I am looking
>>>> for a solution using only the ompi-clean or mpirun command
>>>> specification.
>>>>
>>>> Thanks,
>>>> Nicolas
>>>>
>>>> 2012/10/24, Reuti <re...@staff.uni-marburg.de>:
>>>>> On 24.10.2012 at 11:33, Nicolas Deladerriere wrote:
>>>>>
>>>>>> Reuti,
>>>>>>
>>>>>> Thanks for your comments.
>>>>>>
>>>>>> In our case, we are currently running different mpirun commands on
>>>>>> clusters sharing the same frontend. Basically we use a wrapper to
>>>>>> run the mpirun command and to run an ompi-clean command to clean up
>>>>>> the MPI job if required. Using ompi-clean like this just kills all
>>>>>> the other MPI jobs running on the same frontend. I cannot use a
>>>>>> queuing system
>>>>>
>>>>> Why? Using it on a single machine was only one possible setup. Its
>>>>> purpose is to distribute jobs to slave hosts. If you already have one
>>>>> frontend as a login machine, it fits perfectly: the qmaster (in the
>>>>> case of SGE) can run there and the execd on the nodes.
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>> as you have suggested; this is why I was wondering about an option
>>>>>> or another solution associated with the ompi-clean command to avoid
>>>>>> this cleaning of all MPI jobs.
>>>>>>
>>>>>> Cheers,
>>>>>> Nicolas
>>>>>>
>>>>>> 2012/10/24, Reuti <re...@staff.uni-marburg.de>:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 24.10.2012 at 09:36, Nicolas Deladerriere wrote:
>>>>>>>
>>>>>>>> I am having an issue with ompi-clean, which cleans up the session
>>>>>>>> associated with a user (this is normal), which means it kills all
>>>>>>>> running jobs associated with that session (this is also normal).
>>>>>>>> But I would like to be able to clean up the session associated
>>>>>>>> with a job (not a user).
>>>>>>>>
>>>>>>>> Here is my point. I am running two executables:
>>>>>>>>
>>>>>>>> % mpirun -np 2 myexec1
>>>>>>>> --> run with PID 2399 ...
>>>>>>>> % mpirun -np 2 myexec2
>>>>>>>> --> run with PID 2402 ...
>>>>>>>>
>>>>>>>> When I run orte-clean I get this result:
>>>>>>>>
>>>>>>>> % orte-clean -v
>>>>>>>> orte-clean: cleaning session dir tree
>>>>>>>> openmpi-sessions-ndelader@myhost_0
>>>>>>>> orte-clean: killing any lingering procs
>>>>>>>> orte-clean: found potential rogue orterun process
>>>>>>>> (pid=2399,user=ndelader), sending SIGKILL...
>>>>>>>> orte-clean: found potential rogue orterun process
>>>>>>>> (pid=2402,user=ndelader), sending SIGKILL...
>>>>>>>>
>>>>>>>> This means that both jobs have been killed :-( Basically I would
>>>>>>>> like to run orte-clean with an executable name, or a PID, or
>>>>>>>> whatever identifies which job I want to stop and clean. It seems I
>>>>>>>> would need to create an Open MPI session per job. Does that make
>>>>>>>> sense? I would like to be able to run something like the following
>>>>>>>> command and get the following result:
>>>>>>>>
>>>>>>>> % orte-clean -v myexec1
>>>>>>>> orte-clean: cleaning session dir tree
>>>>>>>> openmpi-sessions-ndelader@myhost_0
>>>>>>>> orte-clean: killing any lingering procs
>>>>>>>> orte-clean: found potential rogue orterun process
>>>>>>>> (pid=2399,user=ndelader), sending SIGKILL...
>>>>>>>>
>>>>>>>> Does that make sense? Is there a way to perform this kind of
>>>>>>>> selection in the cleaning process?
>>>>>>>
>>>>>>> How many jobs are you starting on how many nodes at one time? This
>>>>>>> requirement could be a point to start using a queuing system, where
>>>>>>> you can remove jobs individually and also serialize your workflow.
>>>>>>> In fact, we also use GridEngine locally on workstations for this
>>>>>>> purpose.
>>>>>>>
>>>>>>> -- Reuti
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
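A stopgap in the spirit of the wrapper mentioned in this thread: record the launcher's PID at submission time and later signal only that process, instead of letting orte-clean sweep every job on the frontend. This is a minimal sketch under stated assumptions — the function names and PID-file convention are invented for illustration, and any stale session files left behind would still need separate cleanup:

```shell
# run_and_record PIDFILE CMD...: launch a job in the background and save
# the launcher's PID so this one job can be stopped later.
# Hypothetical wrapper; names and paths are illustrative only.
run_and_record() {
    pidfile="$1"; shift
    "$@" &
    echo $! > "$pidfile"
}

# stop_job PIDFILE: signal only the recorded launcher, leaving the other
# jobs on the same frontend untouched (unlike orte-clean).
stop_job() {
    kill -TERM "$(cat "$1")" 2>/dev/null
}

# Usage would look like:
#   run_and_record /tmp/job1.pid mpirun -np 2 myexec1
#   stop_job /tmp/job1.pid
```

Since mpirun (orterun) forwards SIGTERM to the processes it launched, signalling the recorded launcher normally takes down just that one job.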