Thanks all for your comments.

Ralph,
What I was initially looking at is a tool (or an option of orte-clean) that
cleans up the mess you are talking about, but only the mess created by a
single mpirun command. As far as I understand, orte-clean cleans up
everything on a node associated with any Open MPI process that has run (or
is currently running). According to Rolf's comment, mpirun usually does not
leave any zombie processes behind, hence the effect of orte-clean seems
limited. But since it exists, I was wondering whether it still does anything
useful?

Cheers,
Nicolas

2012/10/25 Ralph Castain <r...@open-mpi.org>

> Okay, now I'm confused. If all you want to do is cleanly "kill" a running
> OMPI job, then why not just issue
>
> $ kill -TERM <pid-for-that-mpirun>
>
> This will cause mpirun to order the clean termination of all remote procs
> within that execution, and then cleanly terminate itself. No tool we
> create could do it any better.
>
> Is there an issue with doing so?
>
> orte-clean was intended to clean up the mess if/when the above method
> doesn't work - i.e., when you have to "kill -KILL" mpirun, which forcibly
> kills mpirun but might leave zombie orteds on the remote nodes.
>
>
> On Oct 24, 2012, at 10:39 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
>
> > Or perhaps cloned, renamed to orte-kill, and modified to kill a single
> > (or multiple) specific job(s). That would be POSIX-like ("kill" vs.
> > "clean").
> >
> >
> > On Oct 24, 2012, at 1:32 PM, Rolf vandeVaart wrote:
> >
> >> And just to give a little context, ompi-clean was created initially to
> >> "clean" up a node, not for cleaning up a specific job. It was for the
> >> case where MPI jobs would leave some files behind or leave some
> >> processes running. (I do not believe this happens much at all anymore.)
> >> But, as was said, no reason it could not be modified.
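For reference, Ralph's `kill -TERM` approach needs the PID of the one mpirun we want to stop. Since we already launch through a wrapper, the wrapper could record that PID at launch time. A minimal sketch, assuming a wrapper of our own (the helper name and pidfile path are illustrative, not Open MPI features):

```shell
#!/bin/sh
# run_and_record: start a command in the background and remember its PID,
# so a later "kill -TERM $(cat pidfile)" stops exactly this one job and no
# other. (Sketch only; not part of Open MPI.)
run_and_record() {
    pidfile=$1; shift
    "$@" &                   # e.g. mpirun -np 2 myexec1
    echo $! > "$pidfile"     # record exactly this job's PID
}
```

Used as `run_and_record /tmp/job1.pid mpirun -np 2 myexec1`, a later `kill -TERM "$(cat /tmp/job1.pid)"` terminates only that job, and mpirun cleanly shuts down its remote procs as Ralph describes, while other mpirun jobs keep running.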
> >>
> >>> -----Original Message-----
> >>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> >>> On Behalf Of Jeff Squyres
> >>> Sent: Wednesday, October 24, 2012 12:56 PM
> >>> To: Open MPI Users
> >>> Subject: Re: [OMPI users] ompi-clean on single executable
> >>>
> >>> ...but patches would be greatly appreciated. :-)
> >>>
> >>> On Oct 24, 2012, at 12:24 PM, Ralph Castain wrote:
> >>>
> >>>> All things are possible, including what you describe. Not sure when
> >>>> we would get to it, though.
> >>>>
> >>>>
> >>>> On Oct 24, 2012, at 4:01 AM, Nicolas Deladerriere
> >>>> <nicolas.deladerri...@gmail.com> wrote:
> >>>>
> >>>>> Reuti,
> >>>>>
> >>>>> The problem I am facing is a small part of our production system,
> >>>>> and I cannot modify our mpirun submission system. This is why I am
> >>>>> looking for a solution using only the ompi-clean or mpirun command
> >>>>> specification.
> >>>>>
> >>>>> Thanks,
> >>>>> Nicolas
> >>>>>
> >>>>> 2012/10/24, Reuti <re...@staff.uni-marburg.de>:
> >>>>>> Am 24.10.2012 um 11:33 schrieb Nicolas Deladerriere:
> >>>>>>
> >>>>>>> Reuti,
> >>>>>>>
> >>>>>>> Thanks for your comments.
> >>>>>>>
> >>>>>>> In our case, we are currently running different mpirun commands
> >>>>>>> on clusters sharing the same frontend. Basically we use a wrapper
> >>>>>>> to run the mpirun command and to run an ompi-clean command to
> >>>>>>> clean up the MPI job if required. Using ompi-clean like this just
> >>>>>>> kills all other MPI jobs running on the same frontend. I cannot
> >>>>>>> use a queuing system
> >>>>>>
> >>>>>> Why? Using it on a single machine was only one possible setup. Its
> >>>>>> purpose is to distribute jobs to slave hosts. If you already have
> >>>>>> one frontend as a login machine it fits perfectly: the qmaster (in
> >>>>>> case of SGE) can run there and the execd on the nodes.
> >>>>>>
> >>>>>> -- Reuti
> >>>>>>
> >>>>>>
> >>>>>>> as you have suggested; this is why I was wondering about an
> >>>>>>> option or other solution associated with the ompi-clean command
> >>>>>>> to avoid this general cleaning of MPI jobs.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Nicolas
> >>>>>>>
> >>>>>>> 2012/10/24, Reuti <re...@staff.uni-marburg.de>:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Am 24.10.2012 um 09:36 schrieb Nicolas Deladerriere:
> >>>>>>>>
> >>>>>>>>> I am having an issue running ompi-clean: it cleans up (this is
> >>>>>>>>> normal) the session associated with a user, which means it
> >>>>>>>>> kills all running jobs associated with this session (this is
> >>>>>>>>> also normal). But I would like to be able to clean up the
> >>>>>>>>> session associated with a job (not a user).
> >>>>>>>>>
> >>>>>>>>> Here is my point:
> >>>>>>>>>
> >>>>>>>>> I am running two executables:
> >>>>>>>>>
> >>>>>>>>> % mpirun -np 2 myexec1
> >>>>>>>>> --> runs with PID 2399 ...
> >>>>>>>>> % mpirun -np 2 myexec2
> >>>>>>>>> --> runs with PID 2402 ...
> >>>>>>>>>
> >>>>>>>>> When I run orte-clean I get this result:
> >>>>>>>>> % orte-clean -v
> >>>>>>>>> orte-clean: cleaning session dir tree
> >>>>>>>>> openmpi-sessions-ndelader@myhost_0
> >>>>>>>>> orte-clean: killing any lingering procs
> >>>>>>>>> orte-clean: found potential rogue orterun process
> >>>>>>>>> (pid=2399,user=ndelader), sending SIGKILL...
> >>>>>>>>> orte-clean: found potential rogue orterun process
> >>>>>>>>> (pid=2402,user=ndelader), sending SIGKILL...
> >>>>>>>>>
> >>>>>>>>> Which means that both jobs have been killed :-( Basically I
> >>>>>>>>> would like to perform orte-clean using an executable name or
> >>>>>>>>> PID or whatever identifies which job I want to stop and clean.
> >>>>>>>>> It seems I would need to create an Open MPI session per job.
> >>>>>>>>> Does it make sense?
> >>>>>>>>> And I would like to be able to do something like the following
> >>>>>>>>> command and get the following result:
> >>>>>>>>>
> >>>>>>>>> % orte-clean -v myexec1
> >>>>>>>>> orte-clean: cleaning session dir tree
> >>>>>>>>> openmpi-sessions-ndelader@myhost_0
> >>>>>>>>> orte-clean: killing any lingering procs
> >>>>>>>>> orte-clean: found potential rogue orterun process
> >>>>>>>>> (pid=2399,user=ndelader), sending SIGKILL...
> >>>>>>>>>
> >>>>>>>>> Does it make sense? Is there a way to perform this kind of
> >>>>>>>>> selection in the cleaning process?
> >>>>>>>>
> >>>>>>>> How many jobs are you starting on how many nodes at one time?
> >>>>>>>> This requirement could be a reason to start using a queuing
> >>>>>>>> system, where you can remove jobs individually and also
> >>>>>>>> serialize your workflow. In fact, we also use GridEngine locally
> >>>>>>>> on workstations for this purpose.
> >>>>>>>>
> >>>>>>>> -- Reuti
> >>>>>>>> _______________________________________________
> >>>>>>>> users mailing list
> >>>>>>>> us...@open-mpi.org
> >>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>>
> >>> --
> >>> Jeff Squyres
> >>> jsquy...@cisco.com
> >>> For corporate legal information go to:
> >>> http://www.cisco.com/web/about/doing_business/legal/cri/
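PS: until something like Jeff's orte-kill exists, the per-job selection could live in our wrapper as well. Here is a rough sketch (illustrative only, not an OMPI tool; the function name and message format are assumptions): it reads `ps`-style `PID USER COMMAND` lines on stdin and, for each mpirun/orterun whose arguments mention the target executable, prints an orte-clean-style message and signals only that process.

```shell
#!/bin/sh
# orte_kill: sketch of per-job cleanup by executable name (not part of
# Open MPI). Reads "PID USER COMMAND..." lines on stdin; for each
# mpirun/orterun whose command line mentions the target, reports it and
# sends SIG (default TERM). KILL_CMD can be set to ":" for a dry run.
orte_kill() {
    target=$1
    : "${SIG:=TERM}" "${KILL_CMD:=kill}"
    awk -v t="$target" \
        '($3 == "mpirun" || $3 == "orterun") && index($0, t) { print $1, $2 }' |
    while read -r pid user; do
        echo "orte-kill: found orterun process (pid=$pid,user=$user), sending SIG$SIG..."
        $KILL_CMD "-$SIG" "$pid"
    done
}

# Real use would be something like:
#   ps -eo pid,user,args | orte_kill myexec1
```

With the two jobs from my example, `ps -eo pid,user,args | orte_kill myexec1` would signal only PID 2399 and leave myexec2 (PID 2402) running; `SIG=KILL` would mimic orte-clean's forcible behavior for a stuck job.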