Reuti,

Thanks for your comments.
In our case, we are running several mpirun commands on clusters that share the same frontend. We use a wrapper that runs the mpirun command and, if required, runs an ompi-clean command to clean up the MPI job. Used this way, ompi-clean also kills all the other MPI jobs running on the same frontend. I cannot use a queuing system as you suggested, which is why I was wondering whether there is an option to ompi-clean, or another solution, that avoids cleaning up every MPI job at once.

Cheers
Nicolas

2012/10/24, Reuti <re...@staff.uni-marburg.de>:
> Hi,
>
> On 24.10.2012 at 09:36, Nicolas Deladerriere wrote:
>
>> I am having an issue running ompi-clean, which cleans up (this is normal)
>> the session associated with a user, which means it kills all running jobs
>> associated with this session (this is also normal). But I would like to be
>> able to clean up the session associated with a job (not a user).
>>
>> Here is my point:
>>
>> I am running two executables:
>>
>> % mpirun -np 2 myexec1
>> --> run with PID 2399 ...
>> % mpirun -np 2 myexec2
>> --> run with PID 2402 ...
>>
>> When I run orte-clean, I get this result:
>> % orte-clean -v
>> orte-clean: cleaning session dir tree openmpi-sessions-ndelader@myhost_0
>> orte-clean: killing any lingering procs
>> orte-clean: found potential rogue orterun process (pid=2399,user=ndelader), sending SIGKILL...
>> orte-clean: found potential rogue orterun process (pid=2402,user=ndelader), sending SIGKILL...
>>
>> This means that both jobs have been killed :-(
>> Basically, I would like to run orte-clean with the executable name, the PID,
>> or whatever identifies the job I want to stop and clean up. It seems I
>> would need to create an Open MPI session per job. Does that make sense?
>> And I would like to be able to run something like the following command
>> and get the following result:
>>
>> % orte-clean -v myexec1
>> orte-clean: cleaning session dir tree openmpi-sessions-ndelader@myhost_0
>> orte-clean: killing any lingering procs
>> orte-clean: found potential rogue orterun process (pid=2399,user=ndelader), sending SIGKILL...
>>
>> Does that make sense? Is there a way to perform this kind of selection in
>> the cleaning process?
>
> How many jobs are you starting on how many nodes at one time? This
> requirement could be a reason to start using a queuing system, where you can
> remove jobs individually and also serialize your workflow. In fact, we also
> use GridEngine locally on workstations for this purpose.
>
> -- Reuti
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
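A minimal sketch of a per-job alternative to calling ompi-clean from such a wrapper, assuming the wrapper is a POSIX shell script. The function names start_mpi_job and stop_mpi_job and the variable MPI_JOB_PID are hypothetical. The idea is that the wrapper remembers the PID of each mpirun it starts and signals only that process. The mpirun man page (signal propagation) describes mpirun reacting to SIGTERM by terminating the processes of its own job, but this should be verified for the Open MPI version in use. Unlike orte-clean, this sketch does not remove leftover session directories.

```shell
#!/bin/sh
# Hypothetical per-job wrapper helpers: instead of orte-clean/ompi-clean,
# which targets every orterun owned by the user, keep the PID of *this*
# mpirun and signal only that process.

# Start a command (e.g. "mpirun -np 2 myexec1") in the background and
# store its PID in MPI_JOB_PID.
start_mpi_job() {
    "$@" &
    MPI_JOB_PID=$!
}

# Stop a single job by PID. Assumption: mpirun handles SIGTERM by
# terminating all processes of its own job (check your Open MPI version).
stop_mpi_job() {
    kill -TERM "$1" 2>/dev/null || true
    # Reap the child; its exit status is non-zero after the signal, so ignore it.
    wait "$1" 2>/dev/null || true
}

# Usage (mpirun jobs started elsewhere are left alone):
#   start_mpi_job mpirun -np 2 myexec1
#   JOB1=$MPI_JOB_PID
#   ...
#   stop_mpi_job "$JOB1"
```

Because each PID is known to the wrapper that started it, two wrappers running on the same frontend do not interfere with each other's jobs.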