Reuti,

Thanks for your comments.

In our case, we currently run several mpirun commands on clusters that
share the same frontend. We use a wrapper that runs the mpirun command
and, if required, an ompi-clean command to clean up the MPI job.
Used this way, ompi-clean kills all other MPI jobs running on the same
frontend. I cannot use a queuing system as you suggested, which is why I
was wondering whether there is an option to ompi-clean, or some other
solution, that avoids this blanket cleanup of all MPI jobs.
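
A minimal sketch of the kind of per-job cleanup I mean, assuming that
mpirun itself tears down its own job when it receives SIGTERM (as its man
page describes): the wrapper records the PID of the mpirun it started and
signals only that process, instead of calling ompi-clean. The names
run_job, stop_job and JOB_PID are hypothetical, and this does not remove a
stale session directory the way ompi-clean does.

```shell
#!/bin/sh
# Hypothetical wrapper helpers (not part of Open MPI).

# Start a command in the background and remember its PID.
run_job() {
    "$@" &
    JOB_PID=$!
}

# Stop only the job started by run_job; other jobs are left untouched.
stop_job() {
    # Send SIGTERM only if the recorded process is still alive.
    if kill -0 "$JOB_PID" 2>/dev/null; then
        kill -TERM "$JOB_PID"
    fi
    # Reap the process; ignore its non-zero "killed by signal" status.
    wait "$JOB_PID" 2>/dev/null || true
}
```

In the wrapper this would be used as "run_job mpirun -np 2 myexec1",
followed later by "stop_job" when that particular job has to be stopped.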

Cheers
Nicolas

2012/10/24, Reuti <re...@staff.uni-marburg.de>:
> Hi,
>
> Am 24.10.2012 um 09:36 schrieb Nicolas Deladerriere:
>
>> I am having an issue with ompi-clean, which cleans up (this is normal) the
>> session associated with a user, which means it kills all running jobs
>> associated with this session (this is also normal). But I would like to be
>> able to clean up the session associated with a job (and not with a user).
>>
>> Here is my point:
>>
>> I am running two executables:
>>
>>  % mpirun -np 2 myexec1
>>        --> run with PID 2399 ...
>>  % mpirun -np 2 myexec2
>>        --> run with PID 2402 ...
>>
>> When I run orte-clean I get this result:
>>  % orte-clean -v
>>  orte-clean: cleaning session dir tree openmpi-sessions-ndelader@myhost_0
>>  orte-clean: killing any lingering procs
>>  orte-clean: found potential rogue orterun process (pid=2399,user=ndelader), sending SIGKILL...
>>  orte-clean: found potential rogue orterun process (pid=2402,user=ndelader), sending SIGKILL...
>>
>> Which means that both jobs have been killed :-(
>> Basically I would like to run orte-clean with an executable name, a PID,
>> or whatever else identifies the job I want to stop and clean. It seems I
>> would need to create an Open MPI session per job. Does that make sense? I
>> would like to be able to run something like the following command and get
>> the following result:
>>
>>  % orte-clean -v myexec1
>>  orte-clean: cleaning session dir tree openmpi-sessions-ndelader@myhost_0
>>  orte-clean: killing any lingering procs
>>  orte-clean: found potential rogue orterun process (pid=2399,user=ndelader), sending SIGKILL...
>>
>>
>> Does it make sense? Is there a way to perform this kind of selection in
>> the cleaning process?
>
> How many jobs are you starting on how many nodes at one time? This
> requirement could be a reason to start using a queuing system, where you can
> remove jobs individually and also serialize your workflow. In fact, we also
> use GridEngine locally on workstations for this purpose.
>
> -- Reuti
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
