On 26.11.2014 at 08:23, Guillermo Marco Puche wrote:

> On 26/11/14 00:42, Reuti wrote:
>> Hi,
>>
>> On 25.11.2014 at 23:28, Guillermo Marco Puche wrote:
>>
>>> I'm experiencing a very weird issue. I have no idea how to deal with it.
>>> • I've submitted multiple jobs, e.g. job1, job2, job3.
>>> • The jobs are running on multiple compute nodes.
>>> • I've put the jobs on user hold and then rescheduled them.
>>> • The jobs are now in the hqR state in the SGE job pool (they're supposed
>>> to stay there and free their slots and resources on their respective
>>> compute nodes).
>>> • The compute nodes that previously ran these jobs continue to execute
>>> the job processes and consume resources (I can see them with htop on
>>> the compute nodes).
>> But they are gone from `qstat` and not listed twice?
> Nope, they're listed only once in qstat.
>>
>>
>>> So what's the correct way to pause/restart a job and hold it in the SGE
>>> pool without holding on to resources?
>> Are these processes still bound to the execd and the shepherd of SGE, or
>> did they jump out of the process tree compared to when they were running
>> initially?
> Yes, the processes are still bound to the execd and the shepherd of SGE.

Which version of SGE are you using? After issuing `qmod -rj <jobid>` they should be gone, of course.

-- Reuti

>>
>> Do you use any `trap` inside the job script?
> No trap commands.
>>
>> -- Reuti
> Regards,
> Guillermo.

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
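The sequence discussed in this thread is: put the job on user hold, then reschedule it with `qmod -rj`, after which the running instance should disappear from its execution host while the job waits in the pending list. Below is a minimal POSIX shell sketch of that sequence. The function names `run` and `hold_and_reschedule` and the `DRY_RUN` variable are hypothetical helpers, not part of SGE; the sketch assumes the standard `qhold -h u` (user hold) form and the `qmod -rj` call quoted above. Whether rescheduling needs manager privileges, or requires the job to be rerunnable, depends on your SGE version and configuration, so check qmod(1) on your installation.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: put one job on user hold, then ask SGE to reschedule it.
# Holding first means the rescheduled job stays in the pending list instead of
# being dispatched again right away.
hold_and_reschedule() {
    job_id="$1"
    [ -n "$job_id" ] || { echo "usage: hold_and_reschedule <jobid>" >&2; return 2; }
    run qhold -h u "$job_id" || return   # user hold (qhold's default hold type)
    run qmod -rj "$job_id"               # reschedule the running job
}
```

For example, `DRY_RUN=1 hold_and_reschedule 42` only prints the two commands. After running it for real, `qstat` should show the job as held and pending, and on the former execution host something like `ps -ef | grep sge_shepherd` should no longer list a shepherd for that job; a shepherd and job processes that are still there are exactly the symptom reported in this thread.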
