On 26.11.2014 at 19:02, Reuti wrote:
> On 26.11.2014 at 11:30, Guillermo Marco Puche wrote:
>
>> On 26/11/14 11:17, Reuti wrote:
>>> On 26.11.2014 at 08:23, Guillermo Marco Puche wrote:
>>>
>>>> On 26/11/14 00:42, Reuti wrote:
>>>>> Hi,
>>>>>
>>>>> On 25.11.2014 at 23:28, Guillermo Marco Puche wrote:
>>>>>
>>>>>> I'm experiencing a very weird issue. I've no idea how to deal with it.
>>>>>> • I've submitted multiple jobs, i.e. job1, job2, job3.
>>>>>> • The jobs are running on multiple compute nodes.
>>>>>> • I've put the jobs on user hold and then rescheduled them.
>>>>>> • The jobs are now in an hqR state in the SGE job pool (they're supposed
>>>>>> to stay there and free their slots and resources on their respective
>>>>>> compute nodes).
>>>>>> • The compute nodes that previously ran these jobs continue to execute
>>>>>> the job processes and consume resources (I can see them with htop on
>>>>>> the compute node).
>>>>> But they are gone from `qstat` and not listed twice?
>>>> Nope, they're listed once in qstat.
>>>>>
>>>>>> So what's the correct way to pause/restart a job and hold it in the SGE
>>>>>> pool without holding resources?
>>>>> Are these processes still bound to the execd and the shepherd of SGE or
>>>>> did they jump out of the process tree compared to the time when they were
>>>>> running initially?
>>>> Yes, the processes are still bound to the execd and the shepherd of SGE.
>>> Which version of SGE are you using? After issuing `qmod -rj <jobid>` they
>>> should be gone of course.
>> GE 6.2u5
>
> Can you please set the loglevel in SGE's configuration:
>
> $ qconf -sconf
> ...
> loglevel log_info
>
> and have a look at the messages file of the node. There should be an entry
> like:
>
> $ less /var/spool/sge/mypc/messages
> 11/26/2014 19:00:24| main|mypc|I|SIGNAL jid: 11772 jatask: 1 signal: KILL
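
To pick out only those entries instead of paging through the whole file, something along these lines might help. It is only a sketch: the function name `sge_signal_entries` is made up for illustration and is not part of SGE, and the path in the example call is the one from the example above, so adjust it to the execd spool directory of your node.

```shell
# Hypothetical helper (not part of SGE): print the SIGNAL entries that
# the execd logged for one job id in a given messages file.
sge_signal_entries() {
    jobid=$1
    messages_file=$2
    # Matches lines of the form:
    #   ...|I|SIGNAL jid: 11772 jatask: 1 signal: KILL
    # The space after the job id keeps 11772 from also matching 117720.
    grep "SIGNAL jid: ${jobid} " "$messages_file"
}

# Example call, using the job id and path from the example above:
# sge_signal_entries 11772 /var/spool/sge/mypc/messages
```

If this prints nothing for the rescheduled job, then no such signal was logged in that file at the current loglevel.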

BTW: Does an entry exist in `qacct -j <jobid>` for this rescheduled job?

-- Reuti

> -- Reuti
>
>
>> Guillermo.
>>>
>>> -- Reuti
>>>
>>>>> Do you use any `trap` inside the job script?
>>>> No trap commands.
>>>>> -- Reuti
>>>> Regards,
>>>> Guillermo.
>>>>

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
