On 30.10.2012 at 20:02, Joseph Farran wrote:

> Hi Reuti.
>
> Yes, I had that already set:
>
> qconf -sconf|fgrep execd_params
> execd_params                 ENABLE_ADDGRP_KILL=TRUE
>
> What is strange is that 1 out of 10 jobs or so does get killed just fine
> when it goes past the hard wall clock limit.
>
> However, the majority of the jobs are not being killed when they go past
> their wall clock limit.
>
> How can I investigate this further?
ps -e f -o ruid,euid,rgid,egid,stat,command --cols=500

(f w/o -) and post the relevant lines of the application please.

-- Reuti

> On 10/30/2012 11:44 AM, Reuti wrote:
>> Hi,
>>
>> On 30.10.2012 at 19:31, Joseph Farran wrote:
>>
>>> I googled this issue but did not find much help on the subject.
>>>
>>> I have several queues with hard wall clock limits like this one:
>>>
>>> # qconf -sq queue | grep h_rt
>>> h_rt                  96:00:00
>>>
>>> I am running Son of Grid Engine 8.1.2 and many jobs run past the hard wall
>>> clock limit and continue to run.
>>>
>>> Looking at the GE qmaster logs, I see dozens and dozens of these entries:
>>>
>>> 10/30/2012 11:23:10|schedu|hpc|W|job 13179.1 should have finished since 42318s
>>
>> Maybe they jumped out of the process tree (usually jobs are killed by
>> `kill -9 -- -pgrp`). You can kill them by their additional group id, which
>> is attached to all started processes even if they executed something like
>> `setsid`:
>>
>> $ qconf -sconf
>> ...
>> execd_params                 ENABLE_ADDGRP_KILL=TRUE
>>
>> If it's still not working, we have to investigate the process tree.
>>
>> HTH - Reuti
>>
>>> These entries correspond to the running jobs that should have ended 96
>>> hours ago, but they keep on running.
>>>
>>> Why is GE not killing these jobs correctly when they run past the 96 hour
>>> limit, yet complains that they should have ended?

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
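To illustrate the "jumped out of the process tree" point from the thread: a process that calls `setsid` lands in a new session and a new process group, so a signal sent to the original group (as with `kill -9 -- -pgrp`) no longer reaches it. The following is only a minimal sketch of that effect on a POSIX system, not a reproduction of what Grid Engine does internally; the helper name `compare_process_groups` and the sleeping child command are made up for the illustration.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a job's worker process: a child that just sleeps.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]


def compare_process_groups():
    """Return (own_pgid, plain_child_pgid, detached_child_pgid)."""
    # A normally started child inherits the parent's process group.
    plain = subprocess.Popen(SLEEPER)
    # start_new_session=True makes the child call setsid() before exec,
    # which puts it into a new session and a new process group.
    detached = subprocess.Popen(SLEEPER, start_new_session=True)
    try:
        return (
            os.getpgid(0),
            os.getpgid(plain.pid),
            os.getpgid(detached.pid),
        )
    finally:
        # Clean up both children so the demo leaves nothing behind.
        for child in (plain, detached):
            child.kill()
            child.wait()


if __name__ == "__main__":
    own, plain, detached = compare_process_groups()
    print("own process group:      ", own)
    print("plain child's group:    ", plain)
    print("detached child's group: ", detached)
```

The plain child reports the same process group as the parent, while the detached child reports a different one, so a group-wide kill aimed at the parent's group would miss it. According to the thread, the additional group id stays attached to such processes, which is why `ENABLE_ADDGRP_KILL=TRUE` is suggested as the fallback.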
