Hi,

On 07.08.2014 at 21:14, Joseph Farran wrote:
> I am using Son of Grid Engine 8.1.6.
>
> We have an issue that occurs once in a while in which Grid Engine will
> suspend a job (subordinate queue), and while Grid Engine thinks the job is
> suspended (qstat shows "S" for job state), the process on the node keeps
> running and is not really suspended.
>
> If I manually suspend the job (qmod -sj <job-id>), then the process
> suspends just fine on the node and I see "Ss" in the qstat listing.
>
> Is there a way to tell Grid Engine to re-issue a suspend signal to processes
> on a node that are supposed to be suspended?
>
> I can manually tell GE to suspend a job (qmod -sj), but then I also have to
> manually un-suspend it. So what I am looking for is to have GE re-issue
> suspend signals for jobs it believes are already suspended.

To investigate this: what about setting up a custom "suspend_method" that logs whether it is called at all, and that sends SIGSTOP to the complete process group on its own to mimic the built-in behavior:

$ qconf -sq baz
...
suspend_method        /foo/bar/mysuspend.sh $job_pid

And the script:

#!/bin/sh
# Log every invocation, to see whether Grid Engine calls the method at all.
echo "suspend script called at: $(date)" >> /tmp/suspend.log
# $1 is the $job_pid passed in by suspend_method; the leading "-" makes kill
# signal the whole process group with that ID ("-s STOP" is the POSIX form).
kill -s STOP -- "-$1"

-- Reuti

> Thanks,
> Joseph
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
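To check on the execution host whether a job that qstat shows as "S" is really stopped, one could look at the state of every process in its process group. This is a minimal sketch, not something from the original mail. It assumes Linux with procps ps (the pgid and stat output fields are not POSIX) and, as the suspend script does, that $job_pid is also the ID of the job's process group. The script name check_stopped.sh and the function check_stopped are hypothetical and not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical helper, not part of Grid Engine: check_stopped <pgid>
# Returns 0 if every process in the group is stopped, 1 if some process is
# still not stopped (their PIDs are printed), 2 if the group has no processes.
# Assumes Linux procps "ps"; the pgid/stat fields are not POSIX.
check_stopped() {
    pgid=$1
    # One "pgid pid stat" line per process that belongs to this group.
    members=$(ps -eo pgid=,pid=,stat= | awk -v g="$pgid" '$1 == g')
    if [ -z "$members" ]; then
        echo "process group $pgid: no processes found"
        return 2
    fi
    # Members whose state is neither T (stopped) nor Z (zombie, already exited).
    running=$(printf '%s\n' "$members" | awk '$3 !~ /^[TZ]/ { print $2 }')
    if [ -n "$running" ]; then
        # $running is left unquoted on purpose to join the PIDs on one line.
        echo "process group $pgid: not stopped:" $running
        return 1
    fi
    echo "process group $pgid: all processes stopped"
    return 0
}

# When run directly as "check_stopped.sh <pgid>", the exit status is the
# function's return value.
if [ $# -ge 1 ]; then
    check_stopped "$1"
fi
```

For example, running check_stopped.sh with the job's PID on the node right after qstat reports the job as "S" would show whether the SIGSTOP really reached every process in the group.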
