Thanks Reuti.
I'll give that a try. Do I need to set up an un-suspend method / script as
well?
Joseph
On 8/7/2014 2:33 PM, Reuti wrote:
Hi,
On 07.08.2014 at 21:14, Joseph Farran wrote:
I am using Son of Grid Engine 8.1.6.
We have an issue that occurs once in a while in which Grid Engine suspends a job
(subordinate queue), and while Grid Engine thinks the job is suspended (qstat shows
"S" for job state), the process on the node keeps running and is not really
suspended.
If I manually suspend the job ( qmod -sj <job-id> ), then the process suspends just fine
on the node and I see the "Ss" in qstat listing.
Is there a way to tell Grid Engine to re-issue a suspend signal to processes on
a node that are supposed to be suspended?
I can manually tell GE to suspend a job ( qmod -sj ) but then I have to also
manually un-suspend it. So what I am looking for is to have GE re-issue
suspend signals for jobs it believes are already suspended.
To investigate this: what about setting up a custom "suspend_method" that logs
whether it's called at all and sends SIGSTOP to the complete process group on its own,
to mimic the original behavior:
$ qconf -sq baz
...
suspend_method /foo/bar/mysuspend.sh $job_pid
And the script:
#!/bin/sh
echo "suspend script called at: $(date)" >> /tmp/suspend.log
# A negative PID addresses the whole process group of the job.
kill -s STOP -- "-$1"
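
If a matching un-suspend script is wanted as well, a minimal sketch could look like the one below. It assumes a "resume_method" entry is set in the queue configuration in the same way as "suspend_method" (for example: resume_method /foo/bar/myresume.sh $job_pid). The script name myresume.sh, the log path /tmp/resume.log and the RESUME_LOG variable are made up for illustration. As far as I know the default resume action already sends SIGCONT, so this would mainly help with logging.

```shell
#!/bin/sh
# Hypothetical companion to mysuspend.sh; name and log path are assumptions.
LOG="${RESUME_LOG:-/tmp/resume.log}"

resume_group() {
    # $1 is expected to be the job's process group leader PID ($job_pid).
    case "$1" in
        ''|*[!0-9]*)
            echo "resume script called with invalid pid '$1' at: $(date)" >> "$LOG"
            return 1
            ;;
    esac
    echo "resume script called for pid $1 at: $(date)" >> "$LOG"
    # A negative PID addresses the whole process group of the job.
    kill -s CONT -- "-$1"
}

# Only act when Grid Engine (or a tester) passes an argument.
if [ "$#" -gt 0 ]; then
    resume_group "$1"
fi
```

Comparing the timestamps in /tmp/suspend.log and /tmp/resume.log should then show whether both methods are being called when expected.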
-- Reuti