Hi,

On 07.08.2014 at 21:14, Joseph Farran wrote:

> I am using Son of Grid Engine 8.1.6.
> 
> We have an issue that occurs once in a while in which Grid Engine will
> suspend a job ( subordinate queue ) and while Grid Engine thinks the job is
> suspended ( qstat shows "S" for job state ), the process on the node keeps
> running and is not really suspended.
> 
> If I manually suspend the job ( qmod -sj <job-id> ), then the process 
> suspends just fine on the node and I see the "Ss" in qstat listing.
> 
> Is there a way to tell Grid Engine to re-issue a suspend signal to processes 
> on a node that are supposed to be suspended?
> 
> I can manually tell GE to suspend a job ( qmod -sj ) but then I have to also 
> manually un-suspend it.    So what I am looking for is to have GE re-issue 
> suspend signals for jobs it believes are already suspended.

To investigate this: what about setting up a custom "suspend_method" that logs
whether it's called at all and sends the SIGSTOP to the complete process group
on its own, to mimic the original behavior:

$ qconf -sq baz
...
suspend_method /foo/bar/mysuspend.sh $job_pid

And the script:

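#!/bin/sh
# $1 is the $job_pid passed in by the suspend_method entry above
echo "suspend script called at: $(date) for PID $1" >> /tmp/suspend.log
# The leading "-" addresses the whole process group, not just one process
kill -STOP -- "-$1"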
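
If qstat shows "S" but the processes keep running, it may also help to check
their real state on the node. Below is a minimal sketch, assuming Linux procps
"ps" (a process stopped by SIGSTOP shows "T" in its STAT column) and that the
job's PID is also its process group ID, as the "kill -- -$1" above presumes.
The function names pgroup_states and all_stopped are hypothetical, not part of
Grid Engine:

```shell
#!/bin/sh
# Hypothetical diagnostic helpers (not part of Grid Engine) to compare what
# qstat reports with the actual state of a job's processes on the node.
# Assumes Linux procps "ps", where a process stopped by SIGSTOP shows "T"
# as the first character of its STAT column.

# Print "PID STAT" for every process whose process group ID equals $1.
pgroup_states() {
    ps -e -o pid= -o pgid= -o stat= |
        awk -v pg="$1" '$2 == pg { print $1, $3 }'
}

# Succeed only if process group $1 has at least one member and every
# member is stopped; fail if the group is empty or anything still runs.
all_stopped() {
    pgroup_states "$1" |
        awk '{ n++ } $2 !~ /^T/ { running = 1 } END { exit (n == 0 || running) }'
}
```

On the execution host, something like "all_stopped <job_pid> && echo stopped"
(with the value SGE passes as $job_pid) would show whether the SIGSTOP really
reached every process of the job.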


-- Reuti

> Thanks,
> Joseph
> 
> 
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users


