On 18.12.2014 at 22:21, [email protected] wrote:

> We've got a job that was suspended via:
>
>     qmod -sj $jobid
>
> that's continuing to run. The job consists of a BASH script, which in
> turn submits other jobs in a loop, sleeping for 30 seconds after each loop.
>
> When I examine the job status on the node where it is executing via:
>
>     ps -e f | grep $JOBID
>
> I see that the process is sleeping (state "S"), which is not unexpected,
> given the 'sleep 30' in the loop, but not suspended (state "T"):
>
>     30559 ? SNs 0:02 | \_ /bin/bash /var/tmp/gridengine/8.1.6/default/spool/node-5-2/job_scripts/2367998
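
For comparison, here is a rough sketch of what a delivered stop signal should look like on the node, independent of SGE. It assumes a Linux host with /proc, and uses a background 'sleep' as a stand-in for the job script; the helper name proc_state is made up for this example. On the real node you would read the state of the PID shown by ps instead.

```shell
#!/bin/bash
# Hypothetical helper: print the one-letter state of a PID as reported
# by the kernel in /proc/PID/status (S = sleeping, T = stopped).
proc_state() {
    awk '/^State:/ {print $2}' "/proc/$1/status"
}

# Stand-in for the job's shell process (assumption: not a real SGE job).
sleep 60 &
pid=$!
sleep 1                      # give the child time to exec and block

before=$(proc_state "$pid")  # state while merely sleeping

kill -STOP "$pid"            # the signal a default suspend is expected to send
sleep 1
after=$(proc_state "$pid")   # state once the stop signal has been delivered

# Clean up: resume the process first so it can act on the termination signal.
kill -CONT "$pid"
kill "$pid"
wait "$pid" 2>/dev/null || true

echo "before=$before after=$after"
```

If the job's processes still show "S" after 'qmod -sj', while the same processes do switch to "T" under a manual 'kill -STOP', that would suggest the signal is not reaching them, rather than being ignored.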

Maybe the problem was introduced in this version, as in 6.2u5 it's working for me. Do you have a chance to test any other version on another machine with your application in question?

-- Reuti

> Indeed, the job is not suspended, as it keeps performing the action
> inside the loop.
>
> The problem can be consistently reproduced with a trivial job, such as:
>
> ------------------------
> #! /bin/bash
> i=0
> while [ $i -le 100 ]
> do
>     date
>     i=$((i + 1))
>     sleep 30
> done
> ------------------------
>
> Submitting that job to SGE, then executing 'qmod -sj $jobid' after it
> starts does not suspend the running job. The 'qstat' command does show
> the job as being in the 's' (suspended) state.
>
> We're not using any custom 'suspend_method' or changing the default
> signals sent by SGE.
>
> Jobs that are suspended (due to subordinated queues) by SGE have never
> shown this behavior.
>
> Any suggestions about how to proceed with troubleshooting?
>
> Thanks,
>
> Mark
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
