A little late, but I am running 8.1.7 and suspend worked only part of the time.

I had to write my own suspend script to make it work, especially with MATLAB jobs, which try to trap signals.
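
For illustration only, here is a minimal sketch of what such a suspend script could look like. It is not the actual script mentioned above. It assumes a Linux exec host with `pgrep` available, and that the queue's `suspend_method` is pointed at the script with the `$job_pid` pseudo-variable as its argument. The function name `signal_tree` is made up for this example. It sends SIGSTOP, which a process cannot trap, to the job's top-level process and to every descendant.

```shell
#!/bin/bash
# Hypothetical helper (name invented for this sketch): send a signal to a
# process and, recursively, to all of its descendants.
# Usage: signal_tree SIGNAL PID
signal_tree() {
    local sig=$1 pid=$2 child
    # Signal the parent first; if it is already gone there is nothing to do.
    kill -s "$sig" "$pid" 2>/dev/null || return 0
    # pgrep -P lists the direct children of the given PID.
    for child in $(pgrep -P "$pid"); do
        signal_tree "$sig" "$child"
    done
}

# Assumed invocation from the queue configuration:
#   suspend_method  /path/to/this_script $job_pid
# SIGSTOP cannot be caught or ignored by the job.
if [ $# -ge 1 ]; then
    signal_tree STOP "$1"
fi
```

A matching resume script would presumably do the same with `CONT` in place of `STOP`, set as the queue's `resume_method`. Both settings live in the queue configuration (edited with `qconf -mq <queue>`); the script path above is a placeholder.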

Joseph

On 12/19/2014 04:54 AM, [email protected] wrote:

On December 19, 2014 6:19:58 AM EST, Reuti <[email protected]> wrote:
=> Am 18.12.2014 um 22:21 schrieb [email protected]:
=> >
=> > We've got a job that was suspended via:
=> >
=> >      qmod -sj $jobid
=> >
=> > that's continuing to run.  The job consists of a BASH script, which in
=> > turn submits other jobs in a loop, sleeping for 30 seconds after each loop.
=> >
=> > When I examine the job status on the node where it is executing via:
=> >      ps -e f | grep $JOBID
=> >
=> > I see that the process is sleeping (state "S"), which is not unexpected,
=> > given the 'sleep 30' in the loop, but not suspended (state "T"):
=> >
=> >      30559 ?        SNs    0:02  |   \_ /bin/bash /var/tmp/gridengine/8.1.6/default/spool/node-5-2/job_scripts/2367998
=>
=> Maybe it was introduced in this edition, as in 6.2u5 it's working for

I can't believe I left that out... we're running SoGE 8.1.6.

=> me. Do you have a chance to test any other version on another machine
=> with your application in question?

Nope.

Mark

=>
=> -- Reuti
=>
=>
=> > Indeed, the job is not suspended, as it keeps performing the action
=> > inside the loop.
=> >
=> > The problem can be consistently reproduced with a trivial job, such as:
=> >
=> > ------------------------
=> > #! /bin/bash
=> > i=0
=> > while [ $i -le 100 ]
=> > do
=> >      date
=> >      i=$((i + 1))
=> >      sleep 30
=> > done
=> > ------------------------
=> >
=> > Submitting that job to SGE, then executing 'qmod -sj $jobid' after it
=> > starts does not suspend the running job. The 'qstat' command does show
=> > the job as being in the 's' (suspended) state.
=> >
=> > We're not using any custom 'suspend_method' or changing the default
=> > signals sent by SGE.
=> >
=> > Jobs that are suspended (due to subordinated queues) by SGE have never
=> > shown this behavior.
=> >
=> > Any suggestions about how to proceed with troubleshooting?
=> >
=> > Thanks,
=> >
=> > Mark
=> >
=> >
=> > _______________________________________________
=> > users mailing list
=> > [email protected]
=> > https://gridengine.org/mailman/listinfo/users
