On 3/17/2013 2:14 AM, Reuti wrote:
Am 17.03.2013 um 07:22 schrieb Joseph Farran:

On 1/4/2013 10:37 AM, Reuti wrote:
Am 02.01.2013 um 05:08 schrieb Joseph Farran:

Hello Reuti.

Yes, the job(s) are not suspending (S) as they normally do. So it's not the
queue, but the jobs.
But is the queue in suspended state (qstat -f)?
Sorry Reuti, missed your question.

Yes, the queue is SUSPENDED, but the jobs continue to run. Here is one example:

free64@compute-14-18.local     BIP   0/4/64         11.21 lx-amd64      S
242709 0.00355 CMAPNN     mengfant     r     03/15/2013 02:27:23     2 20
242709 0.00355 CMAPNN     mengfant     r     03/15/2013 02:27:23     2 33
Were these slave tasks of a parallel job?

No, they are part of a job array:

qstat|fgrep compute-14-18
 242709 0.00610 CMAPNN     mengfant     S     03/15/2013 02:27:23 free64@compute-14-18.local         2 20
 242709 0.00610 CMAPNN     mengfant     S     03/15/2013 02:27:23 free64@compute-14-18.local         2 33

I was able to suspend the queue "free64@compute-14-18.local" manually, but it
happens every so often that Grid Engine "forgets".
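
Until the cause is found, a small check could at least spot the condition. The following is only a rough sketch, not a fix: it assumes the `qstat -f` column layout shown in the example above (queue state in the sixth column, job state in the fifth, array task ID last) and that job names contain no spaces. The function name is made up for illustration.

```python
def running_jobs_in_suspended_queues(qstat_f_output):
    """Return (queue, job_id, task_id) for jobs still in state 'r'
    inside a queue instance whose state column contains 'S'.

    Assumes the column layout of the `qstat -f` sample in this thread.
    """
    flagged = []
    queue, queue_suspended = None, False
    for line in qstat_f_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.startswith(("-", "#")):
            # Separator or section header: forget the previous queue.
            queue, queue_suspended = None, False
        elif "@" in fields[0]:
            # Queue instance line; the state column may be missing.
            queue = fields[0]
            states = fields[5] if len(fields) > 5 else ""
            queue_suspended = "S" in states
        elif fields[0].isdigit() and len(fields) >= 8 and queue_suspended:
            # Job line under a suspended queue instance.
            state = fields[4]
            if "r" in state and not any(c in state for c in "sST"):
                task = fields[8] if len(fields) > 8 else None
                flagged.append((queue, fields[0], task))
    return flagged


SAMPLE = """\
free64@compute-14-18.local     BIP   0/4/64         11.21 lx-amd64      S
242709 0.00355 CMAPNN     mengfant     r     03/15/2013 02:27:23     2 20
242709 0.00355 CMAPNN     mengfant     r     03/15/2013 02:27:23     2 33
"""

for hit in running_jobs_in_suspended_queues(SAMPLE):
    print(hit)
```

Feeding it the output of `qstat -f` from a cron job would give a list of tasks that are still running in a suspended queue, which could then be suspended by hand.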


-- Reuti


Any idea why it keeps forgetting to suspend? It only happens once in a while,
but it overloads the nodes when it does happen.



-- Reuti


Normally, as soon as one or more core jobs enter the node through the queue, the
subordinate jobs suspend immediately. Once in a while, the jobs that go in
through the subordinate queue do not suspend as they should.

On 1/1/2013 7:04 AM, Reuti wrote:
Engine Forgets and does not suspend and the node is overloaded.
The queue is not going into the "S" state or the jobs therein are just not 
suspended?

-- Reuti



_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
