Ok,
I found one clue. "qstat" and "qstat -f" report a different number of
cores (slots) in use. Plain "qstat" reports 25 + 32 + 32 cores, while "qstat -f"
reports 25 + 15 + 10 cores:
qstat -f (for compute-2-6):
[email protected] BIP 0/50/64 3.79 lx-amd64
45647 0.54310 QRLOGIN user1 r 11/07/2012 15:55:04 25
40044 0.55421 SNPtable user2 r 11/06/2012 11:13:18 15
40279 0.55421 SNPtable user2 r 11/06/2012 14:50:25 10
$ qstat | grep compute-2-6
45647 0.54310 QRLOGIN user1 r 11/07/2012 15:55:04 [email protected] 25
40044 0.55421 SNPtable user2 r 11/06/2012 11:13:18 [email protected] 32
40279 0.55421 SNPtable user2 r 11/06/2012 14:50:25 [email protected] 32
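The mismatch can be checked with a short Python sketch. It sums the last (slots) column of each listing and compares the result with the "0/50/64" counter that "qstat -f" prints for the queue instance (reserved/used/total slots). The helper names are hypothetical, and the job lines are copied from the output above with the queue names left out.

```python
# Hypothetical check of the two qstat listings for compute-2-6 shown above.

def used_slots(counter: str) -> int:
    """Return the 'used' field of a 'resv/used/tot' string such as '0/50/64'."""
    _resv, used, _total = (int(x) for x in counter.split("/"))
    return used

def last_column_sum(lines: list[str]) -> int:
    """Sum the last whitespace-separated column (the slots column) of each job line."""
    return sum(int(line.split()[-1]) for line in lines)

# Job lines for compute-2-6 as printed by `qstat -f` (copied from the output above).
qstat_f_jobs = [
    "45647 0.54310 QRLOGIN user1 r 11/07/2012 15:55:04 25",
    "40044 0.55421 SNPtable user2 r 11/06/2012 11:13:18 15",
    "40279 0.55421 SNPtable user2 r 11/06/2012 14:50:25 10",
]
# Slot counts shown by plain `qstat` for the same three jobs.
qstat_slots = [25, 32, 32]

used = used_slots("0/50/64")
f_total = last_column_sum(qstat_f_jobs)
plain_total = sum(qstat_slots)

print(f"qstat -f slots on host: {f_total} (counter says {used} used)")
print(f"plain qstat slots:      {plain_total} (host has 64 slots)")
```

With these numbers the "qstat -f" column adds up to 50, matching the used-slot counter, while the plain "qstat" column adds up to 89, more than the 64 slots the host has.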
So it looks like SGE is confused. How can I fix this?
On 11/7/2012 9:25 PM, Joseph Farran wrote:
Hi.
I am using SGE 8.1.2 with several queues, and recently several of my 64-slot
queues have stopped scheduling all 64 cores.
So if I submit 64 1-core jobs, only 57 or so are scheduled per node instead of
64. If I submit four 16-core PE jobs, only three of them are scheduled on a
node instead of four (16 x 4 = 64).
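The two symptoms are consistent with each other: if the scheduler treats only about 57 of the 64 slots as free, then 57 one-core jobs fit but only three 16-core jobs do, since 3 x 16 = 48 fits and 4 x 16 = 64 does not. A minimal Python sketch of that arithmetic, where the function name is hypothetical and 57 is the approximate figure from the message:

```python
# Slot arithmetic only: whole jobs of a given size that fit in the free slots.
HOST_SLOTS = 64

def jobs_that_fit(free_slots: int, slots_per_job: int) -> int:
    """Number of whole jobs of `slots_per_job` slots that fit into `free_slots` slots."""
    return free_slots // slots_per_job

# Expected on an idle 64-slot node: 64 one-core jobs, four 16-core jobs.
print(jobs_that_fit(HOST_SLOTS, 1), jobs_that_fit(HOST_SLOTS, 16))  # 64 4
# If only about 57 slots are considered free: 57 one-core jobs, three 16-core jobs.
print(jobs_that_fit(57, 1), jobs_that_fit(57, 16))  # 57 3
```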
This was working fine before, so I think SGE has simply lost track of something.
Restarting SGE did not help; the symptoms are the same. My queues do show
"slots=64", and the compute nodes do not have any special settings.
Is there a way to tell SGE to re-count cores per node, or to reset SGE without
disrupting running jobs?
Joseph
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users