Hi,

> On 14.04.2015 at 21:32, John Young <[email protected]> wrote:
>
> Hello,
>
> We (fairly) recently upgraded our cluster to Rocks 6.1.1
> and we now seem to be having problems with RQS. On our old
> cluster, we had an RQS quota set as follows:
>
> {
>    name         host-slots
>    description  restrict slots to core count
>    enabled      TRUE
>    limit        hosts {*} to slots=$num_proc
> }
>
> The reason for this was to try to prevent oversubscription
> of the processors on the clients. Now, if I have this quota
> enabled, jobs that are submitted don't start and if I do a
> 'qstat -j job-number' under "scheduling info" I see things like
>
> cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
> (-l slots=1) cannot run in queue "compute-0-39.local" because it offers only hc:slots=0.000000
> cannot run because it exceeds limit "////compute-0-78/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-78/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-55/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-55/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-74/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-74/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-2-7/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-2-1/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-2-2/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-22/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-22/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-1-2/" in rule "host-slots/1"
> cannot run in PE "mpich" because it only offers 0 slots
>
> But as soon as I run 'qconf -mrqs' and change TRUE to FALSE, the job runs.
>
> Has the process for preventing oversubscription changed? Any ideas?

Well, I have noticed this too from time to time; it may disappear again at some point. I would judge it a bug in that version of SGE.

-- Reuti
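
PS: To see at a glance which hosts the rule is rejecting, the "scheduling info" text from 'qstat -j' can be tallied with a few lines of Python. This is only a rough sketch, not part of SGE: the function name tally_rqs_rejections is made up for illustration, and it assumes the limit string has the form "user/project/pe/queue/host/", so that the fifth field is the host name.

```python
import re
from collections import Counter

# Matches lines such as:
#   cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
LIMIT_RE = re.compile(r'exceeds limit "([^"]*)" in rule "([^"]+)"')


def tally_rqs_rejections(scheduling_info: str) -> Counter:
    """Count RQS rejections per (rule, host) pair in qstat -j scheduling info."""
    counts = Counter()
    for match in LIMIT_RE.finditer(scheduling_info):
        limit, rule = match.groups()
        fields = limit.split("/")
        # Assumption: fields are user/project/pe/queue/host, so index 4 is the host.
        # Fall back to the raw limit string if that field is missing or empty.
        host = fields[4] if len(fields) > 4 and fields[4] else limit
        counts[(rule, host)] += 1
    return counts


# A few lines copied from the output quoted above.
SAMPLE = '''
cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
cannot run because it exceeds limit "////compute-2-7/" in rule "host-slots/1"
cannot run in PE "mpich" because it only offers 0 slots
'''

if __name__ == "__main__":
    for (rule, host), n in sorted(tally_rqs_rejections(SAMPLE).items()):
        print(f"{rule}  {host}  {n}")
```

If every host in the cluster shows up against "host-slots/1" even though the hosts are idle, that would fit the rule being evaluated wrongly rather than the hosts really being full.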
