What is "$num_proc"? Did you try setting a literal number instead, like
"limit hosts {*} to slots=12"?
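
To spell that out, a test version of your rule with a fixed count might look
like the sketch below. The 12 is only a placeholder I picked for illustration;
substitute the real core count of your nodes. The rest is copied from your rule.

```
{
   name         host-slots
   description  restrict slots to core count
   enabled      TRUE
   limit        hosts {*} to slots=12
}
```

If jobs start with this version enabled, that would suggest the problem lies in
how $num_proc is being resolved rather than in the rule itself.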
On Tue, Apr 14, 2015 at 3:32 PM, John Young <[email protected]> wrote:
> Hello,
>
> We (fairly) recently upgraded our cluster to Rocks 6.1.1
> and we now seem to be having problems with RQS. On our old
> cluster, we had an RQS quota set as follows:
>
> {
> name host-slots
> description restrict slots to core count
> enabled TRUE
> limit hosts {*} to slots=$num_proc
> }
>
> The reason for this was to try to prevent oversubscription
> of the processors on the clients. Now, if I have this quota
> enabled, jobs that are submitted don't start and if I do a
> 'qstat -j job-number' under "scheduling info" I see things like
>
> cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-7/" in rule "host-slots/1"
> (-l slots=1) cannot run in queue "compute-0-39.local" because it offers only hc:slots=0.000000
> cannot run because it exceeds limit "////compute-0-78/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-78/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-55/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-55/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-74/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-74/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-2-7/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-2-1/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-2-2/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-22/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-0-22/" in rule "host-slots/1"
> cannot run because it exceeds limit "////compute-1-2/" in rule "host-slots/1"
> cannot run in PE "mpich" because it only offers 0 slots
>
> But as soon as I run 'qconf -mrqs' and change TRUE to FALSE, the job runs.
>
> Has the process for preventing oversubscription changed? Any ideas?
>
> JY
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
--
Best,
Feng