On Fri, Jul 14, 2017 at 08:36:06AM +0000, Simon Andrews wrote:
> Can anyone shed any light on an error I'm getting repeated thousands of
> times in my grid engine messages log. This happens when I have a job
> which is submitted and which is stopped from running by an RQS rule I have
> set up. The error I get is:
>
> 07/14/2017 09:27:08|schedu|rocks1|C|not a single host excluded in rqs_excluded_hosts()
>
> The RQS ruleset I have which triggers this looks like:

Not so much a fix but a possible workaround: send your logs to syslog (rather
than having qmaster log directly into files) and rely on syslog replacing
repeated messages with 'last message repeated <n> times'.
You could also try tweaking the loglevel parameter in the global configuration
(qconf -mconf).
I don't use RQS myself, but my best guess is that you have two sorts of hosts:
regular hosts with a batch queue, and the hosts in @interactive with an
interactive queue.

Because the hosts {@interactive} clause doesn't further restrict where the
limit applies (jobs are already limited by being batch or interactive), grid
engine complains that you appear to have a no-op in your limit. I think this
complaint by SGE is spurious.

Possibly: give the interactive queue a different name from the regular batch
queue. Make sure the batch queue can't run on the interactive hosts and vice
versa. Then apply the limit to the queue rather than the host.
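
A minimal sketch of what that last step might look like, applied to the
ruleset quoted below. The queue name interactive.q is hypothetical (substitute
the real name of the interactive queue), and this has not been tested against
6.2u5, so treat it as a starting point rather than a known fix:

```
{
   name         per_user_slot_limit
   description  "limit the number of slots per user"
   enabled      TRUE
   limit        users {*} queues interactive.q to slots=8
   limit        users {andrewss} to slots=2
   limit        users {@bioinf} to slots=616
   limit        users {*} to slots=411
}
```

Note that this is not an exact equivalent of the original first rule: hosts
{@interactive} with braces counts slots per host, whereas queues interactive.q
without braces counts each user's slots across the whole queue.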
>
> {
>    name         per_user_slot_limit
>    description  "limit the number of slots per user"
>    enabled      TRUE
>    limit        users {*} hosts {@interactive} to slots=8
>    limit        users {andrewss} to slots=2
>    limit        users {@bioinf} to slots=616
>    limit        users {*} to slots=411
> }
>
> The rule seems to work, and jobs are held, and then started as expected.
> A job which fails to schedule gets a state like this:
>
> scheduling info:  cannot run in queue instance "[email protected]" because it is not of type batch
>                   cannot run in queue instance "[email protected]" because it is not of type batch
>                   cannot run in queue instance "[email protected]" because it is not of type batch
>                   cannot run in queue instance "[email protected]" because it is not of type batch
>                   cannot run in queue instance "[email protected]" because it is not of type batch
>                   cannot run because it exceeds limit "andrewss/////" in rule "per_user_slot_limit/3"
>                   cannot run in queue instance "[email protected]" because it is not of type batch
>                   cannot run in queue instance "[email protected]" because it is not of type batch
>                   cannot run in queue instance "[email protected]" because it is not of type batch
>
> So it's seeing the rule and is applying it correctly, but the spurious
> errors are causing my messages file to inflate quickly when there are a
> lot of queued jobs.
>
> Can anyone suggest how to debug or fix this? I can't find anything
> relevant from googling around for the specific error outside of the
> library API it comes from.
>
> This is using SGE-6.2u5p2-1.x86_64.
>
> Thanks for any help you can offer!
>
> Simon.
>
> The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT
> Registered Charity No. 1053902.
>
> The information transmitted in this email is directed only to the
> addressee. If you received this in error, please contact the sender and
> delete this email from your system. The contents of this e-mail are the
> views of the sender and do not necessarily represent the views of the
> Babraham Institute. Full conditions at: www.babraham.ac.uk
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
