William Hay <w....@ucl.ac.uk> writes:

> We have a complex associated with every node called status that is
> normally set to OK.  When a node has a problem we set it to a
> description of the problem instead.   Our JSV ensures jobs always
> request status=OK.  With a similar complex you could request status=OK
> when making the AR.

Yes, I think that's currently the only solution for disabled queues, but
I'd guess it would be straightforward to add an option to avoid them, if
someone would like to try.  We don't currently use AR, so I haven't
looked at it.
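
For anyone wanting to try the complex-based approach, here is a minimal
sketch of how it might be driven from the shell.  It assumes a requestable
string complex named "status" has already been defined with qconf -mc.  The
function names and any host names are made up for illustration and are not
taken from William's setup.

```shell
#!/bin/sh
# Sketch only.  Assumes a requestable string complex called "status" exists,
# defined via `qconf -mc` with a line roughly like this (an assumption):
#   status  status  STRING  ==  YES  NO  NONE  0

# Set the status value on one execution host.
#   mark_node <host> <value>     e.g. mark_node node01 OK
# The value should avoid whitespace and commas, since complex_values is a
# comma-separated list.
mark_node() {
    host=$1
    value=$2
    qconf -mattr exechost complex_values "status=$value" "$host"
}

# Request an advance reservation restricted to hosts with status=OK.
# Any further arguments (start time, duration, PE, ...) pass through to qrsub.
reserve_ok() {
    qrsub -l status=OK "$@"
}
```

With that in place, something like "reserve_ok -d 1:0:0" would ask for a
one-hour reservation that can only land on nodes currently marked OK.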

> We also have a script that lists out nodes that aren't OK and their
> status.  Essentially duplicating the functionality of pbsnodes under
> Torque.  With this available as a permanent way to disable nodes we've
> set queues to enabled at startup and use qmod -d to mean "disabled
> till next reboot" only.

I tag bad nodes with a comment and put them into a "testing" hostgroup
that only admins can access (via an RQS, which is ignored for AR for a
reason I don't follow).  I think if host-level user_lists were used
instead of the RQS to restrict access, an AR would exclude the bad nodes
for non-admins, but I'm not sure.
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
