One thing we changed years ago was how we think about ownership. While researchers are in fact buying nodes for the cluster, they rarely get any rights to "their" nodes. Instead, they are buying an equivalent amount of CPU time, averaged over 30 days.

We provide reports, explain how fairshare works and how each group's particular value was chosen, and share all the ways that we as sysadmins look at the data. For the most part, researchers have accepted that buy-in means CPU hours per month rather than a certain piece of hardware.

Now that journey was not easy, and there is a discussion with every new researcher who wants a piece of the cluster, but it works for us. There is one big caveat: at least a 10% share of the cluster needs to be publicly owned.

This gives some leeway in scheduling and allows users who have not contributed to access the same resources, albeit at a much lower priority. For us, that 10% has grown to 25% or more, since we have a large and ever-growing base of users who have not contributed.

This different way of thinking has spared us from dedicated partitions and QOSes: CPU time per 30-day sliding window has been accepted, can be shown quantitatively, and is simply a much easier way to schedule when ALL resources can be used.
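The accounting model above can be sketched as a toy calculation. Everything here is an assumption for illustration (the job records, the 20,000 CPU-hour buy-in); in practice the numbers would come from Slurm accounting via sacct/sreport:

```python
from datetime import datetime, timedelta

# Hypothetical job records for one research group: (job end time, CPU-hours).
jobs = [
    (datetime(2019, 10, 1), 5000.0),
    (datetime(2019, 10, 15), 8000.0),
    (datetime(2019, 10, 27), 3000.0),
]

def usage_last_30_days(jobs, now):
    """Sum the CPU-hours consumed inside the 30-day sliding window."""
    cutoff = now - timedelta(days=30)
    return sum(hours for end, hours in jobs if end >= cutoff)

# The group's buy-in, expressed as CPU-hours per 30 days rather than as nodes.
purchased_cpu_hours = 20000.0

now = datetime(2019, 10, 28)
used = usage_last_30_days(jobs, now)
# The group is within its purchased share: 16000.0 of 20000.0 CPU-hours.
print(used, used <= purchased_cpu_hours)  # 16000.0 True
```

The point of the sketch is that "ownership" becomes a single number per group that can be recomputed and reported every day, rather than a set of hostnames.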

Bill

On 10/28/19 11:11 AM, Tina Friedrich wrote:
Hello,

is there a possibility to tie a reservation to a QoS (instead of an
account or user), or enforce a QoS for jobs submitted into a reservation?

The problem I'm trying to solve is - some of our resources are bought on
a co-investment basis. As part of that, the 'owning' group can get very
high scheduling priority (via a QoS) on an equivalent amount of
resource. Additionally, they have a number of reservations for 'their'
nodes they can request per year. However, that lends itself to gaming
the system - they can now submit jobs into the reservation with 'normal'
priority, and then run jobs on the rest of the cluster using the higher
priority - really not the plan.

Basically, I need a way to ensure that - even when a reservation is in
place - those groups 'use up' their priority resources first & then all
other jobs they submit are run with 'lower' priority.

I'm currently dealing with it by modifying the QoS every time a
reservation is created. But that isn't really sustainable on an ongoing
basis - this isn't a one-off for one group, it's part of our operations
model, and there's a (growing) number of them.
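A sketch of that per-reservation workaround, for what it's worth. All names and numbers here are placeholders (coinvest_a, the QoS name, the node count, and the TRES limits are assumptions, not our real config):

```shell
# Create the group's reservation on an equivalent amount of resource.
scontrol create reservation ReservationName=coinvest_a_res \
    Accounts=coinvest_a StartTime=2019-11-04T09:00:00 \
    Duration=7-00:00:00 NodeCnt=16

# While the reservation is active, shrink the group's high-priority QoS
# so the reserved nodes and the priority share can't be used at once.
sacctmgr -i modify qos where name=coinvest_a_qos set GrpTRES=cpu=0

# When the reservation ends, restore the original limit.
sacctmgr -i modify qos where name=coinvest_a_qos set GrpTRES=cpu=512
```

This is exactly the manual dance the paragraph above describes; the pain is that the two sacctmgr steps have to be timed to each reservation's start and end, which is what doesn't scale.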

One (easy) way I can see is if I had a way to ensure you cannot use a
reservation without using the respective priority QoS - however, from my
reading of the docs there's no way to do that. (As only the one account
has access to the QoS, being able to tie a reservation to a QoS would
sort of solve my problem :) ).

Any ideas? The only thing I can come up with involves a lot of scripting,
and it would certainly be more than a bit error prone (and not the most
flexible).

Tina
