Hello,

I'd like some advice on QOS setup for our cluster.

Currently, we have a broad 90-CPU limit (MaxTRESPU) across all associations.

I have a special 80-CPU partition on which I want no limits to apply. So I
have created a special QOS, named after the partition, with a MaxTRESPU of
180 CPUs.

There is (for argument's sake) one user on that partition.

In slurm.conf, I have given that partition its special no-limit QOS (i.e.
180 CPUs).
I have given that user's association with that partition the same no-limit
QOS (180 CPUs).

But I still want the 90-CPU limit to apply to his access to the other
partitions, and that QOS is still set on those associations.
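For anyone trying to reproduce this, the setup described above corresponds roughly to the commands below. This is a sketch, not the exact commands used here: the QOS name `goode` and user `nthio` come from the sacctmgr listing, the 90-CPU limit is assumed to live in the `normal` QOS, and the syntax should be checked against the sacctmgr man page for 16.05.

```shell
# Sketch only: names and values taken from the description, not the real history.
# Existing per-user limit, assumed to be set on the "normal" QOS:
sacctmgr modify qos where name=normal set MaxTRESPerUser=cpu=90

# Special QOS for the 80-CPU "goode" partition:
sacctmgr add qos goode
sacctmgr modify qos where name=goode set MaxTRESPerUser=cpu=180

# The user's association on that partition:
sacctmgr modify user where name=nthio partition=goode set qos=goode

# slurm.conf: attach the QOS to the partition (other options omitted):
# PartitionName=goode ... QOS=goode
```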


This is the output:

$ sacctmgr -s list Assoc format=User,Partition,QosLevel

nthio      goode                 goode
nthio      debug                 normal
nthio      prod                  normal


As you can see, the user has a different QOS on the special goode partition,
but the normal QOS on the others.
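When checking a similar setup, the per-partition QOS for one user can be pulled out of a listing like the one above with a little awk. This is a hypothetical helper, not anything from the Slurm docs: the listing is pasted inline so the sketch runs without a Slurm installation, and on a live system the output of the sacctmgr command above could be piped in instead (header and separator lines are skipped because their first field is not the user name).

```shell
# Hypothetical helper: print "partition -> QOS" for one user from the
# whitespace-separated output of
#   sacctmgr -s list Assoc format=User,Partition,QosLevel
# The listing quoted in this email is embedded so the sketch is self-contained.
listing='nthio      goode                 goode
nthio      debug                 normal
nthio      prod                  normal'

user=nthio

# Keep only rows whose first column is the user; print columns 2 and 3.
printf '%s\n' "$listing" | awk -v u="$user" '$1 == u { print $2 " -> " $3 }'
```

With the listing above this prints one line per association, showing `goode` on the goode partition and `normal` on debug and prod.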

According to the documentation at
https://slurm.schedmd.com/resource_limits.html

the order in which QOS limits are applied is (in this case):

 - partition
 - association


So I think everything should work.

E.g.:

 - the user should be able to fully utilise the special partition AND get
up to the regular 90 CPUs on the other partitions.

This is not what we are seeing.

In practice, we are seeing the user limited by the 90-CPU QOS across the
board.

What have I done wrong?

(Slurm 16.05.0)

cheers

L.



------
"Mission Statement: To provide hope and inspiration for collective action,
to build collective power, to achieve collective transformation, rooted in
grief and rage but pointed towards vision and dreams."

 - Patrisse Cullors, *Black Lives Matter founder*
