Hola, I'd like some advice on QOS for clusters.
Currently, we have a broad 90-CPU limit (MaxTRESPU) across all associations. I have a special partition, with 80 CPUs, on which I want no limits to apply. So I have created a special QOS, named after the partition, with a MaxTRESPU of 180 CPUs.

There is (for argument's sake) one user on that partition.

- In slurm.conf, I have given that partition its special QOS of no limit (aka 180 CPUs).
- I have given that user's association with that partition the same no-limit QOS (aka 180 CPUs).

But I still want the 90-CPU limit to apply to his access to other partitions, and the QOS carrying that limit still exists on those associations.

This is the output:

    $ sacctmgr -s list Assoc format=User,Partition,QosLevel
    nthio  goode  goode
    nthio  debug  normal
    nthio  prod   normal

As you can see, they have a different QOS on the special goode partition, but the normal QOS on the others.

According to the docs at https://slurm.schedmd.com/resource_limits.html, the order in which QOS limits are applied is (in this case):

- partition
- assoc

So I think everything should work, i.e. said user should be able to completely utilise the special partition AND get up to the regular 90 CPUs on the normal partitions.

This is not what we are seeing. In practice, the user is limited by the 90-CPU QOS across the board.

What have I done wrong? (Slurm 16.05.0)

cheers
L.

------
"Mission Statement: To provide hope and inspiration for collective action, to build collective power, to achieve collective transformation, rooted in grief and rage but pointed towards vision and dreams." - Patrisse Cullors, *Black Lives Matter founder*
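
P.S. For reference, the setup described above corresponds roughly to the commands below. This is a sketch reconstructed from the description, not a copy of the actual configuration. It assumes the 90-CPU per-user cap lives on the `normal` QOS (the post does not say exactly where it is set), and the `Nodes=` value in the slurm.conf line is a placeholder.

```shell
# ASSUMPTION: the cluster-wide 90-CPU per-user cap is carried by the "normal" QOS.
sacctmgr modify qos normal set MaxTRESPerUser=cpu=90

# Special QOS named after the partition, with the higher per-user cap.
sacctmgr add qos goode
sacctmgr modify qos goode set MaxTRESPerUser=cpu=180

# Give the user's association on the goode partition that QOS.
sacctmgr modify user where name=nthio partition=goode set qos=goode

# slurm.conf fragment attaching the QOS to the partition
# (Nodes value is a placeholder, not taken from the real config):
#   PartitionName=goode Nodes=<nodes> QOS=goode

# Check what is actually in effect.
sacctmgr -s list assoc format=User,Partition,QosLevel
sacctmgr show qos format=Name,MaxTRESPU
```

The last two commands only read the accounting database, so they are safe to run when comparing the intended setup against what Slurm has stored.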