Your best bet is probably to use QOSs to accomplish this. Be advised
that suspended jobs stay resident in memory, so suspending a job does
not free the memory it holds.
-Paul Edmon-
On 9/18/19 9:16 PM, Benjamin Wong wrote:
Hello,
I plan to purchase a GPU machine with 8 GPUs which will be shared
between group A and group B. Group A is an existing group with SLURM
nodes. Group B has no SLURM nodes but will have access to half of the
resources on one SLURM node. I'm trying to figure out how to get
SLURM to implement the policies I want below:
* If both groups are using the machine evenly, then I want the
  resources to be split evenly.
* If only group A is using the resources, then they will consume all
  the resources and vice versa.
* If group A is using all resources but group B begins requesting
  resources, then group A will suspend half of its work for group B
  to use resources. Vice versa applies.
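
As a rough illustration of the outcome these three policies describe,
here is a minimal Python sketch. It only models the intended split of
the 8 GPUs and is not Slurm configuration. The function name
split_gpus and the idea of a per-group "demand" count are assumptions
made for the example.

```python
from typing import Tuple


def split_gpus(demand_a: int, demand_b: int, total: int = 8) -> Tuple[int, int]:
    """Return (gpus_for_a, gpus_for_b) for the requested GPU counts.

    Hypothetical model of the desired policy only; it does not talk to Slurm.
    """
    half = total // 2
    # Each group is guaranteed up to half of the machine.
    a = min(demand_a, half)
    b = min(demand_b, half)
    # GPUs left idle by one group are lent to the other group.
    spare = total - a - b
    extra_a = min(demand_a - a, spare)
    a += extra_a
    spare -= extra_a
    b += min(demand_b - b, spare)
    return a, b


if __name__ == "__main__":
    print(split_gpus(8, 8))  # both groups busy
    print(split_gpus(8, 0))  # only group A busy
    print(split_gpus(8, 2))  # group B starts asking for GPUs
```

In the third case group A would have to give back GPUs it had been
borrowing, which is the step where suspension or preemption comes in.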
What's the best way to implement this? Should I have two halves of a
machine in two different partitions?
Looking forward to hints,
Ben Wong