Your best bet is probably to use QOSes for this.  Be advised that suspended jobs remain resident in memory, so suspending a job does not free the memory it holds.

-Paul Edmon-

On 9/18/19 9:16 PM, Benjamin Wong wrote:

I plan to purchase a GPU machine with 8 GPUs which will be shared between group A and group B.  Group A is an existing group with SLURM nodes.  Group B has no SLURM nodes but will have access to half of the resources on this one SLURM node.  I'm trying to figure out how to get SLURM to implement the following policies:

  * If both groups are using the machine evenly, then I want the
    resources to be split evenly.
  * If only group A is using the machine, then group A may consume all
    the resources, and vice versa.
  * If group A is using all resources but group B begins requesting
    resources, then group A will suspend half of its work for group B
    to use resources.  Vice versa applies.
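
Taken together, the three rules above amount to a guaranteed half share for each group, plus borrowing of whatever the other group leaves idle.  A minimal sketch of that arithmetic in Python follows.  It is only an illustration of the intended outcome, not SLURM configuration; the function name split_gpus and its parameters are hypothetical, and it counts GPUs only.

```python
def split_gpus(demand_a, demand_b, total=8):
    """Return (gpus_a, gpus_b) for the requested GPU counts of each group.

    Hypothetical helper: models the desired policy only, not how SLURM
    schedules or suspends jobs.
    """
    half = total // 2
    # Each group is first granted up to its guaranteed half share.
    base_a = min(demand_a, half)
    base_b = min(demand_b, half)
    # GPUs left idle by one group may be borrowed by the other.
    spare = total - base_a - base_b
    extra_a = min(demand_a - base_a, spare)
    extra_b = min(demand_b - base_b, spare - extra_a)
    return base_a + extra_a, base_b + extra_b


# Group A alone wants everything, then group B starts asking for GPUs.
before = split_gpus(8, 0)
after = split_gpus(8, 8)
print(before, after)
```

In the third rule, the drop in group A's share between the two calls is the amount of work that would have to be suspended to make room for group B.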

What's the best way to implement this?  Should I have two halves of a machine in two different partitions?

Looking forward to hints,
Ben Wong
