Hi All,
We have a couple of nodes with 8 Nvidia Titan X GPUs each. We have some software
that can run in parallel across GPUs, but performance is only good if the
inter-GPU communication stays on the PCI links of a single CPU socket.
Right now, the only approach I have been able to get working reliably (with
Slurm 14.11.8 on Scientific Linux 6) is to define two GPU types in gres.conf:
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxa File=/dev/nvidia0 CPUs=0-15
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxa File=/dev/nvidia1 CPUs=0-15
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxa File=/dev/nvidia2 CPUs=0-15
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxa File=/dev/nvidia3 CPUs=0-15
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxb File=/dev/nvidia4 CPUs=16-31
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxb File=/dev/nvidia5 CPUs=16-31
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxb File=/dev/nvidia6 CPUs=16-31
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxb File=/dev/nvidia7 CPUs=16-31
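With that config, users pick a pool by GRES type at submission, e.g. (sketch
using the type names defined above and the standard --gres=name:type:count
sbatch syntax; "job.sh" is a placeholder script):

```shell
# Request 4 GPUs from the pool attached to the first socket
sbatch --gres=gpu:titanxa:4 job.sh

# ...or 4 GPUs from the pool attached to the second socket
sbatch --gres=gpu:titanxb:4 job.sh
```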
The downside is that the user needs to specify one GRES type or the other at
job submission. I suppose I could modify the job_submit Lua script to pick one
randomly or based on current usage, but that could still lead to imbalanced use.
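For the record, a minimal job_submit.lua sketch of the random-pick idea
(untested; it assumes the 14.11 Lua plugin exposes the GRES request string as
job_desc.gres, and it only handles the plain "gpu:N" form):

```lua
-- job_submit.lua sketch: if a user requests untyped "gpu:N" GRES,
-- rewrite it to one of the two typed pools, chosen at random.
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.gres ~= nil then
        -- match "gpu:N" with no explicit type
        local count = string.match(job_desc.gres, "^gpu:(%d+)$")
        if count ~= nil then
            local pool = (math.random(2) == 1) and "titanxa" or "titanxb"
            job_desc.gres = "gpu:" .. pool .. ":" .. count
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

Picking based on current usage instead of randomly would need a query of
allocated GRES per type, which I haven't worked out yet.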
I had earlier tried to have a single Type=titanx, with each device restricted
to the cores on one socket or the other. I couldn't figure out a way to reliably
restrict a single job to cores on a single socket. Also, even with the device
restrictions, I could still end up with a job whose CPU cores were on one
socket but whose GPU was attached to the other.
Is there a recommended way to handle this situation? I'd like to preserve the
option of having a single job be able to use all 8 GPUs.
Thanks,
Nate Crawford
--
________________________________________________________________________
Dr. Nathan Crawford [email protected]
Modeling Facility Director
Department of Chemistry
1102 Natural Sciences II Office: 2101 Natural Sciences II
University of California, Irvine Phone: 949-824-4508
Irvine, CA 92697-2025, USA