Hi All,

We have a couple of nodes with 8 Nvidia Titan X GPUs each. Some of our software can run in parallel across GPUs, but performance is only good if the inter-GPU communication stays on the PCIe links attached to a single CPU socket.

Right now, the only approach I have been able to get working reliably [with Slurm 14.11.8 on Scientific Linux 6] is to define two types of GPUs in gres.conf:

NodeName=c-3-29,c-9-9 Name=gpu Type=titanxa File=/dev/nvidia0 CPUs=0-15
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxa File=/dev/nvidia1 CPUs=0-15
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxa File=/dev/nvidia2 CPUs=0-15
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxa File=/dev/nvidia3 CPUs=0-15
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxb File=/dev/nvidia4 CPUs=16-31
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxb File=/dev/nvidia5 CPUs=16-31
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxb File=/dev/nvidia6 CPUs=16-31
NodeName=c-3-29,c-9-9 Name=gpu Type=titanxb File=/dev/nvidia7 CPUs=16-31
The downside is that the user needs to specify one GRES type or the other at job submission. I suppose I could modify the job_submit Lua script to pick one randomly or based on current usage, but that could still lead to imbalanced use of the two sockets.
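
To make the "based on current usage" idea concrete, here is a minimal sketch of the selection logic. The real version would live in job_submit.lua; I'm showing it as shell, and the squeue-based counting in the comments is hypothetical and untested:

```shell
#!/bin/sh
# Sketch: steer jobs that didn't request a GRES type toward the
# less-used of the two types. Counts could come from something like
# (hypothetical, untested on 14.11):
#   a=$(squeue -h -t R -o '%b' | grep -c titanxa)
#   b=$(squeue -h -t R -o '%b' | grep -c titanxb)

pick_gres() {
  # $1 = running jobs using titanxa, $2 = running jobs using titanxb
  if [ "$1" -le "$2" ]; then
    echo titanxa
  else
    echo titanxb
  fi
}

pick_gres 3 5    # prints: titanxa
```

Even with this, two same-sized jobs arriving back-to-back could still land on the same type before the counts update, so it only softens the imbalance rather than eliminating it.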

I had earlier tried a single Type=titanx, with each device restricted to the cores of one socket or the other. I couldn't find a way to reliably restrict a single job to cores on a single socket. Also, even with the device restrictions, I could end up with a job whose CPU cores were on one socket but whose GPU was attached to the other.
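
For reference, this is the kind of request I was trying to express with the two-type workaround: 4 GPUs of one type plus the 16 cores of the matching socket, so CPU and GPU affinity line up. The option names are standard sbatch options, but whether the CPU<->GPU locality is actually enforced depends on the TaskPlugin/cgroup configuration, and I haven't verified this on 14.11 (job.sh is a placeholder):

```shell
#!/bin/sh
# Build the sbatch line for a half-node, single-socket GPU job.
# titanxa maps to cores 0-15 (socket 0) in our gres.conf.
GRES_TYPE=titanxa

build_cmd() {
  echo "sbatch --gres=gpu:${GRES_TYPE}:4 --nodes=1 --ntasks=1 --cpus-per-task=16 job.sh"
}

build_cmd
```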

Is there a recommended way to handle this situation? I'd like to preserve the option of having a single job be able to use all 8 GPUs.

Thanks,
Nate Crawford

--
________________________________________________________________________
Dr. Nathan Crawford              [email protected]
Modeling Facility Director
Department of Chemistry
1102 Natural Sciences II         Office: 2101 Natural Sciences II
University of California, Irvine  Phone: 949-824-4508
Irvine, CA 92697-2025, USA
