You should check the jobs that were allocated GPUs and make sure
CUDA_VISIBLE_DEVICES is being set in their environment.  If it is not,
that is a sign your GPU support is not really there and SLURM is just
doing "generic" resource assignment.
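
As a rough sketch of that check, something like the following could be run from inside a job that requested a GPU. The function name check_gpu_env is made up for illustration; it only looks at the environment and knows nothing about SLURM itself.

```shell
# Hypothetical helper: report whether CUDA_VISIBLE_DEVICES is present
# in the current (job) environment.
check_gpu_env() {
    if [ -n "${CUDA_VISIBLE_DEVICES+set}" ]; then
        # Variable exists (possibly empty); show what the job was given.
        echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"
        return 0
    fi
    echo "CUDA_VISIBLE_DEVICES is not set"
    return 1
}
```

Called from a batch script submitted with a GPU request such as --gres=gpu:1, it prints the device list the job received, or returns non-zero when the variable is missing.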

I have both GPU and non-GPU nodes, so I build the SLURM RPMs twice. The first build is on a non-GPU node, and I use those RPMs to install on the non-GPU nodes. The second build is on a GPU node where CUDA is installed via the NVIDIA CUDA YUM repo RPMs, so the NVML lib is at /lib64/libnvidia-ml.so.1 (from rpm nvidia-driver-NVML-455.45.01-1.el8.x86_64). No special mods to the default RPM SPEC are needed. I just run

  rpmbuild --tb slurm-20.11.3.tar.bz2

You can run 'rpm -qlp slurm-20.11.3-1.el8.x86_64.rpm | grep nvml' on each
build and see that /usr/lib64/slurm/gpu_nvml.so exists only in the RPM
built on the GPU node.
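
To make that comparison scriptable across both builds, a minimal sketch is a filter that reads a package file list on stdin. The name has_nvml_plugin is hypothetical, and the pattern assumes the plugin is installed under a slurm/ directory as in the path above.

```shell
# Hypothetical helper: read a file list on stdin and report whether
# the NVML GPU plugin is part of it.
has_nvml_plugin() {
    if grep -q '/slurm/gpu_nvml\.so$'; then
        echo "gpu_nvml.so present"
    else
        echo "gpu_nvml.so missing"
        return 1
    fi
}

# Usage (package name taken from the message above):
#   rpm -qlp slurm-20.11.3-1.el8.x86_64.rpm | has_nvml_plugin
```

A non-zero exit status on the GPU-node build would suggest the NVML library was not found when the RPM was built.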

-- Paul Raines (http://help.nmr.mgh.harvard.edu)



On Tue, 26 Jan 2021 2:29pm, Ole Holm Nielsen wrote:

In another thread, On 26-01-2021 17:44, Prentice Bisbal wrote:
 Personally, I think it's good that Slurm RPMs are now available through
 EPEL, although I won't be able to use them, and I'm sure many people on
 the list won't be able to either, since licensing issues prevent them from
 providing support for NVIDIA drivers, so those of us with GPUs on our
 clusters will still have to compile Slurm from source to include NVIDIA
 GPU support.

We're running Slurm 20.02.6 and recently added some NVIDIA GPU nodes.
The Slurm GPU documentation seems to be
https://slurm.schedmd.com/gres.html
We don't seem to have any problems scheduling jobs on GPUs, even though our Slurm RPM build host doesn't have any NVIDIA software installed, as shown by the command:
$ ldconfig -p | grep libnvidia-ml

I'm curious about Prentice's statement about needing NVIDIA libraries to be installed when building Slurm RPMs, and I read the discussion in bug 9525,
https://bugs.schedmd.com/show_bug.cgi?id=9525
from which it seems that the problem was fixed in 20.02.6 and 20.11.

Question: Is there anything special that needs to be done when building Slurm RPMs with NVIDIA GPU support?

Thanks,
Ole



