Thanks Paul!

On 26-01-2021 21:11, Paul Raines wrote:
> You should check your jobs that allocated GPUs and make sure
> CUDA_VISIBLE_DEVICES is being set in the environment.  This is a sign
> you GPU support is not really there but SLURM is just doing "generic"
> resource assignment.

Could you elaborate a bit on this remark? Are you saying that I need to check whether CUDA_VISIBLE_DEVICES is defined automatically by Slurm inside the batch job's environment, as described in https://slurm.schedmd.com/gres.html?
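
If that is what you mean, I could add a quick check along these lines to a test batch job (just a rough sketch in Python; it only prints what the job's environment contains):

    import os

    # Inside a Slurm batch job: with working GRES/GPU support I would
    # expect Slurm to export CUDA_VISIBLE_DEVICES automatically (and,
    # if I read the sbatch man page correctly, SLURM_JOB_GPUS as well).
    for var in ("CUDA_VISIBLE_DEVICES", "SLURM_JOB_GPUS"):
        print(var, "=", os.environ.get(var, "<not set>"))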

What do you mean by "your GPU support is not really there" and by Slurm doing "generic" resource assignment? I'm not quite following this.

With my Slurm 20.02.6 built without the NVIDIA libraries, Slurm nevertheless seems to schedule concurrent jobs onto different GPUs: the jobs' GRES=gpu entries point to distinct IDX values (GPU indexes), and nvidia-smi shows the individual processes running on distinct GPUs. All seems to be fine - or am I completely mistaken?
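
To cross-check that, I could also compare, inside each running job, what Slurm exported against the devices the job can actually see (again just a sketch, assuming nvidia-smi is on the PATH):

    import os
    import subprocess

    # What Slurm exported for this job's GPU allocation (if anything).
    print("CUDA_VISIBLE_DEVICES =",
          os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))

    # What the job can actually see on the node.  Without any device
    # binding this lists every GPU in the machine, not just the one
    # the job was allocated.
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    print(result.stdout)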

Thanks,
Ole

