From any node where you can run scontrol, what does ‘scontrol show node 
GPUNODENAME | grep -i gres’ return? Mine returns lines for both “Gres=” and 
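
A minimal sketch of how that check could be scripted. The helper name has_gres is hypothetical (it is not a Slurm command), and the sketch assumes a node with no GRES configured reports "Gres=(null)". The sample string stands in for real scontrol output:

```shell
#!/bin/sh
# Hypothetical helper, not part of Slurm: reads `scontrol show node` output
# on stdin and succeeds only if some Gres= line is not "(null)".
# Assumption: an unconfigured node reports "Gres=(null)".
has_gres() {
    grep -i 'gres=' | grep -qv '(null)'
}

# Sample line standing in for real output; on a cluster you would pipe
# `scontrol show node GPUNODENAME` into has_gres instead.
sample='   Gres=gpu:2'
if printf '%s\n' "$sample" | has_gres; then
    echo "GRES configured"
else
    echo "no GRES reported"
fi
```

If the controller reports no GRES for the node, the node definition in slurm.conf is a reasonable first place to look.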

From: slurm-users <> on behalf of Sajesh 
Singh <>
Reply-To: Slurm User Community List <>
Date: Thursday, October 8, 2020 at 3:33 PM
To: Slurm User Community List <>
Subject: Re: [slurm-users] CUDA environment variable not being set


It seems as though the modules are loaded, since running lsmod gives the 
following:

nvidia_drm             43714  0
nvidia_modeset       1109636  1 nvidia_drm
nvidia_uvm            935322  0
nvidia              20390295  2 nvidia_modeset,nvidia_uvm

Also the nvidia-smi command returns the following:

Thu Oct  8 16:31:57 2020
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Quadro M5000        Off  | 00000000:02:00.0 Off |                  Off |
| 33%   21C    P0    45W / 150W |      0MiB /  8126MiB |      0%      Default |
|   1  Quadro M5000        Off  | 00000000:82:00.0 Off |                  Off |
| 30%   17C    P0    45W / 150W |      0MiB /  8126MiB |      0%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No running processes found                                                 |



From: slurm-users <> On Behalf Of Relu 
Sent: Thursday, October 8, 2020 4:26 PM
Subject: Re: [slurm-users] CUDA environment variable not being set


That usually means you don't have the nvidia kernel module loaded, probably 
because there's no driver installed.

On 2020-10-08 14:57, Sajesh Singh wrote:
Slurm 18.08
CentOS 7.7.1908

I have 2 M5000 GPUs in a compute node that is defined in the cluster's 
slurm.conf and gres.conf, but when I launch a job requesting GPUs the 
environment variable CUDA_VISIBLE_DEVICES is never set, and I see the 
following message in the slurmd.log file:

debug:  common_gres_set_env: unable to set env vars, no device files configured

Has anyone encountered this before?
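
One thing worth ruling out for a message like the one above is whether the GPU device files exist on the node and whether gres.conf points at them. The sketch below only counts device files. The function name count_gpu_devs is hypothetical, and the gres.conf line in the comment is an assumed example for a two-GPU node, not the poster's actual configuration:

```shell
#!/bin/sh
# Hypothetical helper: count NVIDIA GPU device files (nvidia0, nvidia1, ...)
# in a directory, /dev by default. Control nodes such as nvidiactl and
# nvidia-uvm do not match the pattern and are not counted.
count_gpu_devs() {
    dir=${1:-/dev}
    n=0
    for f in "$dir"/nvidia[0-9]*; do
        # An unmatched glob stays literal, so confirm the path exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Assumed example of a gres.conf entry naming the device files for two GPUs:
#   Name=gpu File=/dev/nvidia[0-1]
echo "GPU device files found: $(count_gpu_devs)"
```

If the count matches the number of GPUs but gres.conf has no File= entry for them, that would be consistent with the "no device files configured" wording, though this is an inference from the message text rather than something confirmed in this thread.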

Thank you,

