[slurm-users] CUDA environment variable not being set

2020-10-08 Thread Sajesh Singh
Slurm 18.08, CentOS 7.7.1908. I have 2 M500 GPUs in a compute node, which is defined in the slurm.conf and gres.conf of the cluster, but if I launch a job requesting GPUs, the environment variable CUDA_VISIBLE_DEVICES is never set and I see the following messages in the slurmd.log file: debug: co…
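A quick way to confirm the symptom is to print what a job step actually receives. The following is a minimal Python sketch, assuming it is launched with something like `srun --gres=gpu:1 python3 check_env.py`; the script name and the helper `visible_gpus` are hypothetical. It distinguishes "not set at all" (the symptom reported here) from "set but empty".

```python
import os
from typing import List, Mapping, Optional


def visible_gpus(env: Mapping[str, str] = os.environ) -> Optional[List[str]]:
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is not set at all, and an empty
    list when it is set but contains no entries.
    """
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Entries are comma-separated, e.g. "0,1"; ignore stray whitespace.
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("CUDA_VISIBLE_DEVICES is not set")
    else:
        print(f"CUDA_VISIBLE_DEVICES lists {len(gpus)} device(s): {gpus}")
```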

2020-10-08 Thread Relu Patrascu
That usually means you don't have the nvidia kernel module loaded, probably because there's no driver installed. Relu
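Relu's suggestion can be checked on the compute node itself. `lsmod` reports loaded kernel modules by reading /proc/modules, where the first field of each line is the module name. The sketch below is a minimal Python equivalent of `lsmod | grep nvidia`; the helper name `module_loaded` is hypothetical. A missing `nvidia` entry would be consistent with the driver not being installed or not loaded.

```python
from pathlib import Path


def module_loaded(name: str, modules_text: str) -> bool:
    """Return True if `name` appears as a module in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    text = Path("/proc/modules").read_text()
    print("nvidia module loaded:", module_loaded("nvidia", text))
```

Matching the whole first field (rather than a substring, as `grep` would) avoids counting `nvidia_uvm` or `nvidia_drm` as the core `nvidia` module.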

2020-10-08 Thread Sajesh Singh
On Thursday, October 8, 2020 4:26 PM, Relu Patrascu wrote: > That usually means you don't have the nvidia kernel module loaded, probably because there…

2020-10-08 Thread Renfro, Michael
It seems as…

2020-10-08 Thread Sajesh Singh
… History, 200 Central Park West, New York, NY 10024; ssi...@amnh.org. On Thursday, October 8, 2020 4:53 PM, Renfro, Michael wrote: …

2020-10-08 Thread Brian Andrus
Do you have your gres.conf on the nodes also? Brian Andrus

2020-10-08 Thread Sajesh Singh
Yes. It is located in the /etc/slurm directory. -SS-

2020-10-08 Thread Relu Patrascu
-SS- On Thursday, October 8, 2020 4:26 PM, Relu Patrascu wrote: …

2020-10-08 Thread Christopher Samuel
Hi Sajesh, On 10/8/20 11:57 am, Sajesh Singh wrote: > debug: common_gres_set_env: unable to set env vars, no device files configured. I suspect the clue is here: what does your gres.conf look like? Does it list the devices in /dev for the GPUs? All the best, Chris
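The log line says slurmd has no device files for the GPU gres, and device files are what the `File=` parameter in gres.conf supplies. As a minimal sketch, assuming the GPUs appear as /dev/nvidia0, /dev/nvidia1, and so on (the helper name `gres_lines` is hypothetical), the following Python lists the per-GPU device nodes on a node and prints candidate gres.conf lines of the form `Name=gpu File=/dev/nvidiaN`.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Matches per-GPU device nodes such as nvidia0 and nvidia1, but not
# nvidiactl, nvidia-uvm or nvidia-modeset, which are not individual GPUs.
GPU_DEVICE = re.compile(r"^nvidia(\d+)$")


def gres_lines(device_names: Iterable[str]) -> List[str]:
    """Build one gres.conf line per GPU device file, ordered by index."""
    indices = sorted(
        int(m.group(1)) for m in map(GPU_DEVICE.match, device_names) if m
    )
    return [f"Name=gpu File=/dev/nvidia{i}" for i in indices]


if __name__ == "__main__":
    names = [p.name for p in Path("/dev").iterdir()]
    for line in gres_lines(names):
        print(line)
```

Review the output before copying it into gres.conf on each node. It deliberately omits `Type=` and `Cores=`, since those depend on the hardware and on how the GPUs are declared in slurm.conf.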

2020-10-08 Thread Sajesh Singh
Thank you. Looks like the fix is indeed the missing file /etc/slurm/cgroup_allowed_devices_file.conf

2020-10-08 Thread Christopher Samuel
On 10/8/20 3:48 pm, Sajesh Singh wrote: > Thank you. Looks like the fix is indeed the missing file /etc/slurm/cgroup_allowed_devices_file.conf. No, you don't want that: it will allow all access to GPUs whether people have requested them or not. What you want is in gres.conf and looks like…

2020-10-08 Thread Sajesh Singh
Christopher, Thank you for the tip. That works as expected. -SS-

2020-10-08 Thread Christopher Samuel
Hi Sajesh, On 10/8/20 4:18 pm, Sajesh Singh wrote: > Thank you for the tip. That works as expected. No worries, glad it's useful. Do be aware that the core bindings for the GPUs would likely need to be adjusted for your hardware! Best of luck, Chris