AFAIK, if you have this set up correctly, nvidia-smi will be restricted too, though I think we did see a bug with that at one point on this version.
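A quick way to confirm (a sketch only -- the partition name and GPU count here are illustrative, not taken from this cluster's config): with ConstrainDevices=yes in place, request a single GPU and list what the job can actually see:

    srun --gres=gpu:1 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv

With the device cgroup working, only the one allocated GPU should be listed; without it, nvidia-smi reports every GPU in the node even though CUDA_VISIBLE_DEVICES is set for the job, which is why nvidia-smi output on its own can be a misleading check.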
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
Office of Advanced Research Computing - MSB C630, Newark
Rutgers, the State University of NJ

On Jan 14, 2021, at 18:05, Abhiram Chintangal <achintan...@berkeley.edu> wrote:

Sean,

Thanks for the clarification. I noticed that I was missing the "AllowedDevices" option in mine. After adding this, the GPU allocations started working. (Slurm version 18.08.8.) I was also incorrectly using "nvidia-smi" as a check.

Regards,
Abhiram

On Thu, Jan 14, 2021 at 12:22 AM Sean Crosby <scro...@unimelb.edu.au> wrote:

Hi Abhiram,

You need to configure cgroup.conf to constrain the devices a job has access to. See https://slurm.schedmd.com/cgroup.conf.html

My cgroup.conf is:

    CgroupAutomount=yes
    AllowedDevicesFile="/usr/local/slurm/etc/cgroup_allowed_devices_file.conf"
    ConstrainCores=yes
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes
    ConstrainDevices=yes
    TaskAffinity=no
    CgroupMountpoint=/sys/fs/cgroup

ConstrainDevices=yes is the key to stopping jobs from having access to GPUs they didn't request.

Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia

On Thu, 14 Jan 2021 at 18:36, Abhiram Chintangal <achintan...@berkeley.edu> wrote:

Hello,

I recently set up a small cluster at work using Warewulf/Slurm. Currently, I am not able to get the scheduler to work well with GPUs (GRES). While Slurm is able to filter by GPU type, it allocates all the GPUs on the node. See below:

    [abhiram@whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
    index, name
    0, Tesla P100-PCIE-16GB
    1, Tesla P100-PCIE-16GB
    2, Tesla P100-PCIE-16GB
    3, Tesla P100-PCIE-16GB

    [abhiram@whale ~]$ srun --gres=gpu:titanrtx:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
    index, name
    0, TITAN RTX
    1, TITAN RTX
    2, TITAN RTX
    3, TITAN RTX
    4, TITAN RTX
    5, TITAN RTX
    6, TITAN RTX
    7, TITAN RTX

I am fairly new to Slurm and still figuring out my way around it. I would really appreciate any help with this. For your reference, I attached the slurm.conf and gres.conf files.

Best,
Abhiram

--
Abhiram Chintangal
Bioinformatics Specialist @ QB3 Nogales Lab, Howard Hughes Medical Institute
University of California Berkeley
708D Stanley Hall, Berkeley, CA 94720
Phone (510) 666-3344
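For reference, the AllowedDevicesFile named in Sean's cgroup.conf normally just whitelists the generic system devices every job needs; the GPU device nodes are then granted per job once ConstrainDevices=yes is set. A typical file, along the lines of the example in the SchedMD cgroup.conf documentation, looks like:

    /dev/null
    /dev/urandom
    /dev/zero
    /dev/sda*
    /dev/cpu/*/*
    /dev/pts/*

On the gres.conf side, each GPU type just needs to be mapped to its device files, e.g. (node names and device ranges below are illustrative only -- the slurm.conf and gres.conf attached to the thread are not reproduced here):

    NodeName=gpu-p100-[01-02]  Name=gpu Type=p100     File=/dev/nvidia[0-3]
    NodeName=gpu-titan-[01-02] Name=gpu Type=titanrtx File=/dev/nvidia[0-7]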