AFAIK, if you have this set up correctly, nvidia-smi will be restricted too, though I think we did see a bug with that at one point on this version.
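A quick way to confirm (a sketch only -- the partition name and GPU count here are illustrative, not taken from this cluster's config): with ConstrainDevices=yes in place, request a single GPU and list what the job can actually see:

    srun --gres=gpu:1 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv

With the device cgroup working, only the one allocated GPU should be listed; without it, nvidia-smi reports every GPU in the node even though CUDA_VISIBLE_DEVICES is set for the job, which is why nvidia-smi output on its own can be a misleading check.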
--
#BlackLivesMatter
Ryan Novosielski - novos...@rutgers.edu
Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
Office of Advanced Research Computing - MSB C630, Newark
Rutgers, the State University of NJ

On Jan 14, 2021, at 18:05, Abhiram Chintangal <achintan...@berkeley.edu> wrote:

Sean,

Thanks for the clarification. I noticed that I was missing the "AllowedDevices" option in mine. After adding this, the GPU allocations started working. (Slurm version 18.08.8.) I was also incorrectly using "nvidia-smi" as a check.

Regards,
Abhiram

On Thu, Jan 14, 2021 at 12:22 AM Sean Crosby <scro...@unimelb.edu.au> wrote:

Hi Abhiram,

You need to configure cgroup.conf to constrain the devices a job has access to. See https://slurm.schedmd.com/cgroup.conf.html

My cgroup.conf is:

    CgroupAutomount=yes
    AllowedDevicesFile="/usr/local/slurm/etc/cgroup_allowed_devices_file.conf"
    ConstrainCores=yes
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes
    ConstrainDevices=yes
    TaskAffinity=no
    CgroupMountpoint=/sys/fs/cgroup

ConstrainDevices=yes is the key to stopping jobs from having access to GPUs they didn't request.

Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia

On Thu, 14 Jan 2021 at 18:36, Abhiram Chintangal <achintan...@berkeley.edu> wrote:

Hello,

I recently set up a small cluster at work using Warewulf/Slurm. Currently, I am not able to get the scheduler to work well with GPUs (GRES). While Slurm is able to filter by GPU type, it allocates all the GPUs on the node. See below:

    [abhiram@whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
    index, name
    0, Tesla P100-PCIE-16GB
    1, Tesla P100-PCIE-16GB
    2, Tesla P100-PCIE-16GB
    3, Tesla P100-PCIE-16GB

    [abhiram@whale ~]$ srun --gres=gpu:titanrtx:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
    index, name
    0, TITAN RTX
    1, TITAN RTX
    2, TITAN RTX
    3, TITAN RTX
    4, TITAN RTX
    5, TITAN RTX
    6, TITAN RTX
    7, TITAN RTX

I am fairly new to Slurm and still figuring out my way around it. I would really appreciate any help with this. For your reference, I attached the slurm.conf and gres.conf files.

Best,
Abhiram

--
Abhiram Chintangal
Bioinformatics Specialist @ QB3 Nogales Lab, Howard Hughes Medical Institute
University of California Berkeley
708D Stanley Hall, Berkeley, CA 94720
Phone (510) 666-3344
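For reference, the AllowedDevicesFile named in Sean's cgroup.conf normally just whitelists the generic system devices every job needs; the GPU device nodes are then granted per job once ConstrainDevices=yes is set. A typical file, along the lines of the example in the SchedMD cgroup.conf documentation, looks like:

    /dev/null
    /dev/urandom
    /dev/zero
    /dev/sda*
    /dev/cpu/*/*
    /dev/pts/*

On the gres.conf side, each GPU type just needs to be mapped to its device files, e.g. (node names and device ranges below are illustrative only -- the slurm.conf and gres.conf attached to the thread are not reproduced here):

    NodeName=gpu-p100-[01-02]  Name=gpu Type=p100     File=/dev/nvidia[0-3]
    NodeName=gpu-titan-[01-02] Name=gpu Type=titanrtx File=/dev/nvidia[0-7]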