Hi Abhiram,

Abhiram Chintangal <achintan...@berkeley.edu> writes:
> Hello,
>
> I recently set up a small cluster at work using Warewulf/Slurm.
> Currently, I am not able to get the scheduler to work well with GPUs
> (Gres).
>
> While Slurm is able to filter by GPU type, it allocates all the GPUs
> on the node. See below:
>
> [abhiram@whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> index, name
> 0, Tesla P100-PCIE-16GB
> 1, Tesla P100-PCIE-16GB
> 2, Tesla P100-PCIE-16GB
> 3, Tesla P100-PCIE-16GB
> [abhiram@whale ~]$ srun --gres=gpu:titanrtx:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> index, name
> 0, TITAN RTX
> 1, TITAN RTX
> 2, TITAN RTX
> 3, TITAN RTX
> 4, TITAN RTX
> 5, TITAN RTX
> 6, TITAN RTX
> 7, TITAN RTX
>
> I am fairly new to Slurm and still figuring out my way around it. I
> would really appreciate any help with this.
>
> For your reference, I attached the slurm.conf and gres.conf files.

I think this is expected, since nvidia-smi does not actually use the
GPUs, but just returns information on them. A better test would be to
run a simple job which really does run on, say, two GPUs and then,
while the job is running, log into the GPU node and run

  nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv

Cheers,

Loris

--
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
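
P.S. An untested sketch of what such a test job could look like, with
the partition and gres names taken from your srun examples; the
./gpu_benchmark binary is just a placeholder for any program that
really keeps the GPUs busy for a few minutes:

    #!/bin/bash
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:p100:2
    #SBATCH --ntasks=1
    #SBATCH --time=00:10:00

    # Show which GPUs Slurm actually handed to the job (Slurm should set
    # CUDA_VISIBLE_DEVICES for GPU gres jobs), then keep them busy long
    # enough to check utilization.gpu on the node with nvidia-smi.
    echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
    srun ./gpu_benchmark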