Hi Abhiram,

Abhiram Chintangal <achintan...@berkeley.edu> writes:
> Hello,
>
> I recently set up a small cluster at work using Warewulf/Slurm.
> Currently, I am not able to get the scheduler to work well with GPUs
> (Gres).
>
> While Slurm is able to filter by GPU type, it allocates all the GPUs
> on the node. See below:
>
> [abhiram@whale ~]$ srun --gres=gpu:p100:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> index, name
> 0, Tesla P100-PCIE-16GB
> 1, Tesla P100-PCIE-16GB
> 2, Tesla P100-PCIE-16GB
> 3, Tesla P100-PCIE-16GB
> [abhiram@whale ~]$ srun --gres=gpu:titanrtx:2 -n 1 --partition=gpu nvidia-smi --query-gpu=index,name --format=csv
> index, name
> 0, TITAN RTX
> 1, TITAN RTX
> 2, TITAN RTX
> 3, TITAN RTX
> 4, TITAN RTX
> 5, TITAN RTX
> 6, TITAN RTX
> 7, TITAN RTX
>
> I am fairly new to Slurm and still figuring out my way around it. I
> would really appreciate any help with this.
>
> For your reference, I attached the slurm.conf and gres.conf files.

I think this is expected, since nvidia-smi does not actually use the
GPUs, but just returns information on them. A better test would be to
run a simple job which really does run on, say, two GPUs and then,
while the job is running, log into the GPU node and run

  nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv

Cheers,

Loris

--
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin         Email loris.benn...@fu-berlin.de
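
P.S. An untested sketch of what such a test job could look like, with
the partition and gres names taken from your srun examples; the
./gpu_benchmark binary is just a placeholder for any program that
really keeps the GPUs busy for a few minutes:

    #!/bin/bash
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:p100:2
    #SBATCH --ntasks=1
    #SBATCH --time=00:10:00

    # Show which GPUs Slurm actually handed to the job (Slurm should set
    # CUDA_VISIBLE_DEVICES for GPU gres jobs), then keep them busy long
    # enough to check utilization.gpu on the node with nvidia-smi.
    echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
    srun ./gpu_benchmark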