When I run an sbatch script with the line

#SBATCH --gres=gpu:gp100:1

it runs.  When I change it to

#SBATCH --gres=gpu:gp100:3

it fails with "Requested node configuration is not available". But I have
a node with 4 gp100s available. Here's the node definition from my slurm.conf:

NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4

That node has a gres.conf with these lines:

Name=gpu Type=gp100  File=/dev/nvidia0
Name=gpu Type=gp100  File=/dev/nvidia1
Name=gpu Type=gp100  File=/dev/nvidia2
Name=gpu Type=gp100  File=/dev/nvidia3
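One way to double-check that slurm.conf and gres.conf agree is a small script that tallies GPUs per (name, type) on each side. This is a minimal sketch, assuming the simple `name:type:count` form of `Gres=` and exactly one `File=` per gres.conf line (no `nvidia[0-3]` ranges, no `Count=`). The function names are made up for illustration and are not part of Slurm.

```python
from collections import Counter

# The node definition and gres.conf lines quoted above (node line joined onto one line).
NODE_LINE = ("NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 "
             "CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4")
GRES_CONF = """\
Name=gpu Type=gp100  File=/dev/nvidia0
Name=gpu Type=gp100  File=/dev/nvidia1
Name=gpu Type=gp100  File=/dev/nvidia2
Name=gpu Type=gp100  File=/dev/nvidia3
"""


def parse_key_values(line):
    """Split a whitespace-separated 'Key=Value' line into a dict (keys kept case-sensitive)."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


def slurm_conf_gres(node_line):
    """Return {(name, type): count} from a NodeName line's Gres= field.

    Assumption: entries look like name, name:count or name:type:count,
    comma-separated, with plain integer counts.
    """
    counts = Counter()
    gres = parse_key_values(node_line).get("Gres", "")
    for entry in filter(None, gres.split(",")):
        parts = entry.split(":")
        name = parts[0]
        gtype = parts[1] if len(parts) == 3 else None
        count = int(parts[-1]) if len(parts) >= 2 else 1
        counts[(name, gtype)] += count
    return counts


def gres_conf_counts(text):
    """Return {(name, type): count}, counting one device per non-comment gres.conf line."""
    counts = Counter()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        kv = parse_key_values(line)
        counts[(kv.get("Name"), kv.get("Type"))] += 1
    return counts


def mismatches(node_line, gres_text):
    """Map each disagreeing (name, type) to (slurm.conf count, gres.conf count)."""
    declared = slurm_conf_gres(node_line)
    found = gres_conf_counts(gres_text)
    return {key: (declared[key], found[key])
            for key in declared.keys() | found.keys()
            if declared[key] != found[key]}


if __name__ == "__main__":
    print(mismatches(NODE_LINE, GRES_CONF))
```

For the configuration quoted above it prints `{}`, i.e. both files declare four gp100s; a non-empty result would show the two counts for each type that disagrees.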

The character devices all exist in /dev.
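To confirm the files really are character devices (and not, say, leftover regular files), a quick check run on the compute node itself could look like this. It is only a sketch; `missing_char_devices` is a hypothetical helper, and the paths are the /dev/nvidia0 through /dev/nvidia3 entries from gres.conf.

```python
import os
import stat


def missing_char_devices(paths):
    """Return the paths that do not exist or are not character devices."""
    bad = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode  # follows symlinks to the real device node
        except OSError:
            bad.append(path)  # missing or unreadable
            continue
        if not stat.S_ISCHR(mode):
            bad.append(path)  # exists but is a regular file, directory, etc.
    return bad


if __name__ == "__main__":
    devices = [f"/dev/nvidia{i}" for i in range(4)]
    print(missing_char_devices(devices) or "all present")
```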

What's the controller complaining about?
