Yeah, I don't build against NVML either at the moment (it's filed under
'try when you've got some spare time'). I'm pretty much 'autodetecting'
what my gres.conf file needs to look like on nodes via my config
management, and that all seems to work just fine.
CUDA_VISIBLE_DEVICES and cgroup device restrictions work as well.
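For anyone wanting to try the config-management approach, here is a minimal sketch of the idea in Python. It is not the tooling described above, and the helper names (gpu_device_files, gres_conf_lines) are hypothetical. It assumes the per-GPU device nodes follow the /dev/nvidiaN pattern, and that one "Name=gpu [Type=...] File=/dev/nvidiaN" line per GPU is all the node's gres.conf needs. Slurm's built-in alternative is "AutoDetect=nvml" in gres.conf, which only works if slurmd was built against NVML.

```python
import glob
import re

# Matches per-GPU device nodes such as /dev/nvidia0 or /dev/nvidia12.
# Control nodes like /dev/nvidiactl, /dev/nvidia-uvm and /dev/nvidia-modeset
# do not match, because only digits may follow "nvidia".
GPU_DEV_RE = re.compile(r"^/dev/nvidia(\d+)$")


def gpu_device_files(paths):
    """Return the per-GPU device paths from `paths`, sorted by GPU index.

    Sorting numerically keeps /dev/nvidia10 after /dev/nvidia2.
    """
    matches = []
    for path in paths:
        m = GPU_DEV_RE.match(path)
        if m:
            matches.append((int(m.group(1)), path))
    return [path for _, path in sorted(matches)]


def gres_conf_lines(paths, gpu_type=None):
    """Build one gres.conf line per GPU device file (hypothetical helper).

    `gpu_type` is optional; if given, it must match the type used in the
    node's Gres= entry in slurm.conf.
    """
    lines = []
    for dev in gpu_device_files(paths):
        fields = ["Name=gpu"]
        if gpu_type:
            fields.append(f"Type={gpu_type}")
        fields.append(f"File={dev}")
        lines.append(" ".join(fields))
    return lines


if __name__ == "__main__":
    # On a real node, scan the device nodes the NVIDIA driver has created.
    for line in gres_conf_lines(glob.glob("/dev/nvidia*")):
        print(line)
```

With /dev/nvidia0, /dev/nvidia1 and /dev/nvidiactl present, this prints "Name=gpu File=/dev/nvidia0" and "Name=gpu File=/dev/nvidia1". Filtering on device nodes rather than querying the driver means the generator itself needs no NVIDIA libraries, which fits the point made below.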
Tina
On 26/01/2021 21:25, Christopher Samuel wrote:
> On 1/26/21 12:10 pm, Ole Holm Nielsen wrote:
>> What I don't understand is, is it actually *required* to make the
>> NVIDIA libraries available to Slurm? I didn't do that, and I'm not
>> aware of any problems with our GPU nodes so far. Of course, our GPU
>> nodes have the libraries installed and the /dev/nvidia? devices are
>> present.
>
> You only need it if you want to use NVML autodetection of GPUs; we don't
> have any NVIDIA software in the OS image we use to build our vast array
> of RPMs, and they work just fine on our GPU nodes.
>
> All the best,
> Chris
--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk