Hello Daniel,

do /dev/nvidia[0-1] exist on the machines? If not, see http://docs.nvidia.com/cuda/cuda-installation-guide-linux/, where there is a shell script which creates the device nodes for you. They are not always created during startup, especially if there is no X on the system.
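
To give an idea of what such a script does, here is a rough sketch. It is modelled on the idea of the script in the NVIDIA guide and is not a verbatim copy. It assumes the usual character major number 195 for the NVIDIA driver and minor 255 for /dev/nvidiactl, and the helper name nvidia_mknod_cmds is made up for this example. It only prints the mknod commands and does not run them:

```shell
#!/bin/sh
# Sketch: print the mknod commands that would create the NVIDIA device nodes.
# Assumptions: character major 195 for the NVIDIA driver, minor 255 for
# /dev/nvidiactl. nvidia_mknod_cmds is a hypothetical helper name.

nvidia_mknod_cmds() {
    count=$1    # number of GPUs in the node
    i=0
    while [ "$i" -lt "$count" ]; do
        # One node per GPU: /dev/nvidia0, /dev/nvidia1, ...
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    # Control device shared by all GPUs.
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}

# Two GPUs, matching /dev/nvidia[0-1] from gres.conf.
nvidia_mknod_cmds 2
```

To actually create the nodes you would load the kernel module first (modprobe nvidia) and then run the printed commands as root, before slurmd starts. mknod fails for nodes that already exist, so a real script should check for that.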

Kind regards,
Christian

On 09.02.2017 at 12:50, Daniel Ruiz Molina wrote:
>
> Hi,
>
> In my GPU cluster, slurmd daemon doesn't start correctly because when
> daemon start, it doesn't find /dev/nvidia[0-1] device (mapped in
> gres.conf). For solving this problem, I have added attribute
> "ExecStartPre=@/usr/bin/nvidia-smi >/dev/null" in service file and now
> daemon starts correctly. However, could anybody copy-paste his/her
> slurmd daemon file in a GPU cluster? I suppose it must be a better
> solution than mine.
>
> Thanks.

--
Dr. Christian Goll
HITS gGmbH
Schloss-Wolfsbrunnenweg 35
69118 Heidelberg
Germany
Phone: +49 6221 533 230
Fax: +49 6221 533 230
________________________________________________
Amtsgericht Mannheim / HRB 337446
Managing Director: Dr. Gesa Schönberger