Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-26 Thread Alex Chekholko
Hello all, My error was indeed just the comma in my gres.conf. I was confused because I had the same file on my running nodes, but that's just because slurmd started before the erroneous comma was added to the config. So the error message was in fact directly correct: it could not find the
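A stray comma of this kind is easy to miss by eye. The following is a minimal sketch, not part of Slurm, of a check that flags any key=value token in gres.conf-style text whose value ends in a comma. The function name `find_trailing_commas` and the sample line are hypothetical; the sample reuses the NodeName line quoted elsewhere in this thread, with a comma added after the device path and the Cores list shortened for illustration.

```python
def find_trailing_commas(conf_text):
    """Return (line_number, key, value) for each key=value token
    whose value ends in a comma."""
    problems = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        line = line.split("#", 1)[0]  # ignore anything after a comment marker
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep and value.endswith(","):
                problems.append((lineno, key, value))
    return problems


# Hypothetical sample: a comma added after the device path.
sample = (
    "NodeName=gpuhost[001-077] Name=gpu Type=p100 "
    "File=/dev/nvidia0, Cores=0,2,4\n"
)

for lineno, key, value in find_trailing_commas(sample):
    print(f"line {lineno}: {key} value {value!r} ends with a comma")
```

Commas inside a value, as in the Cores list, are left alone; only a comma at the very end of a value is reported.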

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Ryan Novosielski
> On Jul 23, 2018, at 10:31 PM, Ian Mortimer wrote:
>
> On Tue, 2018-07-24 at 02:19 +, Ryan Novosielski wrote:
>
>> Best off running nvidia-persistenced. Handles all of this stuff as a
>> side effect, and also enables persistence mode, provided you don’t
>> configure it otherwise.
>
>

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Ian Mortimer
On Tue, 2018-07-24 at 02:19 +, Ryan Novosielski wrote:
> Best off running nvidia-persistenced. Handles all of this stuff as a
> side effect, and also enables persistence mode, provided you don’t
> configure it otherwise.
Yes. But you have to ensure it starts before slurmd.
-- Ian

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Sean Crosby
Hi Alex, What's the actual content of your gres.conf file? Seems to me that you have a trailing comma after the location of the nvidia device. Our gres.conf has
NodeName=gpuhost[001-077] Name=gpu Type=p100 File=/dev/nvidia0 Cores=0,2,4,6,8,10,12,14,16,18,20,22
NodeName=gpuhost[001-077] Name=gpu

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Ian Mortimer
On Mon, 2018-07-23 at 15:59 -0700, Alex Chekholko wrote:
> However, in this case both 'nvidia-smi' and 'nvidia-smi -L' run just
> fine and produce expected output.
They will because running nvidia-smi triggers loading of the kernel module and creation of the device files. But are the device
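Since running nvidia-smi can itself create the device files, one way to see the state a freshly booted node is really in is to stat the expected paths before running it. The sketch below assumes the device path /dev/nvidia0 that appears elsewhere in this thread; the helper name `unstattable` is hypothetical and is not part of Slurm or the NVIDIA tools.

```python
import os


def unstattable(paths):
    """Return the subset of paths that os.stat() fails on."""
    missing = []
    for path in paths:
        try:
            os.stat(path)
        except OSError:
            missing.append(path)
    return missing


# Run this on a freshly booted node, before nvidia-smi has been invoked.
# The path list is an assumption; adjust it to match the node's gres.conf.
print(unstattable(["/dev/nvidia0"]))
```

A path with a stray character appended, such as a trailing comma, would show up in the returned list the same way a device file that has not been created yet would.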

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Nicholas McCollum
Subject: Re: [slurm-users] "fatal: can't stat gres.conf"
Thanks for the suggestion; if my memory serves me right, I had to do that previously to cause the drivers to load correctly after boot. However, in this case both 'nvidia-smi' and 'nvidia-smi -L' run just fine and produce expec

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Alex Chekholko
8 6:10 AM
> *To:* Slurm User Community List
> *Subject:* Re: [slurm-users] "fatal: can't stat gres.conf"
>
> Hi all,
>
> I have a few working GPU compute nodes. I bought a couple of more
> identical nodes. They are all diskless; so they all boot from the same
> disk

Re: [slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Bill
: [slurm-users] "fatal: can't stat gres.conf"
Hi all, I have a few working GPU compute nodes. I bought a couple of more identical nodes. They are all diskless; so they all boot from the same disk image. For some reason slurmd refuses to start on the new nodes; and I'm not able to

[slurm-users] "fatal: can't stat gres.conf"

2018-07-23 Thread Alex Chekholko
Hi all, I have a few working GPU compute nodes. I bought a couple of more identical nodes. They are all diskless; so they all boot from the same disk image. For some reason slurmd refuses to start on the new nodes; and I'm not able to find any differences in hardware or software. Google