Re: [slurm-users] only 1 job running

2021-01-27 Thread Chandler
Made a little bit of progress by running sinfo:

    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    defq*     up     infinite       3  drain n[011-013]
    defq*     up     infinite       1  alloc n010

Not sure why n[011-013] are in drain state; that needs to be fixed. After some searching, I ran: s
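As a general Slurm sketch (not taken from the message above), the usual way to investigate drained nodes is to read the recorded drain reason and, once the underlying issue is resolved, resume them:

```shell
# Show the drain reason recorded for one of the affected nodes
scontrol show node n011 | grep -i reason

# List all drained/down nodes together with their reasons
sinfo -R

# If the underlying problem is fixed, return the nodes to service
scontrol update NodeName=n[011-013] State=RESUME
```

These commands must run on a host that can reach the cluster's slurmctld; nodes that immediately drain again usually have a real problem (e.g. a config mismatch between slurm.conf and the node's actual hardware).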

[slurm-users] only 1 job running

2021-01-27 Thread Chandler
Hi list, we have a new cluster setup with Bright cluster manager. Looking into a support contract there, but trying to get community support in the meantime. I'm sure things were working when the cluster was delivered, but I provisioned an additional node and now the scheduler isn't quite wo

Re: [slurm-users] Building Slurm RPMs with NVIDIA GPU support?

2021-01-27 Thread Tina Friedrich
Yeah, I don't build against NVML either at the moment (it's filed under 'try when you've got some spare time'). I'm pretty much 'autodetecting' what my gres.conf file needs to look like on nodes via my config management, and that all seems to work just fine. CUDA_VISIBLE_DEVICES and cgroup dev
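For reference, a statically written gres.conf of the kind such config management might generate (rather than NVML autodetection) could look like this; the node names and GPU type here are hypothetical:

```
# gres.conf — hypothetical hand-written GPU definition (no AutoDetect=nvml)
NodeName=gpu[01-04] Name=gpu Type=a100 File=/dev/nvidia[0-3]
```

Listing the device files explicitly is what lets cgroup device containment and CUDA_VISIBLE_DEVICES work without Slurm being built against the NVML library.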

Re: [slurm-users] Exclude Slurm packages from the EPEL yum repository

2021-01-27 Thread Brian Andrus
I've definitely been there with the minimum cost issue. One thing I have done personally is start attending SLUG. Now I can give back and learn more in the process. That may be an option to pitch, citing the value you receive from open source software as part of the ROI. Interestingly, I ha

Re: [slurm-users] Exclude Slurm packages from the EPEL yum repository

2021-01-27 Thread Loris Bennett
Same here - $10k for less than 200 nodes. That's an order of magnitude which makes the finance people ask what we are getting for the money. As we don't have any special requirements which would require customisation, that's not easy to answer, so currently we don't have a support contract. How