[slurm-users] Install & Configuration of slurmdbd

2023-01-30 Thread Jim Klo
Greetings, I’ve been working on updating our small Slurm cluster over the last few days, and the update itself was successful. However, our cluster is missing the slurmdbd configuration, and while I know it’s not required, I would like to add that as it would be helpful to access job history d

[slurm-users] node health check

2023-01-30 Thread Ratnasamy, Fritz
Hi, Currently, some of our nodes are overloaded. The installed nhc used to check the load and drain a node when it was overloaded. However, for the past few days, it has not been showing the state of the node. When I run /usr/sbin/nhc manually, it says 20230130 21:25:14 [slurm] /usr/libexec/nhc/node

Re: [slurm-users] Install & Configuration of slurmdbd

2023-01-30 Thread Ole Holm Nielsen
Hi Jim, Maybe you'll find these Wiki pages relevant for setting up your Slurm database: https://wiki.fysik.dtu.dk/Niflheim_system/ https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_database/ /Ole On 1/30/23 20:43, Jim Klo wrote: I’ve been working on updating our small slurm cluster over the la

Re: [slurm-users] node health check

2023-01-30 Thread Ole Holm Nielsen
20230130 21:25:14 [slurm] /usr/libexec/nhc/node-mark-online mcn26.chicagobooth.edu /usr/libexec/nhc/node-mark-online: Not sure how to handle node state "" on mcn26.chicagobooth.edu /usr/libexec/nhc/node-mark-online: