Hi everyone, I have another newbie question. What’s the best way to prevent Slurm from allocating jobs to nodes with untracked CPU load (e.g. runaway system processes, zombie processes, etc.)?
We use core-based allocation, which complicates things a bit, but even checking the CPU load on nodes that are supposed to be completely idle (Torque-style) would be a good start. Any suggestions would be appreciated.