[slurm-users] Troubleshooting job stuck in Pending state

2023-12-08 Thread Pacey, Mike
Hi folks, I'm looking for some advice on how to troubleshoot jobs we occasionally see on our cluster that are stuck in a pending state despite sufficient matching resources being free. In the case I'm trying to troubleshoot, the Reason field lists (Priority) but to find any way to get the

Re: [slurm-users] SlurmdSpoolDir full

2023-12-08 Thread Ole Holm Nielsen
Hi Xaver, On 12/8/23 16:00, Xaver Stiensmeier wrote: during a larger cluster run (the same one I mentioned earlier, with 242 nodes), I got the error "SlurmdSpoolDir full". The SlurmdSpoolDir is apparently a directory on the workers that is used for job state information

[slurm-users] SlurmdSpoolDir full

2023-12-08 Thread Xaver Stiensmeier
Dear slurm-users list, during a larger cluster run (the same one I mentioned earlier, with 242 nodes), I got the error "SlurmdSpoolDir full". The SlurmdSpoolDir is apparently a directory on the workers that is used for job state information (https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir).
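
For anyone wanting to check how full the spool directory is on a worker, here is a minimal sketch using only the Python standard library. It is not taken from the thread: the function name spool_usage and the 90% warning threshold are made up for illustration, and the path assumes SlurmdSpoolDir is left at its documented default of /var/spool/slurmd. If your slurm.conf sets a different location, pass that instead.

```python
import shutil


def spool_usage(path="/var/spool/slurmd", warn_fraction=0.9):
    """Return (used_fraction, is_over_threshold) for the filesystem holding path.

    path defaults to the documented SlurmdSpoolDir default; warn_fraction is
    an arbitrary illustrative threshold, not a Slurm setting.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Some pseudo-filesystems report a size of zero; avoid dividing by it.
        return 0.0, False
    used_fraction = usage.used / usage.total
    return used_fraction, used_fraction >= warn_fraction
```

Calling spool_usage() on a worker node returns the used fraction of the filesystem that holds the spool directory, plus a flag saying whether it crossed the threshold. Note that this reports on the whole filesystem, so if the spool directory shares a partition with other data, something else may be what is filling it.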