Xaver,

It is likely your /var or /var/spool mount.
That may be a separate partition, or it may be part of your root partition. It is the partition that is full, not the directory itself, so the cause could very well be log files in /var/log. I would check which partitions (if any) are filling up on the node. Running 'df -h' will give you some info to get started.
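As a rough sketch (assuming the default spool path /var/spool/slurmd; check the actual value in your slurm.conf), something like this on an affected node should show which filesystem holds the spool dir and what is eating the space:

    # Confirm the configured spool directory
    scontrol show config | grep SlurmdSpoolDir

    # Which mounted filesystem holds it, and how full is it?
    df -h /var/spool/slurmd

    # Largest consumers under /var, staying on one filesystem
    du -xh --max-depth=2 /var | sort -rh | head -20

If du points at /var/log rather than the spool dir itself, the fix is log rotation/cleanup rather than anything Slurm-specific.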

Brian Andrus

On 12/8/2023 7:00 AM, Xaver Stiensmeier wrote:
Dear slurm-user list,

during a larger cluster run (the same one I mentioned earlier, with 242
nodes), I got the error "SlurmdSpoolDir full". The SlurmdSpoolDir is
apparently a directory on the workers that is used for job state
information
(https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdSpoolDir). However,
I was unable to find more precise information on that directory. We
compute all data on another volume, so SlurmdSpoolDir has roughly 38 GB
of free space where nothing is intentionally put during the run. This
error occurred on only a few nodes.

I would like to understand what slurmd places in this directory that
fills up the space. Do you have any ideas? Due to the workflow used, we
have a hard time reconstructing the exact scenario that caused this
error. I guess the "fix" is to just pick a somewhat larger disk, but I
am unsure whether Slurm is behaving normally here.

Best regards
Xaver Stiensmeier


