[slurm-users] scontrol reboot does not allow new jobs to be scheduled if nextstate=RESUME is set

Tim Schneider Tue, 24 Oct 2023 12:41:26 -0700

Hi,

From my understanding, if I run "scontrol reboot <node>", the node should continue to operate as usual and reboot once it is idle. When adding the ASAP flag (scontrol reboot ASAP <node>), the node should go into drain state and not accept any more jobs.

Now my issue is that when I run "scontrol reboot ASAP nextstate=RESUME <node>", the node goes into "mix@" state (not drain), but no new jobs get scheduled until the node reboots. Essentially I get draining behavior, even though the node's state is not "drain". Note that this behavior is caused by "nextstate=RESUME"; if I leave that out, jobs get scheduled as expected. Does anyone have an idea why that could be?


I am running Slurm 22.05.9.

Steps to reproduce:

# To prevent node from rebooting immediately
sbatch -t 1:00:00 -c 1 --mem-per-cpu 1G -w <node> ./long_running_script.sh

# Request reboot
scontrol reboot nextstate=RESUME <node>

# Run an interactive command; it does not start until "scontrol cancel_reboot <node>" is executed in another shell
srun -t 1:00:00 -c 1 --mem-per-cpu 1G -w <node> --pty bash
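
While the srun above is waiting, the node state can be checked from another shell. The following is only a rough sketch, not part of the original reproduction: the helper name reboot_pending and the script layout are made up for illustration, and it assumes that sinfo marks a pending reboot with an "@" suffix on the state (as in the "mix@" state described above). The Slurm commands only run if a node name is passed as the first argument and sinfo is available.

```shell
#!/bin/sh
# Hypothetical helper (not a Slurm command): succeeds if a sinfo state
# string such as "mix@" ends with "@", i.e. a reboot is pending.
reboot_pending() {
    case "$1" in
        *@) return 0 ;;
        *)  return 1 ;;
    esac
}

# Node name is taken from the first argument; nothing is queried without it.
NODE="${1:-}"

if [ -n "$NODE" ] && command -v sinfo >/dev/null 2>&1; then
    # Compact node state without header, e.g. "mix@" or "drng"
    state=$(sinfo -h -n "$NODE" -o "%t" | head -n 1)
    echo "state of $NODE: $state"

    if reboot_pending "$state"; then
        echo "reboot is pending on $NODE"
    fi

    # Full node record, including the state flags and any reason string
    scontrol show node "$NODE"

    # My pending jobs with the reason the scheduler gives for not starting them
    squeue -h -u "$USER" -t PENDING -o "%i %r"
fi
```

Saved as, for example, check_node.sh, this would be run as "sh check_node.sh <node>". The reason column printed by squeue may help to tell whether the scheduler is holding the job back because of the pending reboot.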


Thanks a lot in advance!

Best,

Tim
