Hi,

From my understanding, if I run "scontrol reboot <node>", the node should continue to operate as usual and reboot once it is idle. When I add the ASAP flag (scontrol reboot ASAP <node>), the node should go into the drain state and not accept any more jobs.

Now my issue is that when I run "scontrol reboot ASAP nextstate=RESUME <node>", the node goes into the "mix@" state (not drain), but no new jobs get scheduled until the node reboots. Essentially I get draining behavior, even though the node's state is not "drain". Note that this behavior is caused by "nextstate=RESUME"; if I leave that out, jobs get scheduled as expected. Does anyone have an idea why that could be?

I am running Slurm 22.05.9.

Steps to reproduce:

# To prevent node from rebooting immediately
sbatch -t 1:00:00 -c 1 --mem-per-cpu 1G -w <node> ./long_running_script.sh

# Request reboot
scontrol reboot nextstate=RESUME <node>

# Run an interactive job; it does not start until "scontrol cancel_reboot <node>" is executed in another shell
srun -t 1:00:00 -c 1 --mem-per-cpu 1G -w <node> --pty bash
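In case it helps with reproducing: here is a small, hypothetical bash helper (describe_state is my own name, not a Slurm command) that classifies the compact node state string printed by sinfo's %t field. It assumes the sinfo convention that a trailing "@" marks a pending reboot and that "drain"/"drng" are the compact drained/draining states. It only strips a single trailing "@" and ignores any other state suffixes. For "mix@" it reports a pending reboot but no drain, which matches what I see while new jobs are nevertheless held back.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a compact sinfo state string (the "%t" field).
# Assumption: a trailing "@" means "reboot pending"; other suffixes are not handled.
describe_state() {
  local state="$1"
  local base="${state%@}"   # strip one trailing "@", if present
  local reboot="no"
  if [[ "$state" == *@ ]]; then
    reboot="yes"
  fi
  local drain="no"
  case "$base" in
    drain|drng) drain="yes" ;;  # compact forms for drained / draining
  esac
  echo "base=$base reboot_pending=$reboot drain=$drain"
}

# Live usage (requires Slurm; sinfo may print one line per partition):
#   describe_state "$(sinfo -h -N -n "$NODE" -o %t | head -n 1)"
```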


Thanks a lot in advance!

Best,

Tim

