Thanks, and no worries about the time it took to reply.
Sounds good then, and it's consistent with what the documentation says,
namely "prevent those nodes from being powered down". As you said, "keep
that number of nodes up" is a different thing, and yes, it would be nice to
have.
For that purpose,
Sorry for the late reply.
For my site, I used the optional ":" separator to ensure at least 4
nodes were up. E.g.: nid[10-20]:4
This means at least 4 nodes. Those nodes do not have to be the same 4
at any time, so if one that used to be idle is down but 4 are up, that
one will not be brought
Thanks for confirming, Brian. That was my understanding as well. Do you
have it working that way on a machine you have access to? If so, I'd be
interested to see the config file, because that's not the behavior I am
experiencing in my tests.
In fact, in my tests Slurm will not bring down those "X
As I understand it, that setting means "Always have at least X nodes
up", and that count includes nodes running jobs. So it eliminates the
wait time for the first X jobs submitted, but any jobs after that will
need to wait for the power_up sequence.
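
To make that interpretation concrete, here is a minimal Python sketch of
the "at least X nodes up, counting busy ones" reading. It is only an
illustration of the reasoning in this thread, not Slurm's actual logic.
The function names, the state labels ("IDLE", "ALLOCATED") and the
simplified "prefix[lo-hi]:count" parser are all assumptions made for the
example; real Slurm hostlist expressions are richer than this.

```python
import re


def parse_exc_spec(spec):
    """Split a simple 'prefix[lo-hi]:count' spec into (node names, count).

    Hypothetical helper: handles only a single numeric range.
    """
    m = re.fullmatch(r"([A-Za-z_]+)\[(\d+)-(\d+)\]:(\d+)", spec)
    if m is None:
        raise ValueError(f"unsupported spec: {spec!r}")
    prefix = m.group(1)
    lo, hi, count = int(m.group(2)), int(m.group(3)), int(m.group(4))
    return [f"{prefix}{i}" for i in range(lo, hi + 1)], count


def suspendable(spec, states):
    """Return the idle nodes that could be powered down while `count` stay up.

    `states` maps node name -> "IDLE" or "ALLOCATED" (assumed labels);
    any node missing from the mapping is treated as not up.
    Nodes running jobs count toward the minimum, per the reading above.
    """
    nodes, count = parse_exc_spec(spec)
    up = [n for n in nodes if states.get(n) in ("IDLE", "ALLOCATED")]
    idle = [n for n in up if states[n] == "IDLE"]
    spare = max(0, len(up) - count)  # nodes beyond the required minimum
    return idle[:spare]
```

For example, with nid[10-20]:4, two nodes running jobs and three idle,
five nodes are up, so under this reading only one idle node would be a
candidate for power-down.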
Brian Andrus
On 11/22/2023 6:58 AM, Davide DelVento
I've started playing with powersave and have a question about
SuspendExcNodes. The documentation at
https://slurm.schedmd.com/power_save.html says
For example nid[10-20]:4 will prevent 4 usable nodes (i.e IDLE and not
DOWN, DRAINING or already powered down) in the set nid[10-20] from being