Thanks for confirming, Brian. That was my understanding as well. Do you
have it working that way on a machine you have access to?  If so, I'd be
interested to see the config file, because that's not the behavior I am
experiencing in my tests.
In fact, in my tests Slurm does not power down those "X nodes", but it does
not power them up either *unless* a job is targeted at them. I may have
something misconfigured, and I'd love to fix that.
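
For comparison, the power-save part of a slurm.conf looks roughly like the
sketch below. The script paths and all timing values are illustrative
placeholders, not settings taken from a real cluster; only the
SuspendExcNodes line follows the example in the documentation.

```
# Sketch only: paths and numbers are assumptions chosen for illustration.
SuspendProgram=/usr/local/sbin/node_suspend.sh   # hypothetical script path
ResumeProgram=/usr/local/sbin/node_resume.sh     # hypothetical script path
SuspendTime=600        # seconds a node must be idle before it may be powered down
SuspendTimeout=120     # seconds allowed for a node to finish powering down
ResumeTimeout=600      # seconds allowed for a node to come back up
# Exclude 4 usable nodes in nid[10-20] from being powered down
# (syntax as in the power_save documentation example).
SuspendExcNodes=nid[10-20]:4
```

If your working setup differs from this in anything beyond the paths and
timings, that difference is probably what I am missing.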

Thanks!

On Wed, Nov 22, 2023 at 5:46 PM Brian Andrus <toomuc...@gmail.com> wrote:

> As I understand it, that setting means "Always have at least X nodes up",
> which includes running jobs. So it stops any wait time for the first X jobs
> being submitted, but any jobs after that will need to wait for the power_up
> sequence.
>
> Brian Andrus
> On 11/22/2023 6:58 AM, Davide DelVento wrote:
>
> I've started playing with powersave and have a question about
> SuspendExcNodes. The documentation at
> https://slurm.schedmd.com/power_save.html says
>
> For example nid[10-20]:4 will prevent 4 usable nodes (i.e IDLE and not
> DOWN, DRAINING or already powered down) in the set nid[10-20] from being
> powered down.
>
> I initially interpreted that as "Slurm will try to keep 4 nodes idle and
> powered on as much as possible", which would have reduced the wait time for
> new jobs targeting those nodes. Instead, it appears to mean "Slurm will not
> shut off the last 4 nodes which are idle in that partition; however, it will
> not turn on nodes which it shut off earlier unless jobs are scheduled on
> them".
>
> Most notably, if the 4 idle nodes are allocated to other jobs (and so
> are no longer idle), Slurm does not turn on any nodes which were shut off
> earlier, so it's possible (and, depending on workloads, perhaps even
> common) to have no idle nodes powered on regardless of the SuspendExcNodes
> setting.
>
> Is that how it works, or is something else in my settings
> causing this unexpected-to-me behavior? I think I can live with it, but
> IMHO it would be better if Slurm attempted to turn on nodes
> preemptively to match the requested SuspendExcNodes count, rather than
> waiting for job submissions.
>
> Thanks and Happy Thanksgiving to people in the USA
>
>
