Hi Davide,

On 10/5/23 15:28, Davide DelVento wrote:
    IMHO, "pretending" to power down nodes defies the logic of the Slurm
power_save plugin.
And it is surely useless ;)
But I was using the suggestion from https://slurm.schedmd.com/power_save.html which says:

You can also configure Slurm with programs that perform no action as *SuspendProgram* and *ResumeProgram* to assess the potential impact of power saving mode before enabling it.

I had not noticed the above sentence in the power_save manual before! So I decided to test a "no action" power saving script, similar to what you have done, applying it to a test partition. I conclude that "no action" power saving DOES NOT WORK, at least in Slurm 23.02.5. I have therefore opened a bug report, https://bugs.schedmd.com/show_bug.cgi?id=17848, to find out whether the documentation is obsolete or whether there is a bug. Please follow that bug to see the answer from SchedMD.

What I *believe* (but not with 100% certainty) really happens with power saving in the current Slurm versions is what I wrote yesterday:

    Slurmctld expects suspended nodes to *really* power
    down (slurmd is stopped).  When slurmctld resumes a suspended node, it
    expects slurmd to start up when the node is powered on.  There is a
    ResumeTimeout parameter which I've set to about 15-30 minutes in case of
    delays due to BIOS updates and the like - the default of 60 seconds is
    WAY too small!
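
For illustration only, a slurm.conf fragment along these lines would set the longer timeout. The parameter names are the standard power saving ones, but the script paths and the SuspendTime value are hypothetical placeholders, not my actual configuration:

    # Hypothetical paths: these scripts must really power nodes off and on
    SuspendProgram=/usr/local/bin/node_suspend.sh
    ResumeProgram=/usr/local/bin/node_resume.sh
    # Idle time in seconds before a node is suspended (example value)
    SuspendTime=1800
    # Allow 30 minutes for a resumed node to boot and start slurmd
    # (the default is 60 seconds)
    ResumeTimeout=1800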

I hope this helps,
Ole
