Re: [slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-07 Thread Stefan Staeglich
Hi Xaver, we also had a similar problem with Slurm 21.08 (see thread "error: power_save module disabled, NULL SuspendProgram"). Fortunately, we have not yet observed this since the upgrade to 23.02. But the time period (about a month) is still too short to know if the problem is really fixed

Re: [slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Xaver Stiensmeier
Hi Ole, for multiple reasons we build it ourselves. I am not really involved in that process, but I will contact the person who is. Thanks for the recommendation! We should probably implement a regular check for new Slurm versions. I am not 100% sure whether this will fix our issues

Re: [slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Ole Holm Nielsen
On 12/6/23 11:51, Xaver Stiensmeier wrote:
> Good idea. Here's our current version:
> ```
> sinfo -V
> slurm 22.05.7
> ```
> Quick googling told me that the latest version is 23.11. Does the upgrade change anything in that regard? I will keep reading.
There are nice bug fixes in 23.02 mentioned in my

Re: [slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Xaver Stiensmeier
Hi Ole,
Good idea. Here's our current version:
```
sinfo -V
slurm 22.05.7
```
Quick googling told me that the latest version is 23.11. Does the upgrade change anything in that regard? I will keep reading.
Xaver
On 06.12.23 11:09, Ole Holm Nielsen wrote:
> Hi Xaver, Your version of Slurm may

Re: [slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Ole Holm Nielsen
Hi Xaver,
Your version of Slurm may matter for your power saving experience. Do you run an updated version?
/Ole
On 12/6/23 10:54, Xaver Stiensmeier wrote:
> Hi Ole, I will double check, but I am very sure that giving a reason is possible as it has been done at least 20 other times without

Re: [slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Xaver Stiensmeier
Hi Ole, I will double check, but I am very sure that giving a reason is possible as it has been done at least 20 other times without error during that exact run. It might be ignored though. You can also give a reason when defining the states POWER_UP and POWER_DOWN. Slurm's documentation is not

Re: [slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Ole Holm Nielsen
Hi Xaver,
On 12/6/23 09:28, Xaver Stiensmeier wrote:
> using https://slurm.schedmd.com/power_save.html we had one case out of many (>242) node starts that resulted in |slurm_update error: Invalid node state specified| when we called: |scontrol update NodeName="$1" state=RESUME

[slurm-users] Power Save: When is RESUME an invalid node state?

2023-12-06 Thread Xaver Stiensmeier
Dear Slurm User list, using https://slurm.schedmd.com/power_save.html we had one case out of many (>242) node starts that resulted in |slurm_update error: Invalid node state specified| when we called: |scontrol update NodeName="$1" state=RESUME reason=FailedStartup| in the Fail script. We