Re: [slurm-users] Slurm - UnkillableStepProgram

2023-01-20 Thread Chris Samuel
On 20/1/23 3:51 am, Stefan Staeglich wrote: But someone who is actually using an UnkillableStepProgram stated the opposite (that it's executed on the controller nodes). Are you aware of any change between Slurm releases? Maybe one of the two parts is just a leftover. Are you using an UnkillableSte

Re: [slurm-users] Slurm - UnkillableStepProgram

2023-01-20 Thread Stefan Staeglich
Hi Chris, thank you. I overlooked this part. But someone who is actually using an UnkillableStepProgram stated the opposite (that it's executed on the controller nodes). Are you aware of any change between Slurm releases? Maybe one of the two parts is just a leftover. Are you using an Unkillabl

Re: [slurm-users] Slurm - UnkillableStepProgram

2023-01-19 Thread Christopher Samuel
On 1/19/23 5:01 am, Stefan Staeglich wrote: Hi, Hiya, I'm wondering where the UnkillableStepProgram is actually executed. According to Mike it has to be available on every one of the compute nodes. This makes sense only if it is executed there. That's right, it's only executed on compute nodes

Re: [slurm-users] Slurm - UnkillableStepProgram

2023-01-19 Thread Stefan Staeglich
Hi, I'm wondering where the UnkillableStepProgram is actually executed. According to Mike it has to be available on every one of the compute nodes. This makes sense only if it is executed there. But the slurm.conf man page for 21.08.x states: UnkillableStepProgram Must be execut

Re: [slurm-users] Slurm - UnkillableStepProgram

2021-03-23 Thread Yap, Mike
Hi Luke, thanks for the heads-up. From: slurm-users On Behalf Of Luke Yeager Sent: Wednesday, 24 March 2021 4:58 AM To: Slurm User Community List Subject: Re: [slurm-users] Slurm - UnkillableStepProgram While you're looking at this, make sure you don't set UnkillableStepTimeout t

Re: [slurm-users] Slurm - UnkillableStepProgram

2021-03-23 Thread Yap, Mike
Hi Chris, thanks for the clarification. Mike -Original Message- From: slurm-users On Behalf Of Chris Samuel Sent: Tuesday, 23 March 2021 5:30 PM To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] Slurm - UnkillableStepProgram Hi Mike, On 22/3/21 7:12 pm, Yap, Mike wrote

Re: [slurm-users] Slurm - UnkillableStepProgram

2021-03-23 Thread Luke Yeager
While you're looking at this, make sure you don't set UnkillableStepTimeout to a value larger than 126 seconds: https://bugs.schedmd.com/show_bug.cgi?id=11103 From: slurm-users On Behalf Of Yap, Mike Sent: Monday, March 22, 2021 7:13 PM To: slurm-users@lists.schedmd.com Subject: [s
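
The 126-second ceiling mentioned above can be checked mechanically before a config is rolled out. What follows is a minimal sketch in Python, not something posted in the thread. It assumes slurm.conf uses plain `Key=Value` lines with `#` comments and that keys match case-insensitively; the names `check_unkillable_timeout` and `MAX_SAFE_TIMEOUT` are hypothetical, and the 126 figure is taken from the message above (see the linked bug for the background).

```python
import re

# Ceiling quoted in the thread (bug 11103); an assumption here, not a Slurm constant.
MAX_SAFE_TIMEOUT = 126


def check_unkillable_timeout(conf_text):
    """Return (value, ok) for UnkillableStepTimeout, or (None, True) if it is unset."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        match = re.match(r"(?i)UnkillableStepTimeout\s*=\s*(\d+)\s*$", line)
        if match:
            value = int(match.group(1))
            return value, value <= MAX_SAFE_TIMEOUT
    return None, True
```

Passing the text of a slurm.conf to the function reports the configured value and whether it stays within the limit; only the first matching line is considered.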

Re: [slurm-users] Slurm - UnkillableStepProgram

2021-03-22 Thread Chris Samuel
Hi Mike, On 22/3/21 7:12 pm, Yap, Mike wrote: # I presume UnkillableStepTimeout is set in slurm.conf and acts as a timer to trigger UnkillableStepProgram That is correct. # UnkillableStepProgram can be used to send email or reboot the compute node – question is how do we configure it? Al
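
To make the question of what such a program might look like concrete, here is a minimal sketch of a script that could be named in `UnkillableStepProgram=` in slurm.conf. It is an illustration only, not a script from the thread: the log path, the function names `build_alert` and `main`, and the choice to log rather than send email or reboot are all assumptions. Per the later messages in this thread, the script would need to be present and executable on the compute nodes.

```python
#!/usr/bin/env python3
"""Hypothetical UnkillableStepProgram: record that a step could not be killed."""
import socket
import sys
from datetime import datetime, timezone

# Hypothetical log location; choose a path that exists on every compute node.
LOG_PATH = "/var/log/slurm/unkillable_step.log"


def build_alert(hostname, when):
    """Format a one-line alert for the given host and timestamp."""
    return f"{when.isoformat()} unkillable step detected on {hostname}"


def main(log_path=LOG_PATH):
    """Append the alert to log_path, falling back to stderr on failure."""
    message = build_alert(socket.gethostname(), datetime.now(timezone.utc))
    try:
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(message + "\n")
    except OSError as exc:
        # Report on stderr so the alert is not silently lost.
        print(f"{message} (could not write {log_path}: {exc})", file=sys.stderr)
    return message


if __name__ == "__main__":
    main()
```

A site that wants email or a reboot instead would replace the log write in `main` with its own notification or drain step; the sketch deliberately leaves that out.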

[slurm-users] Slurm - UnkillableStepProgram

2021-03-22 Thread Yap, Mike
Hi all, I have been reading the archive hoping to implement UnkillableStepTimeout and UnkillableStepProgram in Slurm, but I'm somewhat confused about their application. 1. I presume UnkillableStepTimeout is set in slurm.conf and acts as a timer to trigger UnkillableStepProgram 2. Unkillab