Hi Luke

Thanks for the heads-up.

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Luke Yeager
Sent: Wednesday, 24 March 2021 4:58 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Slurm - UnkillableStepProgram

While you're looking at this, make sure you don't set UnkillableStepTimeout to 
a value larger than 126 seconds:
https://bugs.schedmd.com/show_bug.cgi?id=11103
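
As a rough sketch of what that could look like in slurm.conf (the value 120 is only an example that stays under the limit from the bug report, and the script path is a hypothetical placeholder, not something Slurm ships):

```
# slurm.conf (fragment)
# Seconds Slurm waits, after sending SIGKILL, before deciding that a
# step's processes are unkillable. Kept below 126 per the bug above.
UnkillableStepTimeout=120
# Hypothetical path: the program to run once that timeout expires.
UnkillableStepProgram=/usr/local/sbin/unkillable_step.sh
```

Once the change has been picked up, "scontrol show config | grep -i kill" should report the new values.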

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Yap, Mike
Sent: Monday, March 22, 2021 7:13 PM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Slurm - UnkillableStepProgram

Hi All

I have been reading the archive, hoping to implement UnkillableStepTimeout and 
UnkillableStepProgram in Slurm, but I'm a bit confused about how they are 
applied.


  1.  I presume UnkillableStepTimeout is set in slurm.conf and acts as a 
timer that triggers UnkillableStepProgram.
  2.  UnkillableStepProgram can be used to send an email or reboot the compute 
node. The question is: how do we configure it?


scontrol show config | grep -i kill
KillOnBadExit           = 1
KillWait                = 30 sec
UnkillableStepProgram   = (null)
UnkillableStepTimeout   = 300 sec
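
For illustration, a minimal sketch of the kind of script UnkillableStepProgram could point to. It only appends one line per stuck step to a log file. It assumes Slurm passes the job and step IDs to the program in the SLURM_JOB_ID and SLURM_STEP_ID environment variables (worth checking against the slurm.conf man page for the installed version). UNKILLABLE_LOG and the default log path are hypothetical names made up for this example.

```shell
#!/bin/sh
# Sketch of an UnkillableStepProgram: record which job step got stuck.
# Assumption: Slurm exports SLURM_JOB_ID and SLURM_STEP_ID to this program.
# UNKILLABLE_LOG is a hypothetical override; the /tmp default is a placeholder.

log_unkillable_step() {
    log_file="${UNKILLABLE_LOG:-/tmp/unkillable_steps.log}"
    # One line per event: timestamp, node name, job ID, step ID.
    printf '%s node=%s job=%s step=%s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" \
        "$(uname -n)" \
        "${SLURM_JOB_ID:-unknown}" \
        "${SLURM_STEP_ID:-unknown}" >> "$log_file"
}

log_unkillable_step
```

Sending an email or rebooting the node would replace or follow the logging call. The script has to exist and be executable on every compute node, since that is where the program is run.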

Please advise

Thanks
Mike
