Dear slurm-user list,

I have had cases where our ResumeProgram failed due to temporary cloud
timeouts. In those cases the ResumeProgram returns a non-zero exit code. Why
does Slurm still wait until ResumeTimeout expires instead of treating the
startup as failed right away, which should then lead to the job being
rescheduled?

Is there some way to achieve the described effect, i.e. to tell Slurm: "You
can stop waiting, the node won't come alive", or am I missing the
correct way to handle this in Slurm?

Best regards,
Xaver


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
