Hi all,
*bump*
I can't believe no one has an explanation for this parameter...
Regards,
Uwe
On 02.09.2014 at 16:30, Uwe Sauter wrote:
>
> Hi all,
>
> I'm a bit confused by the explanation of the "BatchStartTimeout" option.
> It states:
>
> "Specifies how long to wait after a batch job start request is issued
> before we expect the batch job to be running on the compute node.
> Depending upon how nodes are returned to service, this value may need to
> be increased above its default value of 10 seconds."
>
> It is unclear from which point in time this timeout is measured. Some
> possibilities:
>
> - when the batch job is submitted
> - when SLURM executes the ResumeProgram command
> - when the node's slurm daemon (slurmd) contacts the controller daemon (slurmctld)
>
> Can someone reword the explanation or give details about this option?
>
> Are there any recommendations for choosing its value, e.g. in relation to
> ResumeTimeout?
>
>
> Thanks,
>
> Uwe
>