The trick is to have your ResumeProgram update the IP address of the
newly booted node before it returns.
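
A minimal sketch of such a ResumeProgram in Python follows, for illustration only. It assumes Slurm passes the nodes to resume as a single hostlist expression argument, expands it with "scontrol show hostnames", and then applies the "scontrol update NodeName=... NodeAddr=..." command suggested earlier in this thread. The function names are hypothetical, and boot_node_and_get_ip() is a placeholder for whatever your cloud provider needs; it is not part of Slurm.

```python
#!/usr/bin/env python3
"""Sketch of a ResumeProgram that resets NodeAddr before returning."""
import subprocess
import sys


def expand_hostlist(hostlist):
    """Expand a hostlist expression such as 'cloud[01-02]' into node names."""
    result = subprocess.run(
        ["scontrol", "show", "hostnames", hostlist],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def build_update_command(node, ip):
    """Build the scontrol command that tells slurmctld the node's new address."""
    return ["scontrol", "update", f"NodeName={node}", f"NodeAddr={ip}"]


def boot_node_and_get_ip(node):
    """Hypothetical placeholder: start the instance and return its IP address."""
    raise NotImplementedError("replace with your cloud provider's boot logic")


def resume(hostlist, boot=boot_node_and_get_ip, run=subprocess.run,
           expand=expand_hostlist):
    """Boot every node in hostlist and update its address before returning."""
    updated = {}
    for node in expand(hostlist):
        ip = boot(node)
        # Update the address before the program returns, so slurmctld
        # does not contact the node at its stale IP.
        run(build_update_command(node, ip), check=True)
        updated[node] = ip
    return updated


if __name__ == "__main__" and len(sys.argv) == 2:
    resume(sys.argv[1])
```

The boot, run and expand parameters exist only so the logic can be exercised without a live cluster; a production script could call the functions directly.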

Robert

GATAATGCTATTTCTTTAATTTTCGAA

> On Jan 14, 2015, at 4:51 AM, Anatoliy Kovalenko <tolik.kovale...@gmail.com> 
> wrote:
> 
> Our script executes the suggested operation every time ResumeProgram
> starts, as proposed in the 4th step of the documentation:
> http://slurm.schedmd.com/elastic_computing.html
> Also, if we had to do this operation "by hand" every time, dynamic
> allocation of nodes by Slurm for its jobs would not make much sense.
> Is there an alternative way to solve the issue?
> 
> 2015-01-14 0:02 GMT+02:00 <je...@schedmd.com>:
>> 
>> 
>> Quoting Anatoliy Kovalenko <tolik.kovale...@gmail.com>:
>> 
>>> We are using slurm_elastics (
>>> http://schedmd.com/slurmdocs/elastic_computing.html) and we are getting
>>> this error:
>>>   error: agent waited too long for nodes to respond, sending batch
>>>   request anyway...
>>> We noticed that slurmctld remembers the IP address of each node, but in
>>> our case, when a node is shut down or launched by Slurm, its IP may
>>> change, even though the list of node names is constant.
>>> Is the error above associated with the changed IP addresses?
>> 
>> Possibly
>> 
>>> How can we fix it?
>> 
>> Use "scontrol update NodeName=... NodeAddr=..." to reset the IP address.
>> -- 
>> Morris "Moe" Jette
>> CTO, SchedMD LLC
>> Commercial Slurm Development and Support
> 
