The trick is to have your ResumeProgram update the IP address of the newly booted node before it returns.
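As a rough illustration, here is a minimal Python sketch of a ResumeProgram that does this. It assumes Slurm passes the nodes to resume as a hostlist expression in the first argument, and it expands that with "scontrol show hostnames". The boot_node_and_get_ip function is a hypothetical placeholder for whatever your cloud provider needs; it is not part of Slurm. The only Slurm-specific step is the "scontrol update NodeName=... NodeAddr=..." call Moe suggested below.

```python
import subprocess
import sys


def build_update_command(node_name, node_addr):
    """Return the scontrol argv that points node_name at node_addr."""
    return ["scontrol", "update",
            f"NodeName={node_name}", f"NodeAddr={node_addr}"]


def boot_node_and_get_ip(node_name):
    """Hypothetical placeholder: start the instance and return its new IP."""
    raise NotImplementedError("replace with your cloud provider's boot logic")


def resume(node_names, boot=boot_node_and_get_ip, run=subprocess.check_call):
    """Boot each node, then tell slurmctld its new address before returning."""
    for name in node_names:
        addr = boot(name)                        # assumed to block until the IP is known
        run(build_update_command(name, addr))    # scontrol update NodeName=... NodeAddr=...


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expand the hostlist expression (e.g. "node[1-3]") into individual names.
    names = subprocess.check_output(
        ["scontrol", "show", "hostnames", sys.argv[1]], text=True
    ).split()
    resume(names)
```

The important part is the ordering: the address update happens inside the resume script, after the node has an IP and before the script exits, so nothing has to be done by hand.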
Robert GATAATGCTATTTCTTTAATTTTCGAA

> On Jan 14, 2015, at 4:51 AM, Anatoliy Kovalenko <tolik.kovale...@gmail.com> wrote:
>
> Our script executes the suggested operation every time when ResumeProgram is
> starting. This is proposed in the 4th step of documentation
> http://slurm.schedmd.com/elastic_computing.html
> Also, if every time we must do this operation "by hand", then dynamic
> allocation of nodes for its jobs by slurm doesn't make much sense.
> Is there an alternative way to solve the issue?
>
> 2015-01-14 0:02 GMT+02:00 <je...@schedmd.com>:
>>
>>
>> Quoting Anatoliy Kovalenko <tolik.kovale...@gmail.com>:
>>
>>> We are using slurm_elastics (
>>> http://schedmd.com/slurmdocs/elastic_computing.html) and we have got an
>>> error error: agent waited too long for nodes to respond, sending batch
>>> request anyway...,
>>> We noticed that slurmctld remembers IP address of each node, but in our
>>> case when a node is shutdown/launched by slurm, its IP may change, even
>>> though the list of node names is constant
>>> Is the error above associated with different IP addresses?
>>
>> Possibly
>>
>>> How can we fix it?
>>
>> Use "scontrol update NodeName=... NodeAddr=..." to reset the IP address.
>> --
>> Morris "Moe" Jette
>> CTO, SchedMD LLC
>> Commercial Slurm Development and Support
>