See:
http://slurm.schedmd.com/quickstart_admin.html#upgrade

Quoting Всеволод Никоноров <[email protected]>:

Hello,

I tried to test slurm-14.11 on some of my nodes while the other nodes ran slurm-2.5.7, and the nodes running 14.11 were not excluded from the 2.5.7 controller's config. Something seems to have confused the 2.5.7 controller: for some time tasks were doubled (each task was visible twice in the smap list), and after I excluded the 14.11 nodes from the 2.5.7 controller's config, those tasks restarted and the doubling stopped.

Could the protocol mismatch (which was definitely visible in the log) be related to the task doubling and hanging? Are there any safety measures besides cross-excluding foreign-version nodes from the controllers? I don't want to make our polite users sad again :)

Thanks in advance!


--
Morris "Moe" Jette
CTO, SchedMD LLC

Slurm User Group Meeting
September 23-24, Lugano, Switzerland
Find out more http://slurm.schedmd.com/slurm_ug_agenda.html
