Thank you very much, but what I am looking for is not exactly the upgrade procedure; I am rather trying to understand what happened in my environment and how to avoid such problems in the future. We are testing two installations of slurm on adjacent nodes, so that users who test the new version can have all the network-mounted filesystems (NFS, Lustre) from the main installation. It seems that slurmctld 2.5.7 contacted a node that was running both slurmctld 14.11 and slurmd 14.11, and then some of the nodes controlled by slurmctld 2.5.7 got confused and lost jobs.
Could interaction between slurmctld 2.5.7 and a slurm daemon of a different version (which was not supposed to consider it its master) confuse the controller so much that it lost the jobs? Will such things still happen if I exclude the nodes running the newer slurm daemons from the config of the older slurm's master node? Is there anything else I should do for two independent sets of slurm nodes to co-exist without issues?

Thank you!

02.09.2014, 07:06, "[email protected]" <[email protected]>:
> See:
> http://slurm.schedmd.com/quickstart_admin.html#upgrade
>
> Quoting <[email protected]>:
>> Hello,
>>
>> I tried to test slurm-14.11 on some of my nodes while other nodes
>> ran slurm-2.5.7, and nodes running 14.11 were not excluded from
>> 2.5.7 controller config. It seems like something confused 2.5.7
>> controller, for tasks have doubled for some time (each task were
>> visible twice in smap list), and after excluding 14.11 nodes from
>> 2.5.7 controller config those tasks have restarted and doubling has
>> ended.
>>
>> Can protocol mismatch (which was definitely visible in log) be
>> related to task doubling and hanging? Are there any other safety
>> measures except cross-excluding foreign-version nodes from
>> controllers? I don't want to make our polite users sad again :)
>>
>> Thanks in advance!
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
>
> Slurm User Group Meeting
> September 23-24, Lugano, Switzerland
> Find out more http://slurm.schedmd.com/slurm_ug_agenda.html
