Thank you very much, but what I am looking for is not exactly the upgrade
procedure. I am rather trying to understand what happened in my environment and
how to avoid such problems in the future. We are testing two installations of
Slurm on adjacent nodes, so that users who test the new version can have all
the network-mounted filesystems (NFS, Lustre) from the main installation. It
seems that slurmctld 2.5.7 contacted a node running slurmctld 14.11 and slurmd
14.11 simultaneously, and then some of the nodes controlled by slurmctld 2.5.7
got confused and lost jobs.

Could the interaction of slurmctld 2.5.7 with a Slurm daemon of a different
version (which was not supposed to consider it its master) confuse it so much
that it lost the jobs?

Will such things still happen if I exclude the nodes running the newer Slurm
daemons from the older Slurm master node's config? Is there anything else I
should do for two independent sets of Slurm nodes to co-exist without issues?
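
For illustration, the kind of separation being asked about might look like the
pair of slurm.conf fragments below. This is only a sketch: every hostname, node
range, port number, path and cluster name is a made-up placeholder, and whether
this is sufficient to prevent the problem described above is exactly the open
question. The idea is that the two installations share no node definitions, no
daemon ports and no state directories.

```
# slurm.conf for the existing 2.5.7 installation (hypothetical values)
ClusterName=main
ControlMachine=master-old
SlurmctldPort=6817
SlurmdPort=6818
StateSaveLocation=/var/spool/slurm-main/state
SlurmdSpoolDir=/var/spool/slurm-main/slurmd
# Only nodes that run the 2.5.7 slurmd are listed here.
NodeName=node[001-100] State=UNKNOWN
PartitionName=main Nodes=node[001-100] Default=YES State=UP

# slurm.conf for the 14.11 test installation (hypothetical values)
ClusterName=test
ControlMachine=master-new
# Different ports, so a daemon from one installation never reaches the other.
SlurmctldPort=7817
SlurmdPort=7818
StateSaveLocation=/var/spool/slurm-test/state
SlurmdSpoolDir=/var/spool/slurm-test/slurmd
# Only nodes that run the 14.11 slurmd are listed here.
NodeName=node[101-104] State=UNKNOWN
PartitionName=test Nodes=node[101-104] Default=YES State=UP
```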

Thank you!

02.09.2014, 07:06, "[email protected]" <[email protected]>:
> See:
> http://slurm.schedmd.com/quickstart_admin.html#upgrade
>
> Quoting <[email protected]>:
>> Hello,
>>
>> I tried to test slurm-14.11 on some of my nodes while other nodes
>> ran slurm-2.5.7, and nodes running 14.11 were not excluded from
>> 2.5.7 controller config. It seems like something confused 2.5.7
>> controller, for tasks have doubled for some time (each task were
>> visible twice in smap list), and after excluding 14.11 nodes from
>> 2.5.7 controller config those tasks have restarted and doubling has
>> ended.
>>
>> Can protocol mismatch (which was definitely visible in log) be
>> related to task doubling and hanging? Are there any other safety
>> measures except cross-excluding foreign-version nodes from
>> controllers? I don't want to make our polite users sad again :)
>>
>> Thanks in advance!
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
>
> Slurm User Group Meeting
> September 23-24, Lugano, Switzerland
> Find out more http://slurm.schedmd.com/slurm_ug_agenda.html
