Hello,

I tried to test slurm-14.11 on some of my nodes while the other nodes kept
running slurm-2.5.7, but I did not remove the 14.11 nodes from the 2.5.7
controller's config. This seems to have confused the 2.5.7 controller: for
some time tasks were doubled (each task was visible twice in the smap list).
After I excluded the 14.11 nodes from the 2.5.7 controller's config, those
tasks restarted and the doubling stopped.

Could the protocol mismatch (which was definitely visible in the log) be
related to the task doubling and hanging? Are there any safety measures other
than cross-excluding foreign-version nodes from each controller's config? I
don't want to make our polite users sad again :)

Thanks in advance!
