After we updated SLURM from 14.11.6 to 14.11.7, we keep getting the following message every minute:

error: slurm_receive_msg: Zero Bytes were transmitted or received

even though tasks are executed correctly, the nodes and SLURM utilities work just fine, and everything gets written to the DB. We've tried following advice from the SLURM devel list as well as our own fixes, but nothing has worked. We tried:

- a full server restart, including the slurm service and munge
- checked that the time is identical on the nodes and the controller
- cleaned statesavepath
- before the update we didn't have this error, but even rolling back to the previous version didn't help; we get the same error there too
- tried multiple ports: SlurmctldPort=6810-6817
- set the path to the SLURM libraries and ran ldconfig
- searched the group
- right now we have only one SLURM version (14.11.7); the old one is completely removed

The only thing we haven't tried yet is re-creating the slurm cluster with sacctmgr add cluster ...

What might be causing the error above? One more thing: during compilation of SLURM 14.11.7, the build could not always find the munge libraries, so I had to manually compile them several times.
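
In case it helps anyone reproduce the timing, here is a minimal sketch for confirming the once-a-minute pattern from the controller log. It assumes each slurmctld log line starts with a bracketed timestamp such as [2015-05-20T10:15:01.123]; that line format, the sample timestamp, and the function name count_per_minute are assumptions for illustration, not taken from our actual logs.

```python
import re
from collections import Counter

# Assumed log line shape: "[YYYY-MM-DDTHH:MM:SS.mmm] message"
# Group 1 keeps the timestamp truncated to the minute, group 2 the message.
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}(?:\.\d+)?\]\s+(.*)$"
)

# The error text exactly as quoted in the post.
NEEDLE = "slurm_receive_msg: Zero Bytes were transmitted or received"


def count_per_minute(lines):
    """Count occurrences of the error per minute.

    `lines` is any iterable of log lines (a list or an open file).
    Lines that do not match the assumed timestamp format are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and NEEDLE in match.group(2):
            counts[match.group(1)] += 1
    return counts
```

Calling count_per_minute on the open log file (the path set by SlurmctldLogFile in slurm.conf) and printing the sorted items shows whether the error appears exactly once in every minute or in irregular bursts.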
