Yes, you are totally right! It was a test run every minute by our monitoring system. I changed the type of check and now the error message has disappeared. Thanks a lot!!
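
For anyone who finds this thread later, here is a rough sketch of what a bare TCP check of this kind amounts to: connect, send nothing, close. The listener in `demo` is only a local stand-in, not slurmd or slurmctld, and the names `tcp_probe` and `demo` are made up for illustration. It shows that the receiving side reads zero bytes from such a probe, which is presumably what the daemon reports as "Zero Bytes were transmitted or received".

```python
import socket
import threading


def tcp_probe(host, port, timeout=2.0):
    """Connect and close at once without sending data (a bare TCP check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def demo():
    """Run the probe against a local stand-in listener (NOT a Slurm daemon).

    Returns (probe_succeeded, bytes_received_by_listener).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    received = []

    def serve():
        conn, _ = srv.accept()
        with conn:
            # recv() returns b"" once the peer closes without sending anything
            received.append(len(conn.recv(4096)))

    worker = threading.Thread(target=serve)
    worker.start()
    ok = tcp_probe("127.0.0.1", port)
    worker.join(timeout=5)
    srv.close()
    return ok, received[0]


if __name__ == "__main__":
    print(demo())
```

The probe itself succeeds, so the monitoring side sees the service as "up", while the listener gets a connection that carries no message at all. In our case, switching the check type stopped the log entries.
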
2015-05-28 0:49 GMT+03:00 <[email protected]>:

> Hi,
>
> I had the same issue. It was caused by a regular TCP test from my Shinken
> server. Shinken only performs a TCP connection to check whether slurmd is up.
>
> Anatoliy Kovalenko <[email protected]> wrote:
>
>> After we updated SLURM from 14.11.6 to 14.11.7, we keep getting the
>> following message every minute:
>> error: slurm_receive_msg: Zero Bytes were transmitted or received
>> even though tasks are executed correctly, nodes and slurm utilities work
>> just fine and everything gets written into the DB. We've tried to follow
>> advice from the SLURM devel list and our own fixes, but nothing worked. We
>> tried:
>> - full server restart with the slurm service and munge
>> - time is identical on nodes and controller
>> - cleaned StateSaveLocation
>> - before the update we didn't have this error, but even rolling back to
>>   the previous version didn't help; we have the same error there too
>> - tried to have multiple ports, SlurmctldPort=6810-6817
>> - set the path to the SLURM libraries and executed ldconfig
>> - searched the group
>> - right now we have only one SLURM version (14.11.7). The old one is
>>   completely removed.
>> The only thing we didn't try yet is to re-create the slurm cluster with
>> sacctmgr add cluster ...
>>
>> What might be causing the error above?
>> One more thing: during SLURM 14.11.7 compilation, it could not always
>> find the munge libraries, so I had to manually compile them several times.
