Yes, you are totally right! It was a test run every minute by our monitoring
system. I changed the type of check and the error message has disappeared.
Thanks a lot!!
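
For anyone finding this thread later, here is a minimal sketch of the kind of check that was involved: a bare TCP probe that opens a connection and closes it without sending anything. The function name, host name and port below are assumptions for illustration only, not our actual monitoring configuration.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    The connection is closed right away and no data is ever sent,
    so the daemon on the other side receives zero bytes.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False


# Hypothetical usage; "node01" and 6818 are placeholders:
# tcp_probe("node01", 6818)
```

A probe like this only shows that the port accepts connections, which is why switching to a different type of check was the fix here.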

2015-05-28 0:49 GMT+03:00 <[email protected]>:

>
> Hi,
>
> I had the same issue. It was caused by a regular TCP test from my Shinken
> server.
> Shinken only performs a TCP connection to check whether slurmd is up.
>
> Anatoliy Kovalenko <[email protected]> wrote:
>
>
>> After we updated SLURM from 14.11.6 to 14.11.7, we keep getting the
>> following message every minute:
>> error: slurm_receive_msg: Zero Bytes were transmitted or received
>> even though tasks are executed correctly, nodes and SLURM utilities work
>> just fine, and everything gets written to the DB. We've tried to follow
>> advice from the SLURM devel list and our own fixes, but nothing worked. We
>> tried:
>> - full server re-start with slurm service and munge
>> - time is identical on nodes and controller
>> - cleaned statesavepath
>> - before the update we didn't have this error, but even rolling back to
>> the previous version didn't help - we have the same error there too
>> - tried to have multiple ports: SlurmctldPort=6810-6817
>> - set path to SLURM libraries and executed ldconfig
>> - searched the group
>> - right now we have only 1 SLURM version (14.11.7). Old one is completely
>> removed.
>> The only thing we didn't try yet is to re-make the slurm cluster with
>> sacctmgr add cluster ...
>>
>> What might be causing the error above?
>> One more thing: during SLURM 14.11.7 compilation, it could not always
>> find the munge libraries, so I had to manually compile them several times.
>>
>>
>
