I've seen this with time sync issues; make sure the nodes' clocks are in sync.

On Mon, May 2, 2016 at 11:37 AM, Paul Edmon <[email protected]> wrote:
> I'm seeing quite a few of these errors:
>
> May 2 11:33:29 holy-slurm01 slurmctld[47253]: error: slurm_receive_msg:
> Zero Bytes were transmitted or received
> May 2 11:33:29 holy-slurm01 slurmctld[47253]: error: slurm_receive_msg:
> Zero Bytes were transmitted or received
>
> I know that this can be caused by a node or client that is in a bad state,
> but I can't figure out how to trace it back to which one. Does anyone have
> any tricks for tracing this sort of error back? I turned on the Protocol
> Debug Flag but none of the additional debug statements lead to the culprit.
>
> -Paul Edmon-

--
Russ
