I'm not sure whether my messages to this list are getting through. It might be a Yahoo Mail issue.
I have a small cluster of 10 nodes running CentOS 7.0 and Slurm 15.0.8. I started slurmd on all the nodes, and so far so good. But when I try to run a job across all 10 nodes, two of them (node09 and node10) have problems. This is the error message reported by slurmd (in verbose mode):

    slurmd: error: _step_connect: connect() failed dir /tmp/slurmd node node09 job 123 step 0 No such file or directory

/tmp/slurmd is what I set as SlurmdSpoolDir in slurm.conf, and that directory does in fact exist on all nodes. Any help would be appreciated.
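Because the directory exists, the "No such file or directory" may refer to something inside it rather than the directory itself. The message reads as though slurmd is connecting to a per-step socket under /tmp/slurmd; the socket naming `<node>_<job>.<step>` (e.g. `node09_123.0`) is an assumption here, not something confirmed above. The sketch below is a hypothetical diagnostic you could run on node09/node10 while a job is running. It reads SlurmdSpoolDir from slurm.conf text, checks that the directory is usable, and lists any UNIX socket files in it. All function names are illustrative, not part of Slurm.

```python
import os
import re
import stat
from typing import List, Optional


def spool_dir_from_conf(conf_text: str) -> Optional[str]:
    """Return the SlurmdSpoolDir value from slurm.conf text, or None if unset.

    Assumes one 'Key=Value' per line; parameter names are matched case-insensitively.
    """
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        m = re.match(r"(?i)SlurmdSpoolDir\s*=\s*(\S+)", line)
        if m:
            return m.group(1)
    return None


def check_spool_dir(path: str) -> List[str]:
    """Return a list of problems with the spool directory (empty list if none found)."""
    if not os.path.exists(path):
        return [f"{path} does not exist"]
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    # Note: this checks permissions for the user running the script, not for slurmd.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable/searchable by this user")
    return problems


def step_sockets(path: str) -> List[str]:
    """List the names of UNIX-domain socket files directly inside `path`."""
    return sorted(
        name
        for name in os.listdir(path)
        if stat.S_ISSOCK(os.lstat(os.path.join(path, name)).st_mode)
    )
```

For example, `step_sockets(spool_dir_from_conf(open("/etc/slurm/slurm.conf").read()))` would list the sockets on that node; the slurm.conf path is an assumption and may differ on your install. If the listing on node09 shows no socket for the running step while the working nodes do, that would point at the step daemon or at something cleaning /tmp, rather than at the SlurmdSpoolDir setting itself.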
