Thanks, Mike. You were right. I killed the stale process and am now able to run the slurmctld.
Adam

On Mon, Jun 15, 2015 at 11:51 AM, Michael Robbert <mrobb...@mines.edu> wrote:

> Adam,
> That error looks like you already have a slurmctld running on this host
> (or possibly some other program that is listening on the same TCP port).
>
> By default slurmctld binds to TCP/6817 and I don’t see a different port
> specified in your config file. That is probably fine; don’t change it if
> you don’t need to. Try running netstat to see what is currently listening
> on that port:
>
> # netstat -ltpn|grep 6817
> tcp 0 0 0.0.0.0:6817 0.0.0.0:* LISTEN 11143/slurmctld
>
> It is likely a stale slurmctld process. If so, just kill it and try to
> start again.
>
> Mike
>
> On Jun 15, 2015, at 9:02 AM, Cooper, Adam <adam_coo...@brown.edu> wrote:
>
> Hi,
> I am new to SLURM and I have been tasked with installing it on a cluster
> of 15 servers. Right now, I have just installed SLURM on the master, and
> hope to get the daemons running and scheduling jobs there before I try to
> get it working for the whole cluster. All of the machines are running
> Ubuntu 12.04. I have worked through some errors already; however,
> currently when I run:
>
> sudo slurmctld -Dv
>
> I get this output:
>
> slurmctld: pidfile not locked, assuming no running daemon
> slurmctld: slurmctld version 14.11.7 started on cluster cluster
> slurmctld: OpenSSL cryptographic signature plugin loaded
> slurmctld: preempt/none loaded
> slurmctld: ExtSensors NONE plugin loaded
> slurmctld: Accounting storage NOT INVOKED plugin loaded
> slurmctld: layouts: no layout to initialize
> slurmctld: topology NONE plugin loaded
> slurmctld: sched: Backfill scheduler plugin loaded
> slurmctld: route default plugin loaded
> slurmctld: layouts: loading entities/relations information
> slurmctld: Recovered state of 1 nodes
> slurmctld: Recovered information about 0 jobs
> slurmctld: Recovered state of 0 reservations
> slurmctld: State of 0 triggers recovered
> slurmctld: read_slurm_conf: backup_controller not specified.
> slurmctld: Running as primary controller
> *slurmctld: error: Error binding slurm stream socket: Address already in use*
> *slurmctld: fatal: slurm_init_msg_engine_addrname_port error Address already in use*
>
> By the way, I am running the daemon as root because my boss does not
> want me to create a separate 'slurm' user. Any idea what might cause this
> fatal error? I've attached an rtf of the current slurm configuration file
> (I've REDACTED some things to keep private), which I made using the online
> configuration tool.
>
> Please let me know any more relevant information that you need. Thank you
> in advance, and sorry for my lack of knowledge; this is very new work for
> me.
>
> Adam Cooper
> Brown University Computer Engineering '16
>
> <slurm_conf_current.rtf>
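
For anyone who hits the same "Address already in use" error: the check Mike describes with netstat can also be sketched in a few lines of Python. This is only an illustration, not part of SLURM. The helper name `port_in_use` is made up here, and the port number 6817 is the default mentioned above, so adjust it if your slurm.conf sets a different one. The script tries to bind the port itself and treats an EADDRINUSE failure as "something already holds it". Unlike netstat, it cannot tell you which process that is, and a port with lingering connections may also be reported as busy.

```python
import errno
import socket

# Default slurmctld port mentioned in the thread; an assumption if your
# slurm.conf specifies a different port.
SLURMCTLD_PORT = 6817


def port_in_use(port, host="0.0.0.0"):
    """Hypothetical helper: True if binding host:port fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Same kind of bind() that fails inside slurmctld when the
            # port is already taken.
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # some other problem, e.g. permission denied
    return False


if __name__ == "__main__":
    if port_in_use(SLURMCTLD_PORT):
        print(f"TCP port {SLURMCTLD_PORT} is already in use; "
              "look for a stale slurmctld before starting a new one.")
    else:
        print(f"TCP port {SLURMCTLD_PORT} looks free.")
```

If it reports the port as busy, `netstat -ltpn | grep 6817` (as in Mike's reply) will show the PID of the process holding it.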