Hi! I have created the slurm account on my master node and compute node as shown below:

groupadd -g 106 slurm
useradd -u 106 -g slurm slurm
mkdir -p /var/log/slurm
chmod 755 /var/log/slurm

I have installed munge on the master node and the compute node, and munge is now running. Then I entered the Slurm 2.6.5 source code directory and ran:

./configure
make
sudo make install

Next, I copied the example slurm.conf to my master node at /usr/local/etc/slurm.conf. slurmctld and slurmd are available in /usr/local/sbin/.

Running /usr/local/sbin/slurmctld -Dvvv gives:

slurmctld: pidfile not locked, assuming no running daemon
slurmctld: Not running as root. Can't drop supplementary groups
slurmctld: fatal: Failed to set GID to 0

Running /usr/local/sbin/slurmd -Dvvv gives:

slurmd: Node configuration differs from hardware: CPUs=6:6(hw) Boards=1:1(hw) SocketsPerBoard=6:1(hw) CoresPerSocket=1:6(hw) ThreadsPerCore=1:1(hw)
slurmd: topology NONE plugin loaded
slurmd: Gathering cpu frequency information for 6 cpus
slurmd: debug: cpu_freq_init: cpu 0, reset freq: 1600000, reset governor: userspace
slurmd: debug: cpu_freq_init: cpu 1, reset freq: 1600000, reset governor: userspace
slurmd: debug: cpu_freq_init: cpu 2, reset freq: 1600000, reset governor: userspace
slurmd: debug: cpu_freq_init: cpu 3, reset freq: 1600000, reset governor: userspace
slurmd: debug: cpu_freq_init: cpu 4, reset freq: 1600000, reset governor: userspace
slurmd: debug: cpu_freq_init: cpu 5, reset freq: 1600000, reset governor: userspace
slurmd: task NONE plugin loaded
slurmd: auth plugin for Munge (http://code.google.com/p/munge/) loaded
slurmd: debug: spank: opening plugin stack /usr/local/etc/plugstack.conf
slurmd: Munge cryptographic signature plugin loaded
slurmd: Warning: Core limit is only 0 KB
slurmd: slurmd version 2.6.5 started
slurmd: Job accounting gather NOT_INVOKED plugin loaded
slurmd: switch NONE plugin loaded
slurmd: slurmd started on Mon, 10 Feb 2014 11:09:41 +0530
slurmd: CPUs=6 Boards=1 Sockets=6 Cores=1 Threads=1 Memory=24098 TmpDisk=48418 Uptime=14053616
slurmd: AcctGatherEnergy NONE plugin loaded
slurmd: AcctGatherProfile NONE plugin loaded
slurmd: AcctGatherInfiniband NONE plugin loaded
slurmd: AcctGatherFilesystem NONE plugin loaded
slurmd: debug2: No acct_gather.conf file (/usr/local/etc/acct_gather.conf)
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: got shutdown request
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: error: Unable to register: Unable to contact slurm controller (connect failure)
slurmd: debug: Unable to register with slurm controller, retrying
slurmd: all threads complete
slurmd: Munge cryptographic signature plugin unloaded
slurmd: Slurmd shutdown completing

Running sinfo gives:

slurm_load_partitions: Unable to contact slurm controller (connect failure)

One more thing I am confused about: to run slurmd on a compute node, do we need to install Slurm on each compute node? The instructions only mention copying the configuration files.

Torque (pbsnodes) is already installed on my master node. Can Slurm and Torque (pbsnodes) both exist on the same node?

I am also attaching my slurm.conf file. Please let me know what the problem is; I am new to Slurm.

Thanks & Regards,
Nagendra
slurm.conf
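A note on reading the first slurmd log line: each field there has the form Name=A:B(hw). The (hw) suffix suggests that A is the value taken from slurm.conf and B is what slurmd detected on the machine, and that reading agrees with the later "CPUs=6 Boards=1 Sockets=6 Cores=1 Threads=1" line, which repeats the configured values. The small shell sketch below splits that line into one row per field and flags fields where the two numbers differ. The function name compare_node_config is a hypothetical helper (it is not part of Slurm), and it assumes the exact log format pasted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): split slurmd's
# "Node configuration differs from hardware" line into one row per field.
# Assumes each field looks like Name=CONF:HW(hw), as in the log above,
# with the first number from slurm.conf and the second detected by slurmd.
compare_node_config() {
  printf '%s\n' "$1" \
    | sed 's/.*differs from hardware: //' \
    | tr ' ' '\n' \
    | awk -F'[=:(]' 'NF >= 3 {
        # $1 = field name, $2 = slurm.conf value, $3 = hardware value
        flag = ($2 == $3) ? "" : "  <-- differs"
        print $1 ": slurm.conf=" $2 ", hardware=" $3 flag
      }'
}

line='slurmd: Node configuration differs from hardware: CPUs=6:6(hw) Boards=1:1(hw) SocketsPerBoard=6:1(hw) CoresPerSocket=1:6(hw) ThreadsPerCore=1:1(hw)'
compare_node_config "$line"
```

Run on the line from this post, it flags SocketsPerBoard (6 in slurm.conf vs 1 detected) and CoresPerSocket (1 in slurm.conf vs 6 detected); the other fields match.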
