Hi!

I have created a slurm account on both my master node and my compute node as follows:

groupadd -g 106 slurm
useradd -u 106 -g slurm slurm
mkdir -p /var/log/slurm
chmod 755 /var/log/slurm
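
As a sanity check on the account setup above, here is a minimal sketch that compares a passwd entry against the expected UID and GID. The helper name ids_match is hypothetical, and it assumes the standard colon-separated passwd format as printed by getent passwd.

```shell
# Hypothetical helper: succeed only if a passwd-format entry carries
# the expected UID (field 3) and GID (field 4).
ids_match() {
    im_entry=$1
    im_want_uid=$2
    im_want_gid=$3
    im_uid=$(printf '%s\n' "$im_entry" | cut -d: -f3)
    im_gid=$(printf '%s\n' "$im_entry" | cut -d: -f4)
    [ "$im_uid" = "$im_want_uid" ] && [ "$im_gid" = "$im_want_gid" ]
}
```

Running ids_match "$(getent passwd slurm)" 106 106 on each node should succeed if the slurm account was created with the same IDs everywhere.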

I have installed munge on both the master node and the compute node, and it is running.


I then went into the slurm 2.6.5 source directory and ran:

./configure
make
sudo make install

I then copied the example slurm.conf to /usr/local/etc/slurm.conf on my master node.
slurmctld and slurmd are installed in /usr/local/sbin/.

Running /usr/local/sbin/slurmctld -Dvvv gives:

slurmctld: pidfile not locked, assuming no running daemon
slurmctld: Not running as root. Can't drop supplementary groups
slurmctld: fatal: Failed to set GID to 0
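
The messages above mention not running as root, so it may help to see which account slurmctld is configured to run as. Below is a minimal sketch, assuming a hypothetical helper name slurm_user_from_conf and a slurm.conf that sets the option in the usual key=value form, spelled exactly "SlurmUser". The fallback to root reflects the documented default when SlurmUser is not set.

```shell
# Hypothetical helper: print the SlurmUser configured in a slurm.conf file.
# Commented-out lines are ignored; if several lines match, the last one wins.
# Falls back to "root", the documented default when SlurmUser is unset.
slurm_user_from_conf() {
    su_conf=$1
    su_user=$(sed -n 's/^[[:space:]]*SlurmUser[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$su_conf" | tail -n 1)
    printf '%s\n' "${su_user:-root}"
}
```

Comparing the output of slurm_user_from_conf /usr/local/etc/slurm.conf with the output of id -un shows whether the daemon is being started from the account it expects.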

Running /usr/local/sbin/slurmd -Dvvv gives:

slurmd: Node configuration differs from hardware: CPUs=6:6(hw) Boards=1:1(hw) SocketsPerBoard=6:1(hw) CoresPerSocket=1:6(hw) ThreadsPerCore=1:1(hw)
slurmd: topology NONE plugin loaded
slurmd: Gathering cpu frequency information for 6 cpus
slurmd: debug:  cpu_freq_init: cpu 0, reset freq: 1600000, reset governor: userspace
slurmd: debug:  cpu_freq_init: cpu 1, reset freq: 1600000, reset governor: userspace
slurmd: debug:  cpu_freq_init: cpu 2, reset freq: 1600000, reset governor: userspace
slurmd: debug:  cpu_freq_init: cpu 3, reset freq: 1600000, reset governor: userspace
slurmd: debug:  cpu_freq_init: cpu 4, reset freq: 1600000, reset governor: userspace
slurmd: debug:  cpu_freq_init: cpu 5, reset freq: 1600000, reset governor: userspace
slurmd: task NONE plugin loaded
slurmd: auth plugin for Munge (http://code.google.com/p/munge/) loaded
slurmd: debug:  spank: opening plugin stack /usr/local/etc/plugstack.conf
slurmd: Munge cryptographic signature plugin loaded
slurmd: Warning: Core limit is only 0 KB
slurmd: slurmd version 2.6.5 started
slurmd: Job accounting gather NOT_INVOKED plugin loaded
slurmd: switch NONE plugin loaded
slurmd: slurmd started on Mon, 10 Feb 2014 11:09:41 +0530
slurmd: CPUs=6 Boards=1 Sockets=6 Cores=1 Threads=1 Memory=24098 TmpDisk=48418 Uptime=14053616
slurmd: AcctGatherEnergy NONE plugin loaded
slurmd: AcctGatherProfile NONE plugin loaded
slurmd: AcctGatherInfiniband NONE plugin loaded
slurmd: AcctGatherFilesystem NONE plugin loaded
slurmd: debug2: No acct_gather.conf file (/usr/local/etc/acct_gather.conf)
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug:  Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug:  Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug:  Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug:  Failed to contact primary controller: Connection refused
slurmd: got shutdown request
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug:  Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug:  Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug:  Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug:  Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug:  Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 172.20.1.102:6817: Connection refused
slurmd: debug:  Failed to contact primary controller: Connection refused
slurmd: error: Unable to register: Unable to contact slurm controller (connect failure)
slurmd: debug:  Unable to register with slurm controller, retrying
slurmd: all threads complete
slurmd: Munge cryptographic signature plugin unloaded
slurmd: Slurmd shutdown completing
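
Since the same failure repeats many times in the output above, here is a small sketch that reduces such output to the distinct address:port pairs slurmd tried to reach. The helper name controller_endpoints and the file name slurmd.log are hypothetical, and it relies on grep -o, which GNU and BSD grep both provide.

```shell
# Hypothetical helper: read slurmd output on stdin and print each distinct
# IPv4 address:port pair that appears in it, one per line.
controller_endpoints() {
    grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]+' | sort -u
}
```

With the output saved to a file, controller_endpoints < slurmd.log should print the single endpoint 172.20.1.102:6817 for the run above, which is the address and port where slurmd expects slurmctld to be listening.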


Running sinfo gives:
slurm_load_partitions: Unable to contact slurm controller (connect failure)

One more thing I am confused about: to run slurmd on a compute node, do we
need to install slurm on each compute node? I ask because the instructions
only mention copying the configuration files.

Torque (pbsnodes) is already installed on my master node. Can slurm and
Torque coexist on the same node?

I am sending my slurm.conf file as an attachment.

Please let me know what the problem is. I am new to slurm.




Thanks & Regards,
Nagendra

Attachment: slurm.conf
Description: Binary data
