Each host has 8 CPUs, and I have tried to test higher loads using stress. Here is the result of the command:

[root@GO1]/lustre# srun -n 10 test.sh
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28287] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28299] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28311] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28323] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28335] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28347] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28359] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28373] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28384] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
stress: info: [28385] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
stress: info: [28373] successful run completed in 11s
10:27:40 up 55 days, 14:13, 2 users, load average: 19.85, 7.97, 5.57
GO1
stress: info: [28323] successful run completed in 14s
stress: info: [28385] successful run completed in 14s
stress: info: [28287] successful run completed in 14s
stress: info: [28311] successful run completed in 14s
stress: info: [28299] successful run completed in 14s
stress: info: [28384] successful run completed in 14s
stress: info: [28347] successful run completed in 14s
stress: info: [28359] successful run completed in 14s
stress: info: [28335] successful run completed in 14s
10:27:45 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:45 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:45 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:45 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:45 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:46 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:46 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:47 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:47 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
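A quick way to read output like the above is to compare the 1-minute load average against the CPU count: 19.94 on an 8-CPU host means the tasks are piling onto one node rather than spreading out. A minimal sketch (Linux-specific, reads /proc/loadavg):

```shell
#!/bin/sh
# Sketch: compare the 1-minute load average against the CPU count.
# A load well above the CPU count (e.g. ~19.94 on 8 CPUs in the log above)
# suggests all tasks landed on the same host.
cpus=$(nproc)
load=$(cut -d ' ' -f 1 /proc/loadavg)
echo "cpus=$cpus load=$load"
```

Run under srun on each node, this would show at a glance which hosts are oversubscribed.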
"test.sh" looks like:

#!/bin/bash
uptime
stress -c 8 -i 1 -m 1 -t 10s
uptime
hostname

I'm very confused...

-----Original Message-----
From: "Lachlan Musicman" <data...@gmail.com>
To: "slurm-dev" <slurm-dev@schedmd.com>
Cc:
Sent: 2017-07-28 (Fri) 09:48:14
Subject: [slurm-dev] Re: Why my slurm is running on only one node?

I think it's because hostname is so undemanding. How many CPUs does each host have? You may need to use ((number of CPUs per host) + 1) to see action on another node. You can try using stress-ng to test higher loads:
https://www.cyberciti.biz/faq/stress-test-linux-unix-server-with-stress-ng/

cheers
L.

------
"The antidote to apocalypticism is apocalyptic civics. Apocalyptic civics is the insistence that we cannot ignore the truth, nor should we panic about it. It is a shared consciousness that our institutions have failed and our ecosystem is collapsing, yet we are still here — and we are creative agents who can shape our destinies. Apocalyptic civics is the conviction that the only way out is through, and the only way through is together."
Greg Bloom @greggish https://twitter.com/greggish/status/873177525903609857

On 28 July 2017 at 10:28, 허웅 <hoewoongg...@naver.com> wrote:

I have 5 nodes, including the control node, and they look like this:

Control Node : GO1
Compute Nodes : GO[1-5]

When I try to allocate a job to multiple nodes, only one node works.

Example:
$ srun -N5 hostname
GO1
GO1
GO1
GO1
GO1

whereas I expected:
$ srun -N5 hostname
GO1
GO2
GO3
GO4
GO5

What should I do? Here are my configuration files.
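For reference, a stress-ng version of test.sh (as the reply suggests) would look roughly like the sketch below. The flag names are stress-ng's (--cpu/--io/--vm/--timeout mirror stress's -c/-i/-m/-t); the script skips the load phase gracefully if stress-ng is not installed:

```shell
#!/bin/bash
# Hypothetical stress-ng variant of test.sh.
uptime
if command -v stress-ng >/dev/null 2>&1; then
    # 8 CPU workers, 1 I/O worker, 1 VM worker, for 10 seconds
    stress-ng --cpu 8 --io 1 --vm 1 --timeout 10s
else
    echo "stress-ng not installed; skipping load phase"
fi
uptime
hostname
```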
$ scontrol show frontend
FrontendName=GO1 State=IDLE Version=17.02 Reason=(null)
   BootTime=2017-06-02T20:14:39 SlurmdStartTime=2017-07-27T16:29:46
FrontendName=GO2 State=IDLE Version=17.02 Reason=(null)
   BootTime=2017-07-05T17:54:13 SlurmdStartTime=2017-07-27T16:30:07
FrontendName=GO3 State=IDLE Version=17.02 Reason=(null)
   BootTime=2017-07-05T17:22:58 SlurmdStartTime=2017-07-27T16:30:08
FrontendName=GO4 State=IDLE Version=17.02 Reason=(null)
   BootTime=2017-07-05T17:21:40 SlurmdStartTime=2017-07-27T16:30:08
FrontendName=GO5 State=IDLE Version=17.02 Reason=(null)
   BootTime=2017-07-05T17:21:39 SlurmdStartTime=2017-07-27T16:30:09

$ scontrol ping
Slurmctld(primary/backup) at GO1/(NULL) are UP/DOWN

[slurm.conf]
# slurm.conf
#
# See the slurm.conf man page for more information.
#
ClusterName=linux
ControlMachine=GO1
ControlAddr=192.168.30.74
#
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/lib/slurmd
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmd/slurmctld.pid
SlurmdPidFile=/var/run/slurmd/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=0
TreeWidth=50
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
SchedulerType=sched/backfill
FastSchedule=1
#
# LOGGING
SlurmctldDebug=7
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=7
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none
#
# COMPUTE NODES
NodeName=sgo[1-5] NodeHostName=GO[1-5]
#NodeAddr=192.168.30.[74,141,68,70,72]
#
# PARTITIONS
PartitionName=party Default=yes Nodes=ALL
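One thing worth noting: with FastSchedule=1, Slurm schedules against the node values declared in slurm.conf, and the NodeName line above declares no CPU count (so each node defaults to a single CPU). A fuller node definition would usually spell this out; the fragment below is illustrative only, with CPUs=8 assumed from the hosts described earlier in the thread:

```
# COMPUTE NODES (illustrative; CPUs=8 is an assumption based on the hosts above)
NodeName=sgo[1-5] NodeHostName=GO[1-5] CPUs=8 State=UNKNOWN
#NodeAddr=192.168.30.[74,141,68,70,72]
```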