Each host has 8 CPUs, and I have tried to test higher loads using stress. Here is the result of the command:

[root@GO1]/lustre# srun -n 10 test.sh
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28287] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28299] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28311] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28323] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28335] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28347] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28359] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28373] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
10:27:27 up 55 days, 14:12, 2 users, load average: 6.65, 4.90, 4.55
stress: info: [28384] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
stress: info: [28385] dispatching hogs: 8 cpu, 1 io, 1 vm, 0 hdd
stress: info: [28373] successful run completed in 11s
10:27:40 up 55 days, 14:13, 2 users, load average: 19.85, 7.97, 5.57
GO1
stress: info: [28323] successful run completed in 14s
stress: info: [28385] successful run completed in 14s
stress: info: [28287] successful run completed in 14s
stress: info: [28311] successful run completed in 14s
stress: info: [28299] successful run completed in 14s
stress: info: [28384] successful run completed in 14s
stress: info: [28347] successful run completed in 14s
stress: info: [28359] successful run completed in 14s
stress: info: [28335] successful run completed in 14s
10:27:45 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:45 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:45 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:45 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:45 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:46 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:46 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:47 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
10:27:47 up 55 days, 14:13, 2 users, load average: 19.94, 8.19, 5.65
GO1
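A quick way to read output like the above is to compare the 1-minute load average against the CPU count: 19.94 on an 8-CPU host means the tasks are piling onto one node rather than spreading out. A minimal sketch (Linux-specific, reads /proc/loadavg):

```shell
#!/bin/sh
# Sketch: compare the 1-minute load average against the CPU count.
# A load well above the CPU count (e.g. ~19.94 on 8 CPUs in the log above)
# suggests all tasks landed on the same host.
cpus=$(nproc)
load=$(cut -d ' ' -f 1 /proc/loadavg)
echo "cpus=$cpus load=$load"
```

Run under srun on each node, this would show at a glance which hosts are oversubscribed.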
"test.sh" looks like:

#!/bin/bash
uptime
stress -c 8 -i 1 -m 1 -t 10s
uptime
hostname

I'm very confused...

-----Original Message-----
From: "Lachlan Musicman" <data...@gmail.com>
To: "slurm-dev" <slurm-dev@schedmd.com>
Cc:
Sent: 2017-07-28 (Fri) 09:48:14
Subject: [slurm-dev] Re: Why my slurm is running on only one node?

I think it's because hostname is so undemanding. How many CPUs does each host have? You may need to use ((number of CPUs per host) + 1) to see action on another node. You can try using stress-ng to test higher loads:
https://www.cyberciti.biz/faq/stress-test-linux-unix-server-with-stress-ng/

cheers
L.

------
"The antidote to apocalypticism is apocalyptic civics. Apocalyptic civics is the insistence that we cannot ignore the truth, nor should we panic about it. It is a shared consciousness that our institutions have failed and our ecosystem is collapsing, yet we are still here — and we are creative agents who can shape our destinies. Apocalyptic civics is the conviction that the only way out is through, and the only way through is together."
Greg Bloom @greggish https://twitter.com/greggish/status/873177525903609857

On 28 July 2017 at 10:28, 허웅 <hoewoongg...@naver.com> wrote:

I have 5 nodes, including the control node, and they look like this:

Control Node : GO1
Compute Nodes : GO[1-5]

When I try to allocate a job to multiple nodes, only one node works.

Example:
$ srun -N5 hostname
GO1
GO1
GO1
GO1
GO1

whereas I expected:
$ srun -N5 hostname
GO1
GO2
GO3
GO4
GO5

What should I do? Here are my configuration files.
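For reference, a stress-ng version of test.sh (as the reply suggests) would look roughly like the sketch below. The flag names are stress-ng's (--cpu/--io/--vm/--timeout mirror stress's -c/-i/-m/-t); the script skips the load phase gracefully if stress-ng is not installed:

```shell
#!/bin/bash
# Hypothetical stress-ng variant of test.sh.
uptime
if command -v stress-ng >/dev/null 2>&1; then
    # 8 CPU workers, 1 I/O worker, 1 VM worker, for 10 seconds
    stress-ng --cpu 8 --io 1 --vm 1 --timeout 10s
else
    echo "stress-ng not installed; skipping load phase"
fi
uptime
hostname
```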
$ scontrol show frontend
FrontendName=GO1 State=IDLE Version=17.02 Reason=(null)
   BootTime=2017-06-02T20:14:39 SlurmdStartTime=2017-07-27T16:29:46
FrontendName=GO2 State=IDLE Version=17.02 Reason=(null)
   BootTime=2017-07-05T17:54:13 SlurmdStartTime=2017-07-27T16:30:07
FrontendName=GO3 State=IDLE Version=17.02 Reason=(null)
   BootTime=2017-07-05T17:22:58 SlurmdStartTime=2017-07-27T16:30:08
FrontendName=GO4 State=IDLE Version=17.02 Reason=(null)
   BootTime=2017-07-05T17:21:40 SlurmdStartTime=2017-07-27T16:30:08
FrontendName=GO5 State=IDLE Version=17.02 Reason=(null)
   BootTime=2017-07-05T17:21:39 SlurmdStartTime=2017-07-27T16:30:09

$ scontrol ping
Slurmctld(primary/backup) at GO1/(NULL) are UP/DOWN

[slurm.conf]
# slurm.conf
#
# See the slurm.conf man page for more information.
#
ClusterName=linux
ControlMachine=GO1
ControlAddr=192.168.30.74
#
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/lib/slurmd
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmd/slurmctld.pid
SlurmdPidFile=/var/run/slurmd/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=0
TreeWidth=50
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
SchedulerType=sched/backfill
FastSchedule=1
#
# LOGGING
SlurmctldDebug=7
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=7
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none
#
# COMPUTE NODES
NodeName=sgo[1-5] NodeHostName=GO[1-5]
#NodeAddr=192.168.30.[74,141,68,70,72]
#
# PARTITIONS
PartitionName=party Default=yes Nodes=ALL
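One thing worth noting: with FastSchedule=1, Slurm schedules against the node values declared in slurm.conf, and the NodeName line above declares no CPU count (so each node defaults to a single CPU). A fuller node definition would usually spell this out; the fragment below is illustrative only, with CPUs=8 assumed from the hosts described earlier in the thread:

```
# COMPUTE NODES (illustrative; CPUs=8 is an assumption based on the hosts above)
NodeName=sgo[1-5] NodeHostName=GO[1-5] CPUs=8 State=UNKNOWN
#NodeAddr=192.168.30.[74,141,68,70,72]
```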