If you still have a problem, run a job on a specific node, e.g. [ srun 
--nodelist=go2 hostname ], and if the command is not successful, check the 
corresponding slurmd log file for any errors.
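For example (a minimal sketch; the slurmd log path below is taken from the 
slurm.conf quoted later in this thread, so adjust it if yours differs):

srun --nodelist=go2 hostname   # should print that node's hostname (GO2 here)
# if it fails, inspect the slurmd log on that node for errors:
tail -n 50 /var/log/slurmd.log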


cheers,


Said.

________________________________
From: Lachlan Musicman <[email protected]>
Sent: Friday, July 28, 2017 10:02:57 AM
To: slurm-dev
Subject: [slurm-dev] Re: Why my slurm is running on only one node?

Ok! Good, so the servers are there.

You should expect to see output from

srun -w go2 hostname

alternatively you should get a different hostname if you run

srun --time=0-06:00 --mem=8gb "$@" --pty -u bash -i

for instance.

Try running a stress test that requests more than one node and more CPUs than a 
single node has; that should show multiple nodes. Hopefully.
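Something like this, for instance (a rough sketch; the task count assumes, 
hypothetically, 8 CPUs per node, and stress-ng must be installed on the 
compute nodes):

srun -N2 -n16 stress-ng --cpu 1 --timeout 60s   # 16 one-CPU workers spread over 2 nodes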

cheers
L.




------
"The antidote to apocalypticism is apocalyptic civics. Apocalyptic civics is 
the insistence that we cannot ignore the truth, nor should we panic about it. 
It is a shared consciousness that our institutions have failed and our 
ecosystem is collapsing, yet we are still here — and we are creative agents who 
can shape our destinies. Apocalyptic civics is the conviction that the only way 
out is through, and the only way through is together. "

Greg Bloom @greggish https://twitter.com/greggish/status/873177525903609857

On 28 July 2017 at 10:57, 허웅 <[email protected]> wrote:

Here is my output of sinfo



[root@GO1]~# sinfo -N

NODELIST   NODES PARTITION STATE

sgo1           1    party* idle

sgo2           1    party* idle

sgo3           1    party* idle

sgo4           1    party* idle

sgo5           1    party* idle

[root@GO1]~# sn
Fri Jul 28 09:55:53 2017
           HOSTNAMES
                 GO1
                 GO2
                 GO3
                 GO4
                 GO5




-----Original Message-----
From: "Lachlan Musicman"<[email protected]<mailto:[email protected]>>
To: "slurm-dev"<[email protected]<mailto:[email protected]>>;
Cc:
Sent: 2017-07-28 (금) 09:51:40
Subject: [slurm-dev] Re: Why my slurm is running on only one node?


Also - are the nodes up and running wrt SLURM? What is the output of:

sinfo -N

?

(fwiw, I really like the alias sn='sinfo -Nle -o "%.20n %.15C %.8O %.7t" | 
uniq' )

cheers
L.


On 28 July 2017 at 10:47, Lachlan Musicman <[email protected]> wrote:
I think it's because hostname is so undemanding.

How many CPUs does each host have?

You may need to request ((number of CPUs per host) + 1) tasks to see action on 
another node.

You can try using stress-ng to generate higher loads:

https://www.cyberciti.biz/faq/stress-test-linux-unix-server-with-stress-ng/
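As a concrete sketch (again hypothetically assuming 8 CPUs per host; check the 
real count with "scontrol show node"):

srun -n9 hostname   # 9 tasks cannot fit on one 8-CPU node, so a second node should appear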

cheers
L.



On 28 July 2017 at 10:28, 허웅 <[email protected]> wrote:
I have 5 nodes, including the control node.

My nodes look like this:

Control Node : GO1
Compute Nodes : GO[1-5]

When I try to allocate a job to multiple nodes, only one node works.

Example:

$ srun -N5 hostname
GO1
GO1
GO1
GO1
GO1

even though I expected this:

$ srun -N5 hostname
GO1
GO2
GO3
GO4
GO5

What should I do?

Here are my configuration details.

$ scontrol show frontend
FrontendName=GO1 State=IDLE Version=17.02 Reason=(null)
BootTime=2017-06-02T20:14:39 SlurmdStartTime=2017-07-27T16:29:46

FrontendName=GO2 State=IDLE Version=17.02 Reason=(null)
BootTime=2017-07-05T17:54:13 SlurmdStartTime=2017-07-27T16:30:07

FrontendName=GO3 State=IDLE Version=17.02 Reason=(null)
BootTime=2017-07-05T17:22:58 SlurmdStartTime=2017-07-27T16:30:08

FrontendName=GO4 State=IDLE Version=17.02 Reason=(null)
BootTime=2017-07-05T17:21:40 SlurmdStartTime=2017-07-27T16:30:08

FrontendName=GO5 State=IDLE Version=17.02 Reason=(null)
BootTime=2017-07-05T17:21:39 SlurmdStartTime=2017-07-27T16:30:09

$ scontrol ping
Slurmctld(primary/backup) at GO1/(NULL) are UP/DOWN

[slurm.conf]
# slurm.conf
#
# See the slurm.conf man page for more information.
#
ClusterName=linux
ControlMachine=GO1
ControlAddr=192.168.30.74
#
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/lib/slurmd
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmd/slurmctld.pid
SlurmdPidFile=/var/run/slurmd/slurmd.pid
ProctrackType=proctrack/pgid
ReturnToService=0
TreeWidth=50
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#
# SCHEDULING
SchedulerType=sched/backfill
FastSchedule=1
#
# LOGGING
SlurmctldDebug=7
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=7
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none
#
# COMPUTE NODES
NodeName=sgo[1-5] NodeHostName=GO[1-5] #NodeAddr=192.168.30.[74,141,68,70,72]
#
# PARTITIONS
PartitionName=party Default=yes Nodes=ALL



