Hi!
I have configured SLURM 14.11.7 on our cluster to share node resources:
from slurm.conf:
NodeName=taurus Sockets=1 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN Weight=100
NodeName=node[2-6] Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN Weight=20
NodeName=node[7-8] Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN Weight=50
PartitionName=main Nodes=taurus,node[2-8] Default=YES MaxTime=7-0 State=UP Shared=FORCE
PartitionName=gpu Nodes=node[7-8] Default=NO MaxTime=7-0 State=UP Shared=NO
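In case it's relevant: the excerpt above doesn't include any SelectType settings, and I haven't verified what mine are. My guess is that per-CPU accounting would need the consumable-resource plugin, i.e. something like:

# assumed, not checked against my running slurm.conf:
SelectType=select/cons_res
SelectTypeParameters=CR_CPU

(that's an assumption on my part, not a line from my actual config).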
sinfo output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
main* up 7-00:00:00 1 down* node6
main* up 7-00:00:00 4 alloc node[2-5]
main* up 7-00:00:00 3 idle node[7-8],taurus
gpu up 7-00:00:00 2 idle node[7-8]
squeue output:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
39 main z20_550_ fadeyev R 19:13:22 1 node2
40 main z20_550_ fadeyev R 19:12:51 1 node3
41 main z20_550_ fadeyev R 19:12:15 1 node4
42 main z20_550_ fadeyev R 19:11:40 1 node5
44 main z20_560_ fadeyev R 17:58:17 1 node2
45 main z20_560_ fadeyev R 12:19:48 1 node3
46 main z20_560_ fadeyev R 12:01:22 1 node4
47 main z20_570_ fadeyev R 11:42:32 1 node5
48 main z20_570_ fadeyev R 6:54:08 1 node2
49 main z20_570_ fadeyev R 6:41:58 1 node3
50 main z20_570_ fadeyev R 6:27:04 1 node4
61 main Run.sh akimkin R 19:31 1 node5
When I try to run a multithreaded task that must use all 32 available hardware threads on a node, the task is started on node2, where several other tasks are already running:
-bash-4.1$ srun -c 32 -N 1-1 -t 1 ./t1
node2.taurus
Why did SLURM choose node2, which has fewer than 32 free thread slots?
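To help diagnose this, I can check how many CPUs SLURM has accounted as allocated on node2 (assuming scontrol in 14.11 reports the CPUAlloc/CPUTot fields, which is my recollection, not something I've verified):

-bash-4.1$ scontrol show node node2 | grep -i cpu   # compare CPUAlloc vs. CPUTot

As a stop-gap I could presumably force a whole node with srun --exclusive, but I would like to understand the scheduling decision itself.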
Pavel.