Hi Jake,

First, which Slurm version and which OS are you running?

Next, try simplifying things by removing the OverSubscribe configuration. Read the slurm.conf manual page's section on OverSubscribe; it is a bit tricky.
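As a rough sketch only (based on the partition lines you posted, not a tested config), dropping OverSubscribe would leave the partitions looking like:

PartitionName=DEFAULT State=UP
PartitionName=interactive Nodes=compute002 MaxTime=INFINITE
PartitionName=simulation Nodes=compute001 MaxTime=30

Since you already use SelectType=select/cons_tres with SelectTypeParameters=CR_Core_Memory, several jobs can share a node anyway as long as enough unallocated cores and memory remain, so OverSubscribe is not needed just to get more than one job per node.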

RealMemory=1000 is extremely low and might prevent jobs from starting! Run "slurmd -C" on the nodes to get the appropriate node parameters for slurm.conf.
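The output of "slurmd -C" includes a NodeName= line that can be copied almost verbatim into slurm.conf. For illustration only (the numbers below are made up; use whatever your nodes actually report):

$ slurmd -C
NodeName=compute001 CPUs=32 Boards=1 SocketsPerBoard=1 CoresPerSocket=32 ThreadsPerCore=1 RealMemory=64000

You would then give each node its real memory size instead of the 1000 MB default, for example:

NodeName=compute001 CPUs=32 RealMemory=64000
NodeName=compute002 CPUs=2 RealMemory=4000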

I hope this helps.

/Ole


On 26-05-2022 21:12, Jake Jellinek wrote:
Hi

I am just building my first Slurm setup and have got everything running – well, almost.

I have a two-node configuration. My entire setup runs on a single Hyper-V server, and I have divided up its resources to create the VMs.

One node I will use for heavy-duty work; this is called compute001.

One node I will use for normal work; this is called compute002.

My compute node specification in slurm.conf is

NodeName=DEFAULT CPUs=1 RealMemory=1000 State=UNKNOWN

NodeName=compute001 CPUs=32

NodeName=compute002 CPUs=2

The partition specification is

PartitionName=DEFAULT State=UP

PartitionName=interactive Nodes=compute002 MaxTime=INFINITE OverSubscribe=FORCE

PartitionName=simulation Nodes=compute001 MaxTime=30 OverSubscribe=FORCE

I have added the OverSubscribe=FORCE option as I want more than one job to be able to land on my interactive/simulation queues.

All of the nodes and the cluster master start up fine and talk to each other, but no matter what I do, I cannot get my cluster to accept more than one job per node.

Can you help me determine where I am going wrong?

Thanks a lot

Jake

The entire slurm.conf is pasted below

# slurm.conf file generated by configurator.html.

ClusterName=pm-slurm

SlurmctldHost=slurm-master

MpiDefault=none

ProctrackType=proctrack/cgroup

ReturnToService=2

SlurmctldPidFile=/var/run/slurmctld.pid

SlurmctldPort=6817

SlurmdPidFile=/var/run/slurmd.pid

SlurmdPort=6818

SlurmdSpoolDir=/var/spool/slurmd

SlurmUser=slurm

StateSaveLocation=/home/slurm/var/spool/slurmctld

SwitchType=switch/none

TaskPlugin=task/cgroup

#

# TIMERS

InactiveLimit=0

KillWait=30

MinJobAge=300

SlurmctldTimeout=120

SlurmdTimeout=300

Waittime=0

#

# SCHEDULING

SchedulerType=sched/backfill

SelectType=select/cons_tres

SelectTypeParameters=CR_Core_Memory

#

# LOGGING AND ACCOUNTING

JobAcctGatherFrequency=30

JobAcctGatherType=jobacct_gather/cgroup

SlurmctldDebug=info

SlurmctldLogFile=/var/log/slurmctld.log

SlurmdDebug=info

SlurmdLogFile=/var/log/slurmd.log

# COMPUTE NODES

NodeName=DEFAULT CPUs=1 RealMemory=1000 State=UNKNOWN

NodeName=compute001 CPUs=32

NodeName=compute002 CPUs=2

PartitionName=DEFAULT State=UP

PartitionName=interactive Nodes=compute002 MaxTime=INFINITE OverSubscribe=FORCE

PartitionName=simulation Nodes=compute001 MaxTime=30 OverSubscribe=FORCE


