Hi Jenny,

OK, I see. You are using exactly the same Slurm version and a very similar OS version/distribution as we do.

You have to consider that cpuset support is not available in cgroup/v2 on kernel versions below 5.2 (see "Cgroups v2 controllers" in "man cgroups" on your system). So some of the warnings/errors you see - at least "Controller cpuset is not enabled" - are expected (and slurmd should start nevertheless). This, by the way, is one of the reasons why we stick with cgroup/v1 for the time being.
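If you want to double-check this on a node, here is a small shell sketch (the helper name is mine, not part of Slurm) that compares a kernel release string against 5.2 and then lists which controllers the cgroup v2 root on that host actually exposes:

```shell
# Hypothetical helper: decide whether a kernel release string is older than
# 5.2, the version cited above for cpuset support in cgroup v2.
cpuset_in_cgroup2() {
    major=${1%%.*}        # e.g. "4" from "4.18.0-348.2.1.el8_5.x86_64"
    rest=${1#*.}
    minor=${rest%%.*}     # e.g. "18"
    if [ "$major" -lt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -lt 2 ]; }; then
        echo "cpuset missing"
    else
        echo "cpuset available"
    fi
}

cpuset_in_cgroup2 "$(uname -r)"
# Show which controllers the cgroup v2 root really offers on this host.
cat /sys/fs/cgroup/cgroup.controllers 2>/dev/null || true
```

On your 4.18 kernel the helper should report "cpuset missing", which matches the "Controller cpuset is not enabled" errors you are seeing.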

We did some tests with cgroup/v2 and in our case slurmd started with no problems (except for the error/warning regarding the cpuset controller). But we have a slightly different configuration. You use:
JobAcctGatherType       = jobacct_gather/cgroup
ProctrackType           = proctrack/cgroup
TaskPlugin              = cgroup,affinity
CgroupPlugin            = cgroup/v2

For the respective settings we use:
JobAcctGatherType       = jobacct_gather/linux
ProctrackType           = proctrack/cgroup
TaskPlugin              = task/affinity,task/cgroup
CgroupPlugin            = (null) - i.e. we don't set that one in cgroup.conf

Maybe using the same settings as we do helps in your case?
Please be aware that you should change JobAcctGatherType only when there are no running job steps!
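For clarity, this is roughly what your cgroup.conf would look like with our suggestion applied - just your posted settings minus the CgroupPlugin line, so that slurmd autodetects the cgroup version itself. Please double-check against your own file before deploying:

```ini
# /etc/slurm/cgroup.conf - sketch only; CgroupPlugin deliberately not set
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
AllowedSwapSpace=1
ConstrainSwapSpace=yes
ConstrainDevices=yes
```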

Regards,
Hermann


On 7/12/23 16:50, Williams, Jenny Avis wrote:
The systems have only cgroup/v2 enabled
        # mount |egrep cgroup
        cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
Distribution and kernel
        RedHat 8.7
        4.18.0-348.2.1.el8_5.x86_64



-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Hermann 
Schwärzler
Sent: Wednesday, July 12, 2023 4:36 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] cgroupv2 + slurmd - external cgroup changes needed 
to get daemon to start

Hi Jenny,

I *guess* you have a system that has both cgroup/v1 and cgroup/v2 enabled.

Which Linux distribution are you using? And which kernel version?
What is the output of
    mount | grep cgroup
What if you do not restrict the cgroup-version Slurm can use to
cgroup/v2 but omit "CgroupPlugin=..." from your cgroup.conf?

Regards,
Hermann

On 7/11/23 19:41, Williams, Jenny Avis wrote:
Additional configuration information -- /etc/slurm/cgroup.conf

CgroupAutomount=yes

ConstrainCores=yes

ConstrainRAMSpace=yes

CgroupPlugin=cgroup/v2

AllowedSwapSpace=1

ConstrainSwapSpace=yes

ConstrainDevices=yes

*From:* Williams, Jenny Avis
*Sent:* Tuesday, July 11, 2023 10:47 AM
*To:* slurm-us...@schedmd.com
*Subject:* cgroupv2 + slurmd - external cgroup changes needed to get
daemon to start

Progress on getting slurmd to start under cgroupv2

Issue: slurmd 22.05.6 will not start when using cgroupv2

Expected result: even after reboot slurmd will start up without
needing to manually add lines to /sys/fs/cgroup files.

When started as service the error is:

# systemctl status slurmd

* slurmd.service - Slurm node daemon

     Loaded: loaded (/usr/lib/systemd/system/slurmd.service; enabled;
vendor preset: disabled)

    Drop-In: /etc/systemd/system/slurmd.service.d

             `-extendUnit.conf

     Active: failed (Result: exit-code) since Tue 2023-07-11 10:29:23
EDT; 2s ago

    Process: 11395 ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS
(code=exited, status=1/FAILURE)

Main PID: 11395 (code=exited, status=1/FAILURE)

Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: Started Slurm node
daemon.

Jul 11 10:29:23 g1803jles01.ll.unc.edu slurmd[11395]: slurmd: slurmd
version 22.05.6 started

Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service:
Main process exited, code=exited, status=1/FAILURE

Jul 11 10:29:23 g1803jles01.ll.unc.edu systemd[1]: slurmd.service:
Failed with result 'exit-code'.

When started at the command line the output is:

# slurmd -D -vvv 2>&1 |egrep error

slurmd: error: Controller cpuset is not enabled!

slurmd: error: Controller cpu is not enabled!

slurmd: error: Controller cpuset is not enabled!

slurmd: error: Controller cpu is not enabled!

slurmd: error: Controller cpuset is not enabled!

slurmd: error: Controller cpu is not enabled!

slurmd: error: Controller cpuset is not enabled!

slurmd: error: Controller cpu is not enabled!

slurmd: error: cpu cgroup controller is not available.

slurmd: error: There's an issue initializing memory or cpu controller

slurmd: error: Couldn't load specified plugin name for
jobacct_gather/cgroup: Plugin init() callback failed

slurmd: error: cannot create jobacct_gather context for
jobacct_gather/cgroup

Steps to mitigate the issue:

While the following steps do not solve the issue, they do get the
system into a state such that slurmd will start, at least until the
next reboot.  Reinstalling slurm-slurmd is a one-time step to ensure
that local service modifications are out of the picture. Currently,
even after a reboot the cgroup echo steps are necessary at a minimum.

#!/bin/bash
/usr/bin/dnf -y reinstall slurm-slurmd
systemctl daemon-reload
/usr/bin/pkill -f '/usr/sbin/slurmstepd infinity'
systemctl enable slurmd
systemctl stop dcismeng.service && \
/usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/cgroup.subtree_control && \
/usr/bin/echo +cpu +cpuset +memory >> /sys/fs/cgroup/system.slice/cgroup.subtree_control && \
systemctl start slurmd && \
  echo 'run this: systemctl start dcismeng'
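A possible persistent alternative to the echo lines above (untested here, and Delegate= semantics vary by systemd version) would be a unit drop-in asking systemd itself to enable these controllers for slurmd at every boot; the file name below is hypothetical:

```ini
# Hypothetical drop-in: /etc/systemd/system/slurmd.service.d/delegate.conf
# Delegating these controllers should make systemd enable them in the
# parent cgroup.subtree_control chain when slurmd starts.
[Service]
Delegate=cpu cpuset memory
```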

Environment:

# scontrol show config

Configuration data as of 2023-07-11T10:39:48

AccountingStorageBackupHost = (null)

AccountingStorageEnforce = associations,limits,qos,safe

AccountingStorageHost   = m1006

AccountingStorageExternalHost = (null)

AccountingStorageParameters = (null)

AccountingStoragePort   = 6819

AccountingStorageTRES   = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu

AccountingStorageType   = accounting_storage/slurmdbd

AccountingStorageUser   = N/A

AccountingStoreFlags    = (null)

AcctGatherEnergyType    = acct_gather_energy/none

AcctGatherFilesystemType = acct_gather_filesystem/none

AcctGatherInterconnectType = acct_gather_interconnect/none

AcctGatherNodeFreq      = 0 sec

AcctGatherProfileType   = acct_gather_profile/none

AllowSpecResourcesUsage = No

AuthAltTypes            = (null)

AuthAltParameters       = (null)

AuthInfo                = (null)

AuthType                = auth/munge

BatchStartTimeout       = 10 sec

BcastExclude            = /lib,/usr/lib,/lib64,/usr/lib64

BcastParameters         = (null)

BOOT_TIME               = 2023-07-11T10:04:31

BurstBufferType         = (null)

CliFilterPlugins        = (null)

ClusterName             = ASlurmCluster

CommunicationParameters = (null)

CompleteWait            = 0 sec

CoreSpecPlugin          = core_spec/none

CpuFreqDef              = Unknown

CpuFreqGovernors        = OnDemand,Performance,UserSpace

CredType                = cred/munge

DebugFlags              = (null)

DefMemPerNode           = UNLIMITED

DependencyParameters    = kill_invalid_depend

DisableRootJobs         = No

EioTimeout              = 60

EnforcePartLimits       = ANY

Epilog                  = (null)

EpilogMsgTime           = 2000 usec

EpilogSlurmctld         = (null)

ExtSensorsType          = ext_sensors/none

ExtSensorsFreq          = 0 sec

FairShareDampeningFactor = 1

FederationParameters    = (null)

FirstJobId              = 1

GetEnvTimeout           = 2 sec

GresTypes               = gpu

GpuFreqDef              = high,memory=high

GroupUpdateForce        = 1

GroupUpdateTime         = 600 sec

HASH_VAL                = Match

HealthCheckInterval     = 0 sec

HealthCheckNodeState    = ANY

HealthCheckProgram      = (null)

InactiveLimit           = 65533 sec

InteractiveStepOptions  = --interactive --preserve-env --pty $SHELL

JobAcctGatherFrequency  = task=15

JobAcctGatherType       = jobacct_gather/cgroup

JobAcctGatherParams     = (null)

JobCompHost             = localhost

JobCompLoc              = /var/log/slurm_jobcomp.log

JobCompPort             = 0

JobCompType             = jobcomp/none

JobCompUser             = root

JobContainerType        = job_container/none

JobCredentialPrivateKey = (null)

JobCredentialPublicCertificate = (null)

JobDefaults             = (null)

JobFileAppend           = 0

JobRequeue              = 1

JobSubmitPlugins        = lua

KillOnBadExit           = 0

KillWait                = 30 sec

LaunchParameters        = (null)

LaunchType              = launch/slurm

Licenses                = mplus:1,nonmem:32

LogTimeFormat           = iso8601_ms

MailDomain              = (null)

MailProg                = /bin/mail

MaxArraySize            = 90001

MaxDBDMsgs              = 701360

MaxJobCount             = 350000

MaxJobId                = 67043328

MaxMemPerNode           = UNLIMITED

MaxNodeCount            = 340

MaxStepCount            = 40000

MaxTasksPerNode         = 512

MCSPlugin               = mcs/none

MCSParameters           = (null)

MessageTimeout          = 60 sec

MinJobAge               = 300 sec

MpiDefault              = none

MpiParams               = (null)

NEXT_JOB_ID             = 12286313

NodeFeaturesPlugins     = (null)

OverTimeLimit           = 0 min

PluginDir               = /usr/lib64/slurm

PlugStackConfig         = (null)

PowerParameters         = (null)

PowerPlugin             =

PreemptMode             = OFF

PreemptType             = preempt/none

PreemptExemptTime       = 00:00:00

PrEpParameters          = (null)

PrEpPlugins             = prep/script

PriorityParameters      = (null)

PrioritySiteFactorParameters = (null)

PrioritySiteFactorPlugin = (null)

PriorityDecayHalfLife   = 14-00:00:00

PriorityCalcPeriod      = 00:05:00

PriorityFavorSmall      = No

PriorityFlags           = SMALL_RELATIVE_TO_TIME,CALCULATE_RUNNING,MAX_TRES

PriorityMaxAge          = 60-00:00:00

PriorityUsageResetPeriod = NONE

PriorityType            = priority/multifactor

PriorityWeightAge       = 10000

PriorityWeightAssoc     = 0

PriorityWeightFairShare = 10000

PriorityWeightJobSize   = 1000

PriorityWeightPartition = 1000

PriorityWeightQOS       = 1000

PriorityWeightTRES      = CPU=1000,Mem=4000,GRES/gpu=3000

PrivateData             = none

ProctrackType           = proctrack/cgroup

Prolog                  = (null)

PrologEpilogTimeout     = 65534

PrologSlurmctld         = (null)

PrologFlags             = Alloc,Contain,X11

PropagatePrioProcess    = 0

PropagateResourceLimits = ALL

PropagateResourceLimitsExcept = (null)

RebootProgram           = /usr/sbin/reboot

ReconfigFlags           = (null)

RequeueExit             = (null)

RequeueExitHold         = (null)

ResumeFailProgram       = (null)

ResumeProgram           = (null)

ResumeRate              = 300 nodes/min

ResumeTimeout           = 60 sec

ResvEpilog              = (null)

ResvOverRun             = 0 min

ResvProlog              = (null)

ReturnToService         = 2

RoutePlugin             = route/default

SchedulerParameters     = batch_sched_delay=10,bf_continue,bf_max_job_part=1000,bf_max_job_test=10000,bf_max_job_user=100,bf_resolution=300,bf_window=10080,bf_yield_interval=1000000,default_queue_depth=1000,partition_job_depth=600,sched_min_interval=20000000,defer,max_rpc_cnt=80

SchedulerTimeSlice      = 30 sec

SchedulerType           = sched/backfill

ScronParameters         = (null)

SelectType              = select/cons_tres

SelectTypeParameters    = CR_CPU_MEMORY

SlurmUser               = slurm(47)

SlurmctldAddr           = (null)

SlurmctldDebug          = info

SlurmctldHost[0]        = ASlurmCluster-sched(x.x.x.x)

SlurmctldLogFile        = /data/slurm/slurmctld.log

SlurmctldPort           = 6820-6824

SlurmctldSyslogDebug    = (null)

SlurmctldPrimaryOffProg = (null)

SlurmctldPrimaryOnProg  = (null)

SlurmctldTimeout        = 6000 sec

SlurmctldParameters     = (null)

SlurmdDebug             = info

SlurmdLogFile           = /var/log/slurm/slurmd.log

SlurmdParameters        = (null)

SlurmdPidFile           = /var/run/slurmd.pid

SlurmdPort              = 6818

SlurmdSpoolDir          = /var/spool/slurmd

SlurmdSyslogDebug       = (null)

SlurmdTimeout           = 600 sec

SlurmdUser              = root(0)

SlurmSchedLogFile       = (null)

SlurmSchedLogLevel      = 0

SlurmctldPidFile        = /var/run/slurmctld.pid

SlurmctldPlugstack      = (null)

SLURM_CONF              = /etc/slurm/slurm.conf

SLURM_VERSION           = 22.05.6

SrunEpilog              = (null)

SrunPortRange           = 0-0

SrunProlog              = (null)

StateSaveLocation       = /data/slurm/slurmctld

SuspendExcNodes         = (null)

SuspendExcParts         = (null)

SuspendProgram          = (null)

SuspendRate             = 60 nodes/min

SuspendTime             = INFINITE

SuspendTimeout          = 30 sec

SwitchParameters        = (null)

SwitchType              = switch/none

TaskEpilog              = (null)

TaskPlugin              = cgroup,affinity

TaskPluginParam         = (null type)

TaskProlog              = (null)

TCPTimeout              = 2 sec

TmpFS                   = /tmp

TopologyParam           = (null)

TopologyPlugin          = topology/none

TrackWCKey              = No

TreeWidth               = 50

UsePam                  = No

UnkillableStepProgram   = (null)

UnkillableStepTimeout   = 600 sec

VSizeFactor             = 0 percent

WaitTime                = 0 sec

X11Parameters           = home_xauthority

Cgroup Support Configuration:

AllowedKmemSpace        = (null)

AllowedRAMSpace         = 100.0%

AllowedSwapSpace        = 1.0%

CgroupAutomount         = yes

CgroupMountpoint        = /sys/fs/cgroup

CgroupPlugin            = cgroup/v2

ConstrainCores          = yes

ConstrainDevices        = yes

ConstrainKmemSpace      = no

ConstrainRAMSpace       = yes

ConstrainSwapSpace      = yes

IgnoreSystemd           = no

IgnoreSystemdOnFailure  = no

MaxKmemPercent          = 100.0%

MaxRAMPercent           = 100.0%

MaxSwapPercent          = 100.0%

MemorySwappiness        = (null)

MinKmemSpace            = 30 MB

MinRAMSpace             = 30 MB

Slurmctld(primary) at ASlurmCluster-sched is UP


