
Just to give you something to compare and contrast against:

Here are the related entries from our slurm.conf:

JobAcctGatherType=jobacct_gather/linux # will migrate to cgroup eventually



gres.conf (4 K80s on a 24-core Haswell node):

Name=gpu File=/dev/nvidia0 CPUs=0-5
Name=gpu File=/dev/nvidia1 CPUs=12-17
Name=gpu File=/dev/nvidia2 CPUs=6-11
Name=gpu File=/dev/nvidia3 CPUs=18-23
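
If it helps when comparing against your gres.conf, here is a rough Python
sketch that checks the CPUs= ranges above are disjoint and together cover all
24 cores. The helper names (parse_cpu_range, gpu_core_map, overlapping_cores)
are made up for illustration and are not part of Slurm; it only handles the
simple "Key=Value" layout shown above.

```python
# Sketch only: hypothetical helpers, not part of Slurm.
# The lines are copied from the gres.conf excerpt above.
GRES_LINES = [
    "Name=gpu File=/dev/nvidia0 CPUs=0-5",
    "Name=gpu File=/dev/nvidia1 CPUs=12-17",
    "Name=gpu File=/dev/nvidia2 CPUs=6-11",
    "Name=gpu File=/dev/nvidia3 CPUs=18-23",
]


def parse_cpu_range(spec):
    """Expand a spec such as '0-5' or '0-2,6' into a set of core indices."""
    cores = set()
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        # A bare number such as '6' has no upper bound, so reuse the lower one.
        cores.update(range(int(lo), int(hi or lo) + 1))
    return cores


def gpu_core_map(lines):
    """Map each device file to the set of cores listed in its CPUs= field."""
    mapping = {}
    for line in lines:
        entry = dict(field.split("=", 1) for field in line.split())
        mapping[entry["File"]] = parse_cpu_range(entry["CPUs"])
    return mapping


def overlapping_cores(mapping):
    """Return the cores that are assigned to more than one device."""
    seen, overlap = set(), set()
    for cores in mapping.values():
        overlap |= seen & cores
        seen |= cores
    return overlap


mapping = gpu_core_map(GRES_LINES)
all_cores = set().union(*mapping.values())
print("overlapping cores:", sorted(overlapping_cores(mapping)))
print("covers all 24 cores:", all_cores == set(range(24)))
```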

I also looked at multi-tenant jobs on our MARCC cluster that have been running
for more than 1 day, and they are still inside their cgroups, but again this
is on CentOS 6 clusters.
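
For reference, a check like this can be scripted by reading /proc/<pid>/cgroup
for each job process. The sketch below is only an illustration in Python: the
function names are made up, the sample strings are hypothetical rather than
captured from our nodes, and it assumes a cgroup v1 layout in which Slurm
places job processes under a path containing "/job_<jobid>".

```python
# Sketch only: hypothetical helpers, assuming cgroup v1 and a Slurm-style
# "/slurm/uid_<uid>/job_<jobid>/step_<n>" hierarchy.

def cgroup_paths(proc_cgroup_text):
    """Parse /proc/<pid>/cgroup content into {controller: cgroup path}.

    Each line has the form 'hierarchy-ID:controller-list:cgroup-path'.
    """
    paths = {}
    for line in proc_cgroup_text.splitlines():
        if not line.strip():
            continue
        _hierarchy, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            paths[controller] = path
    return paths


def still_contained(proc_cgroup_text, controller="devices", marker="/job_"):
    """True if the process sits in a job cgroup for the given controller."""
    return marker in cgroup_paths(proc_cgroup_text).get(controller, "")


def read_proc_cgroup(pid):
    """Read the cgroup membership of a live process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as fh:
        return fh.read()


# Hypothetical samples: one contained process, and one whose devices
# controller has fallen back to the root cgroup.
CONTAINED = (
    "4:devices:/slurm/uid_1000/job_1234/step_0\n"
    "3:cpuset:/slurm/uid_1000/job_1234/step_0\n"
)
ESCAPED = (
    "4:devices:/\n"
    "3:cpuset:/slurm/uid_1000/job_1234/step_0\n"
)

print("contained sample:", still_contained(CONTAINED))
print("escaped sample:", still_contained(ESCAPED))
```

On a compute node you would feed it still_contained(read_proc_cgroup(pid)) for
each PID belonging to the job.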

Are you still seeing cgroup escapes now, specifically for jobs > 1 day?


From: slurm-users <> on behalf of Shawn Bobbin <>
Reply-To: Slurm User Community List <>
Date: Monday, April 23, 2018 at 2:45 PM
To: Slurm User Community List <>
Subject: Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.


I attached our cgroup.conf and gres.conf.

As for the cgroup_allowed_devices.conf file, I have this file stubbed but 
empty.  In 17.02 Slurm started fine without this file (as far as I remember), 
and leaving it empty doesn’t appear to affect anything: device availability 
remains the same.  Based on the behavior explained in [0], I don’t expect this 
file to affect containment of specific GPUs.

TaskPlugin = task/cgroup
ProctrackType = proctrack/cgroup
JobAcctGatherType = jobacct_gather/cgroup

