Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-05-01 Thread Nate Coraor
[...] jobs running longer than 30 minutes are completing, and cgroups are persisting, whereas before that, they were not.

--nate

On Mon, Apr 30, 2018 at 5:47 PM, Andy Georges <andy.geor...@ugent.be> wrote:
> > > > On 30 Apr 2018, at 22:37, Nate Coraor <n...@bx.psu.edu>

Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-30 Thread Nate Coraor
[...] Apr 30, 2018 at 4:37 PM, Nate Coraor <n...@bx.psu.edu> wrote:
> Hi Shawn,
>
> I'm wondering if you're still seeing this. I've recently enabled
> task/cgroup on 17.11.5 running on CentOS 7 and just discovered that jobs
> are escaping their cgroups. For me this is resul

Re: [slurm-users] Jobs escaping cgroup device controls after some amount of time.

2018-04-30 Thread Nate Coraor
Hi Shawn,

I'm wondering if you're still seeing this. I've recently enabled task/cgroup on 17.11.5 running on CentOS 7 and just discovered that jobs are escaping their cgroups. For me this is resulting in a lot of jobs ending in OUT_OF_MEMORY that shouldn't, because it appears slurmd thinks the