Thanks for the extensive info. In the meantime, I have disabled task/affinity and am using only task/cgroup, which gives a much lower number of release_agent calls. Waiting for the new development then...
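For reference, the plugin switch described above corresponds to a slurm.conf change along these lines (a sketch only; check the slurm.conf man page for the exact syntax in your SLURM version):

```
# slurm.conf fragment (sketch): drop task/affinity, keep only the
# cgroup-based task plugin
TaskPlugin=task/cgroup
```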
Best regards,
Andrej

On 10/10/2012 02:39 PM, Matthieu Hautreux wrote:
> Hi,
>
> the locking that you have removed is necessary to ensure the proper
> behavior of the cgroup directory creation. Removing it could result in
> the memory cgroup plugin no longer working as expected, with some jobs
> or job steps not being run in a memory cgroup at all.
>
> This is mostly due to the fact that the cgroup directory hierarchy
> (uid/job_id/step_id) is automatically removed by the release agent
> mechanism of the cgroup subsystem and not directly by the cgroup logic
> of SLURM. As a result, when creating a new step, you can have a
> situation where you check that the job directory is present and then
> add the step directory, but in the meantime a release agent has
> removed the job directory and the creation fails. To avoid that, the
> flock on the cgroup subsystem root directory was introduced. This
> logic was not designed with "high throughput" computing in mind, so it
> does not really work well with your workload.
>
> Mark Grondona has added the ability to remove the step-level cgroup
> directory directly in the SLURM logic in slurm-2.4.x, and I have also
> worked on applying the same logic to both the job and the user levels
> of the hierarchy, but it is not yet included in any official version
> of SLURM. I will work on that again and hope to have something working
> better for slurm-2.5 (most probably in November, according to
> SchedMD). I hope that the speedup will be sufficient for you.
>
> In the meantime, I would suggest no longer using the cgroup memory
> logic if you experience the issue I mentioned at the beginning of this
> email.
>
> Best regards,
> Matthieu
>
>
> 2012/10/1 Andrej Filipcic <[email protected]>:
>> Found out that release_memory is called many times for the same path,
>> unlike the others (cpusets): 4k calls for 100 jobs.
>>
>> It seems to work much better if I replace this line:
>>     flock -x ${mountdir} -c "$0 sync $@"
>> with
>>     flock -x -w 2 ${rmcg} -c "$0 sync $@"
>>
>> So, locking on the directory to be removed. I am not sure if this has
>> any side effects... But at least there is no excessive number of
>> processes created, and the memory cgroup tree is cleaned properly
>> after all the jobs finish.
>>
>> Cheers,
>> Andrej
>>
>> On 09/30/2012 01:19 PM, Andrej Filipcic wrote:
>>> Hi,
>>>
>>> On 64-core nodes, while submitting many short jobs, the number of
>>> calls to the release_memory agent (a symlink to release_common from
>>> the slurm 2.4.3 release) can be extremely high. It seems that the
>>> script is too slow for memory, which results in a few tens of
>>> thousands of agent processes being spawned in a short time after job
>>> completion, and the processes stay alive for a long time. In extreme
>>> cases, the pid numbers can be exhausted, preventing new processes
>>> from being spawned. To fix it partially, I commented out the
>>> "sleep 1" in the sync part of the script. But there can still be up
>>> to a few thousand processes after 64 jobs complete at roughly the
>>> same time.
>>>
>>> Each job has about 10 processes, so the number of agent calls can be
>>> high.
>>>
>>> I did not notice this on nodes with a lower number of cores/jobs,
>>> and the problem is not present for the other cgroups.
>>>
>>> Any advice on how to fix this problem?
>>>
>>> Cheers,
>>> Andrej
>>>
>>
>> --
>> _____________________________________________________________
>> prof. dr. Andrej Filipcic, E-mail: [email protected]
>> Department of Experimental High Energy Physics - F9
>> Jozef Stefan Institute, Jamova 39, P.o.Box 3000
>> SI-1001 Ljubljana, Slovenia
>> Tel.: +386-1-477-3674 Fax: +386-1-477-3166
>> -------------------------------------------------------------
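[Editor's note] The check-then-create race Matthieu describes can be sketched with a small self-contained script. The paths (uid_0/job_1/step_0) and function names are illustrative stand-ins, not SLURM's actual code: without serialization, slurmstepd can see the job directory, the release agent can rmdir it while empty, and the subsequent mkdir of the step directory fails because its parent is gone. Taking an exclusive flock on the hierarchy root on both sides closes that window:

```shell
#!/bin/sh
# Sketch of the flock-serialized check-then-create (illustrative paths).
root=$(mktemp -d)

# what slurmstepd effectively does when a step starts
create_step() {
    flock -x "$root" -c "
        mkdir -p '$root/uid_0/job_1' &&
        mkdir '$root/uid_0/job_1/step_0'
    "
}

# what the notify-on-release agent does for an emptied job cgroup
release_job() {
    flock -x "$root" -c "rmdir '$root/uid_0/job_1' 2>/dev/null || true"
}

release_job &   # the agent may fire concurrently...
create_step     # ...but cannot interleave with the check-then-create
wait
[ -d "$root/uid_0/job_1/step_0" ] && echo "step cgroup created"
rm -rf "$root"
```

The cost, as the thread shows, is that every release agent invocation for every path contends on one lock, which is what breaks down at high job-completion rates.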
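[Editor's note] The effect of Andrej's one-line change (locking `${rmcg}`, the directory being removed, with `-w 2` instead of locking `${mountdir}`) can be sketched as follows. With a timeout, a surplus agent invocation for the same path gives up after about 2 seconds (flock's default conflict exit status is 1) instead of queueing indefinitely; the directory here is an illustrative stand-in for the cgroup path:

```shell
#!/bin/sh
# Sketch of flock -w timeout behavior (illustrative stand-in path).
rmcg=$(mktemp -d)                  # stands in for the cgroup to remove

flock -x "$rmcg" -c 'sleep 4' &    # first agent holds the lock
sleep 0.2
# second agent for the same path: with -w 2 it fails after ~2s
# (exit status 1) instead of blocking until the first one finishes
flock -x -w 2 "$rmcg" -c 'true'
echo "timed-out agent exit status: $?"
wait
rmdir "$rmcg"
```

This matches the behavior reported above: excess agent processes exit quickly rather than piling up, at the risk (which Andrej flags) that a timed-out invocation skips its cleanup pass.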
