I found out that release_memory is called many times for the same path,
unlike the other agents (e.g. cpusets): about 4k calls for 100 jobs.
It seems to work much better if I replace this line:
flock -x ${mountdir} -c "$0 sync $@"
with
flock -x -w 2 ${rmcg} -c "$0 sync $@"
So the lock is taken on the directory to be removed instead of the mount
point. I am not sure whether this has any side effects... But at least
there is no longer an excessive number of processes created, and the
memory cgroup tree is cleaned up properly after all the jobs finish.
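To illustrate the change, here is a minimal runnable sketch of the two locking strategies. The variable names (rmcg, mountdir) follow the release_common snippets above, but the paths and the command run under the lock are hypothetical stand-ins; the real script re-invokes itself with "$0 sync $@".

```shell
#!/bin/sh
# Hypothetical stand-in for the per-job cgroup directory to be removed;
# in the real agent, ${rmcg} is derived from the path SLURM passes in.
rmcg=/tmp/demo_release_cgroup
mkdir -p "$rmcg"

# Old behaviour: every agent invocation serializes on the cgroup mount
# point, so all concurrent invocations for different jobs queue on one
# global lock:
#   flock -x ${mountdir} -c "$0 sync $@"

# New behaviour: lock only the directory being removed, with a 2-second
# timeout (-w 2) so a blocked invocation gives up instead of piling up.
flock -x -w 2 "$rmcg" -c "echo cleaning $rmcg"

rmdir "$rmcg"
```

Since different jobs lock different directories, their cleanups no longer contend with each other; the timeout bounds how long any one invocation can linger.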
Cheers,
Andrej
On 09/30/2012 01:19 PM, Andrej Filipcic wrote:
> Hi,
>
> On 64-core nodes, while submitting many short jobs, the number of calls
> to the release_memory agent (a symlink to release_common from the slurm
> 2.4.3 release) can be extremely high. The script seems to be too slow
> for memory, which results in a few tens of thousands of agent processes
> being spawned in a short time after job completion, and these processes
> stay alive for a long time. In extreme cases, the pid numbers can be
> exhausted, preventing new processes from being spawned. To fix it
> partially, I commented out the "sleep 1" in the sync part of the
> script, but there can still be up to a few thousand processes after 64
> jobs complete at roughly the same time.
>
> Each job has about 10 processes, so the number of agent calls can be high.
>
> I did not notice this on nodes with a lower number of cores/jobs, and
> the problem is not present for the other cgroups.
>
> Any advice on how to fix this problem?
>
> Cheers,
> Andrej
>
--
_____________________________________________________________
prof. dr. Andrej Filipcic, E-mail: [email protected]
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-477-3166
-------------------------------------------------------------