Thanks for the extensive info. In the meantime, I have disabled task/affinity and am using only task/cgroup, which gives a much lower number of release_agent calls. Waiting for the new development then...
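For reference, the plugin switch described above corresponds to a slurm.conf change along these lines (a sketch only; check the slurm.conf man page for the exact syntax in your SLURM version):

```
# slurm.conf fragment (sketch): drop task/affinity, keep only the
# cgroup-based task plugin
TaskPlugin=task/cgroup
```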
Best regards,
Andrej

On 10/10/2012 02:39 PM, Matthieu Hautreux wrote:
> Hi,
>
> the locking that you have removed is necessary to ensure the proper
> behavior of the cgroup directory creation. Removing it could result in
> the memory cgroup plugin no longer working as expected, with some jobs
> or job steps not being run in a memory cgroup at all.
>
> This is mostly due to the fact that the cgroup directory hierarchy
> (uid/job_id/step_id) is automatically removed by the release agent
> mechanism of the cgroup subsystem and not directly by the cgroup logic
> of SLURM. As a result, when creating a new step, you can have a
> situation where you check that the job directory is present and then
> add the step directory, but in the meantime a release agent has
> removed the job directory and the creation fails. To avoid that, the
> flock on the cgroup subsystem root directory was introduced. This
> logic was not designed with "high throughput" computing in mind, so it
> does not really work well with your workload.
>
> Mark Grondona has added the ability to remove the step-level cgroup
> directory directly in the SLURM logic in slurm-2.4.x, and I have also
> worked on applying the same logic to both the job and the user levels
> of the hierarchy, but it is not yet included in any official version
> of SLURM. I will work on that again and hope to have something working
> better for slurm-2.5 (most probably in November, according to
> SchedMD). I hope that the speedup will be sufficient for you.
>
> In the meantime, I would suggest no longer using the cgroup memory
> logic if you experience the issue I mentioned at the beginning of this
> email.
>
> Best regards,
> Matthieu
>
>
> 2012/10/1 Andrej Filipcic <[email protected]>:
>> Found out that release_memory is called many times for the same path,
>> unlike the others (cpusets): 4k calls for 100 jobs.
>>
>> It seems to work much better if I replace this line:
>>     flock -x ${mountdir} -c "$0 sync $@"
>> with
>>     flock -x -w 2 ${rmcg} -c "$0 sync $@"
>>
>> So, locking on the directory to be removed. I am not sure if this has
>> any side effects... But at least there is no excessive number of
>> processes created, and the memory cgroup tree is cleaned properly
>> after all the jobs finish.
>>
>> Cheers,
>> Andrej
>>
>> On 09/30/2012 01:19 PM, Andrej Filipcic wrote:
>>> Hi,
>>>
>>> On 64-core nodes, while submitting many short jobs, the number of
>>> calls to the release_memory agent (a symlink to release_common from
>>> the slurm 2.4.3 release) can be extremely high. It seems that the
>>> script is too slow for memory, which results in a few tens of
>>> thousands of agent processes being spawned in a short time after job
>>> completion, and the processes stay alive for a long time. In extreme
>>> cases, the pid numbers can be exhausted, preventing new processes
>>> from being spawned. To fix it partially, I commented out the
>>> "sleep 1" in the sync part of the script. But there can still be up
>>> to a few thousand processes after 64 jobs complete at roughly the
>>> same time.
>>>
>>> Each job has about 10 processes, so the number of agent calls can be
>>> high.
>>>
>>> I did not notice this on nodes with a lower number of cores/jobs,
>>> and the problem is not present for the other cgroups.
>>>
>>> Any advice on how to fix this problem?
>>>
>>> Cheers,
>>> Andrej
>>>
>>
>> --
>> _____________________________________________________________
>> prof. dr. Andrej Filipcic, E-mail: [email protected]
>> Department of Experimental High Energy Physics - F9
>> Jozef Stefan Institute, Jamova 39, P.o.Box 3000
>> SI-1001 Ljubljana, Slovenia
>> Tel.: +386-1-477-3674 Fax: +386-1-477-3166
>> -------------------------------------------------------------
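[Editor's note] The check-then-create race Matthieu describes can be sketched with a small self-contained script. The paths (uid_0/job_1/step_0) and function names are illustrative stand-ins, not SLURM's actual code: without serialization, slurmstepd can see the job directory, the release agent can rmdir it while empty, and the subsequent mkdir of the step directory fails because its parent is gone. Taking an exclusive flock on the hierarchy root on both sides closes that window:

```shell
#!/bin/sh
# Sketch of the flock-serialized check-then-create (illustrative paths).
root=$(mktemp -d)

# what slurmstepd effectively does when a step starts
create_step() {
    flock -x "$root" -c "
        mkdir -p '$root/uid_0/job_1' &&
        mkdir '$root/uid_0/job_1/step_0'
    "
}

# what the notify-on-release agent does for an emptied job cgroup
release_job() {
    flock -x "$root" -c "rmdir '$root/uid_0/job_1' 2>/dev/null || true"
}

release_job &   # the agent may fire concurrently...
create_step     # ...but cannot interleave with the check-then-create
wait
[ -d "$root/uid_0/job_1/step_0" ] && echo "step cgroup created"
rm -rf "$root"
```

The cost, as the thread shows, is that every release agent invocation for every path contends on one lock, which is what breaks down at high job-completion rates.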
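[Editor's note] The effect of Andrej's one-line change (locking `${rmcg}`, the directory being removed, with `-w 2` instead of locking `${mountdir}`) can be sketched as follows. With a timeout, a surplus agent invocation for the same path gives up after about 2 seconds (flock's default conflict exit status is 1) instead of queueing indefinitely; the directory here is an illustrative stand-in for the cgroup path:

```shell
#!/bin/sh
# Sketch of flock -w timeout behavior (illustrative stand-in path).
rmcg=$(mktemp -d)                  # stands in for the cgroup to remove

flock -x "$rmcg" -c 'sleep 4' &    # first agent holds the lock
sleep 0.2
# second agent for the same path: with -w 2 it fails after ~2s
# (exit status 1) instead of blocking until the first one finishes
flock -x -w 2 "$rmcg" -c 'true'
echo "timed-out agent exit status: $?"
wait
rmdir "$rmcg"
```

This matches the behavior reported above: excess agent processes exit quickly rather than piling up, at the risk (which Andrej flags) that a timed-out invocation skips its cleanup pass.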
