I actually meant to send this to my local tech support group and forgot to
change the recipient, but I'm now glad I did, since this is also useful
information for the list. Sorry for resurrecting the dead thread, though.

Thanks, Oliver!

> On Apr 21, 2017, at 13:54, Oliver Freyermuth <o.freyerm...@googlemail.com> 
> wrote:
> 
> 
> This is interesting information indeed!
> 
> However, let me add our experience to this. We have been using:
> ProctrackType=proctrack/linuxproc
> TaskPlugin=task/none
> on a simple SLURM cluster of desktop machines, which also have (slow, 
> HDD-based) swap partitions. 
> 
> In my experience, "linuxproc" enforces the memory limit by "polling" procfs
> regularly and killing the job once the limit is exceeded (as I would expect).
> This becomes a problem when a user submits a job that allocates a huge amount
> of memory very quickly, blowing past the configured limit in between two
> "polling" intervals.
> 
> In our case, this led to heavy swapping that slowed the desktop machines to
> a crawl before Slurm could kill those jobs. Of course, this is even worse if
> a user submits a full job array showing such nasty behaviour.
> So I would still consider cgroup enforcement much safer from the cluster
> operator's point of view, at least if you have users developing custom code
> (and not testing it thoroughly beforehand).
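> 
> (For completeness, a minimal sketch of the cgroup-based setup meant here,
> assuming a reasonably recent Slurm; exact parameters can differ between
> versions, so treat this as illustrative:)
> # slurm.conf
> ProctrackType=proctrack/cgroup
> TaskPlugin=task/cgroup
> # cgroup.conf
> CgroupAutomount=yes
> ConstrainRAMSpace=yes
> ConstrainSwapSpace=yes
> AllowedSwapSpace=0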
> 
> Cheers, 
>       Oliver
> 
> On 22.01.2016 at 11:33, Felip Moll wrote:
>> I finally solved the issue, in large part thanks to Carlos Fenoy's tips.
>> 
>> The issue was due to the NFS filesystem. As Carlos said, this filesystem
>> caches data heavily, unlike the other filesystems we use. Cgroups take that
>> cached data into account, and our users' jobs use the NFS filesystem
>> intensively.
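>> 
>> (One way to see this on a node, sketched for cgroup v1; the exact path
>> depends on the setup, and the uid/job id below are just placeholders:)
>> grep -E '^(cache|rss) ' /sys/fs/cgroup/memory/slurm/uid_1000/job_12345/memory.stat
>> # "cache" counts page cache (e.g. NFS-cached file data), "rss" anonymous memory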
>> 
>> I switched from:
>> ProctrackType=proctrack/cgroup
>> TaskPlugin=task/cgroup
>> TaskPluginParam=
>> 
>> To:
>> ProctrackType=proctrack/linuxproc
>> TaskPlugin=task/affinity
>> TaskPluginParam=Sched
>> 
>> 
>> In the 11 days since, I haven't seen a single OOM kill, and everything is
>> working perfectly.
>> 
>> Best regards and thanks to all of you.
>> Felip M
>> 
>> 
>> --
>> Felip Moll Marquès
>> Computer Science Engineer
>> E-Mail - lip...@gmail.com
>> WebPage - http://lipix.ciutadella.es
>> 
>> 2015-12-18 15:09 GMT+01:00 Bjørn-Helge Mevik <b.h.me...@usit.uio.no>:
>> 
>> 
>>    Carlos Fenoy <mini...@gmail.com> writes:
>> 
>>> Barbara, I don't think that is the issue here. The killer is the OOM
>>> killer, not Slurm, so Slurm is not mis-accounting the amount of memory;
>>> rather, it seems that cached memory is also accounted in the cgroup, and
>>> that is what is causing the OOM killer to kill gzip.
>> 
>>    I've seen cases where a job copies a set of large files, which makes
>>    the cgroup memory usage go right up to the limit.  I guess that is
>>    cached data.  Then the job starts computing, without getting killed.
>>    My interpretation is that the kernel will flush the cache when a
>>    process needs more memory instead of killing the process.  If I'm
>>    correct, the OOM killer will _not_ kill a job due to cached data.
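>> 
>>    (A rough way to check this while such a job runs, assuming cgroup v1 and
>>    the usual Slurm memory hierarchy; the path is only illustrative:)
>>    CG=/sys/fs/cgroup/memory/slurm/uid_$UID/job_$SLURM_JOB_ID
>>    watch -n1 "grep -E '^(cache|rss) ' $CG/memory.stat; cat $CG/memory.failcnt"
>>    # if "cache" shrinks while "rss" grows and no OOM kill appears in the logs,
>>    # the kernel is reclaiming page cache rather than invoking the OOM killer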
>> 
>>    --
>>    Regards,
>>    Bjørn-Helge Mevik, dr. scient,
>>    Department for Research Computing, University of Oslo
