I actually meant to send this to my local tech support group and accidentally did not change the recipient, but I'm now glad I didn't, as this is also useful information. Sorry for resurrecting the dead thread, though.
Thanks, Oliver!

> On Apr 21, 2017, at 13:54, Oliver Freyermuth <o.freyerm...@googlemail.com> wrote:
>
> This is interesting information indeed!
>
> However, I might add our experience to this. We have been using:
>   ProctrackType=proctrack/linuxproc
>   TaskPlugin=task/none
> on a simple SLURM cluster of desktop machines, which also have (slow,
> HDD-based) swap partitions.
>
> In my experience, "linuxproc" enforces the memory limit by "polling"
> procfs at regular intervals and killing the job once the limit is
> exceeded (as I would also expect). This becomes a problem when a user
> submits a job that allocates a huge amount of memory quickly, i.e. it
> blows past the intended memory limit between two "polling" intervals.
>
> In our case, this led to heavy swapping that slowed the desktop
> machines to a crawl before Slurm could kill those jobs. This is of
> course even worse if a user submits a whole job array showing such
> nasty behaviour. So from the cluster operator's point of view, I would
> still consider cgroup enforcement much safer, at least if you have
> users developing custom code (and not testing it thoroughly
> beforehand).
>
> Cheers,
> Oliver
>
> On 22.01.2016 at 11:33, Felip Moll wrote:
>> I finally solved the issue, in large part thanks to Carlos Fenoy's
>> tips.
>>
>> The issue was due to the NFS filesystem, which, as Carlos said,
>> caches data while other filesystems do not. Cgroups take cached data
>> into account, and our users' jobs use the NFS filesystem intensively.
>>
>> I switched from:
>>   ProctrackType=proctrack/cgroup
>>   TaskPlugin=task/cgroup
>>   TaskPluginParam=
>>
>> to:
>>   ProctrackType=proctrack/linuxproc
>>   TaskPlugin=task/affinity
>>   TaskPluginParam=Sched
>>
>> In the 11 days since, I haven't received a single OOM kill and
>> everything is working perfectly.
>>
>> Best regards, and thanks to all of you.
>> Felip M
>>
>> --
>> Felip Moll Marquès
>> Computer Science Engineer
>> E-Mail - lip...@gmail.com
>> WebPage - http://lipix.ciutadella.es
>>
>> 2015-12-18 15:09 GMT+01:00 Bjørn-Helge Mevik <b.h.me...@usit.uio.no>:
>>
>> Carlos Fenoy <mini...@gmail.com> writes:
>>
>>> Barbara, I don't think that is the issue here. The killer is the OOM
>>> killer, not Slurm, so Slurm is not accounting the amount of memory
>>> incorrectly; rather, cached memory also seems to be accounted in the
>>> cgroup, and that is what is causing the OOM killer to kill gzip.
>>
>> I've seen cases where a job copied a set of large files, which made
>> the cgroup memory usage go right up to the limit. I guess that is
>> cached data. The job then started computing without getting killed.
>> My interpretation is that the kernel flushes the cache when a process
>> needs more memory, instead of killing the process. If I'm correct,
>> the OOM killer will _not_ kill a job due to cached data.
>>
>> --
>> Regards,
>> Bjørn-Helge Mevik, dr. scient,
>> Department for Research Computing, University of Oslo
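
For anyone finding this thread later: if you follow Oliver's advice and
enforce limits with cgroups, the enforcement itself is configured in
cgroup.conf. A minimal sketch, assuming a cgroup-capable kernel and a
Slurm build with cgroup support; the values are illustrative, not
recommendations:

  # slurm.conf
  ProctrackType=proctrack/cgroup
  TaskPlugin=task/cgroup

  # cgroup.conf
  CgroupAutomount=yes
  ConstrainRAMSpace=yes    # cap the job at its requested memory (RSS + page cache)
  ConstrainSwapSpace=yes   # also constrain swap usage
  AllowedSwapSpace=0       # percent of extra swap allowed beyond the RAM limit

With ConstrainSwapSpace=yes and AllowedSwapSpace=0, a runaway job hits
its own cgroup limit instead of pushing the whole node into swap, which
addresses the desktop-machine scenario Oliver describes.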
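
And if you are unsure whether cached pages (e.g. from NFS) are what is
filling a job's cgroup, you can compare the cache and rss counters
directly. A rough sketch, assuming cgroup v1 and the usual Slurm
hierarchy (the uid_*/job_* path is an assumption and can vary by
distribution and configuration); run it on the compute node from inside
the job:

  # Locate the job's memory cgroup; adjust the path to what your node shows.
  CG=/sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}
  # 'cache' is page cache charged to the cgroup, 'rss' is anonymous memory.
  grep -E '^(cache|rss) ' "$CG/memory.stat"
  # Current usage vs. the enforced limit, both in bytes.
  cat "$CG/memory.usage_in_bytes" "$CG/memory.limit_in_bytes"

If 'cache' dominates while 'rss' stays small, Bjørn-Helge's point
applies: clean page cache is reclaimed before the OOM killer fires.
Dirty NFS pages awaiting writeback are the awkward case, since they
cannot be dropped until they have been flushed to the server.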