On 2013-07-20T15:23:41 EEST, Chris Samuel wrote:
>
> Hi there,
>
> On Sat, 20 Jul 2013 02:53:52 AM Bjørn-Helge Mevik wrote:
>
>> With the recent changes in glibc in how virtual memory is allocated for
>> threaded applications, limiting virtual memory usage for threaded
>> applications is IMO not a good idea. (One example: our slurmctld has
>> allocated 16.1 GiB virtual memory, but is only using 104 MiB resident.)
>
> Would you have a pointer to these changes please?
From a recent message by yours truly to a slurm-dev thread about slurmctld
memory consumption:

"""
Yes, this is what we're seeing as well. 6.5 GiB VMEM, 376 MB RSS.

The change is that as of glibc 2.10 a more scalable malloc()
implementation is used. The new implementation creates up to 8 (2 on
32-bit) pools per core, each 64 MB in size. Thus in our case, where
slurmctld runs on a machine with 12 cores, we have up to 12*8*64 = 6144 MB
in those malloc pools. See http://udrepper.livejournal.com/20948.html
"""

I would go even further than Bjørn-Helge and claim that limiting virtual
memory is, in general, the wrong thing to do. Address space is essentially
free and doesn't impact other applications, so IMHO the workload manager
has no business limiting it. The glibc malloc() behavior is just one
situation where trying to limit virtual memory goes wrong; there are
other situations where allocating lots of virtual memory is common. For
example, garbage-collected runtimes such as Java often reserve huge heaps
to use as the garbage collection arena, but only a small fraction of that
is actually used.

>> I would suggest looking at cgroups for limiting memory usage.
>
> Unfortunately cgroups doesn't limit usage (i.e. cause malloc() to fail
> should it have reached its limit); if I understand it correctly it just
> invokes the OOM killer on a candidate process within the cgroup once the
> limit is reached. :-(

Yes, that's my understanding as well. On the "positive" side, few
applications can sensibly handle malloc() failures anyway. Often the best
that can be done without heroic effort is to print an error message to
stderr and abort(), which is not terribly different from being killed by
the OOM killer anyway.
There are a few efforts in the Linux kernel community to do something
about this, roughly going in a couple of slightly different directions:

- Provide some notification to applications that "you're exceeding your
  memory limit, release some memory quickly or face the wrath of the OOM
  killer". See

  https://lwn.net/Articles/552789/
  https://lwn.net/Articles/548180/

- Provide a mechanism for applications to mark memory ranges as
  "volatile", so the kernel can drop them if memory gets tight instead of
  going on an OOM killer spree.

  https://lwn.net/Articles/522135/
  https://lwn.net/Articles/554098/

That being said, AFAIK none of the above exists in the upstream kernel
today. So for now IMHO the least bad approach is what slurm already does:
limit RSS (either with cgroups or by polling) and kill jobs if the limit
is exceeded.

-- 
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & BECS
+358503841576 || janne.blomqv...@aalto.fi