Re: group memory limits are always 'soft' . how do I ensure info->pid.isNone() ?

CCAAT Tue, 28 Apr 2015 10:25:07 -0700

On 04/28/2015 11:54 AM, Dick Davies wrote:

Thanks Ian.


Digging around the cgroup there are 3 processes in there;

* the mesos-executor
* the shell script marathon starts the app with
* the actual command to run the task ( a perl app in this case)

We've been having discussions about various aspects of memory
management. It needs to be enhanced both at the mesos-cluster level and
the framework (scheduler?) level, right above the myriad of processes
that start|idle|stop.



In fact, if you look at "hwloc" [1] there is a movement that abstracts
resource classifications, particularly of memory/cache/registers, in
such a way as to make sense both in a heterogeneous environment and
within arch-processor families that have different resources mixed into
the processor chipset. Furthermore, gcc-5.1 has full support for RDMA
and generic access to GPU based resources, so that is further reason to
expand the use of cgroups and allow folks running these clusters to
directly tune performance via cgroup settings, while a cluster is up and
running.



I really hate to be the 'old-fashioned computer scientist' in this group,
but I think that the role and usage of 'cgroups' is going to have to
be expanded greatly as a solution to the dynamic memory management needs
of both the cluster(s) and the frameworks. This problem is not going
away, and I see no serious alternative to expanding the use of cgroups.



[1] http://www.open-mpi.org/projects/hwloc/


hth,
James


The line of code you mention is never run in our case, because it's
wrapped in the conditional I'm talking about!

All I see is cpu.shares being set and then memory.soft_limit_in_bytes.


On 28 April 2015 at 17:47, Ian Downes <idow...@twitter.com> wrote:

The line of code you cite is so the hard limit is not decreased on a running
container because we can't (easily) reclaim anonymous memory from running
processes. See the comment above the code.

The info->pid.isNone() check is for when the cgroup is being configured
(see the update() call at the end of MemIsolatorProcess::prepare()),
i.e., before any processes are added to the cgroup.

The limit > currentLimit.get() ensures the limit is only increased.
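
The two tests described above can be modelled roughly as follows. This is a simplified Python sketch, not the Mesos source: the function name should_update_hard_limit and its parameters are hypothetical stand-ins for info->pid.isNone() and limit > currentLimit.get(), and the default value is an assumption.

```python
from typing import Optional


def should_update_hard_limit(pid: Optional[int], limit: int, current_limit: int) -> bool:
    """Model of the check: write the hard limit only when it is safe.

    pid is None while the cgroup is still being prepared (no processes
    yet); once a process is running, the limit may only be raised.
    """
    if pid is None:
        return True
    return limit > current_limit


# Assumed "unlimited" default, roughly 8 EB (not taken from the Mesos code).
DEFAULT_LIMIT = 2**63 - 1

# Container still being prepared: the hard limit would be written.
print(should_update_hard_limit(None, 512 * 2**20, DEFAULT_LIMIT))  # True
# Running container, requested limit below the current one: skipped.
print(should_update_hard_limit(1234, 512 * 2**20, DEFAULT_LIMIT))  # False
```

Under this model, a container that is already running with the huge default limit would never have its hard limit written, because any realistic request is lower than the current value.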

The memory limit defaults to the maximum for the data type; I guess that's
the ridiculous 8 EB. It should be set to the container's initial memory
allocation, so this is not expected. Can you look in the slave
logs for when the container was created for the log line on:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L393

Ian

On Tue, Apr 28, 2015 at 7:42 AM, Dick Davies <d...@hellooperator.net> wrote:


Been banging my head against this for a while now.

mesos 0.21.0 , marathon 0.7.5, centos 6 servers.

When I enable cgroups (flags are: --cgroups_limit_swap
--isolation=cgroups/cpu,cgroups/mem) the memory limits I'm setting
are reflected in memory.soft_limit_in_bytes but not in
memory.limit_in_bytes or memory.memsw.limit_in_bytes.
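
For anyone wanting to check the same three files on their own slave, here is a minimal Python sketch. The helper name read_memory_limits is hypothetical, and the directory you pass in depends on where the memory cgroup hierarchy is mounted on your system, so no real path is assumed here.

```python
from pathlib import Path
from typing import Dict, Optional

# The three cgroup (v1) memory limit files mentioned above.
LIMIT_FILES = (
    "memory.soft_limit_in_bytes",
    "memory.limit_in_bytes",
    "memory.memsw.limit_in_bytes",
)


def read_memory_limits(cgroup_dir: str) -> Dict[str, Optional[int]]:
    """Return each limit in bytes, or None if the file does not exist."""
    limits: Dict[str, Optional[int]] = {}
    for name in LIMIT_FILES:
        path = Path(cgroup_dir) / name
        limits[name] = int(path.read_text().strip()) if path.is_file() else None
    return limits
```

Calling read_memory_limits() on the container's cgroup directory and comparing the soft and hard values should show quickly whether only the soft limit was applied.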


Upshot is our runaway task eats all RAM and swap on the server
until the OOM steps in and starts firing into the crowd.

This line of code seems to never lower a hard limit:


https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L382

which means both of those tests must be true, right?

the current limit is insanely high (8192 PB if I'm reading it right) - how
would I make info->pid.isNone() be true?
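
As a quick sanity check on that figure, here is a small Python sketch of the arithmetic. It assumes the default limit is the largest signed 64-bit value; the exact number the kernel reports may differ slightly (for example, rounded to a page boundary).

```python
# Assumed default: largest signed 64-bit integer (an assumption, see above).
default_limit = 2**63 - 1

PIB = 2**50  # bytes per pebibyte
EIB = 2**60  # bytes per exbibyte

print(round(default_limit / PIB))  # 8192
print(round(default_limit / EIB))  # 8
```

So "8192 PB" and "8 EB" describe the same effectively unlimited value.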

Have tried restarting the slave, scaling the marathon apps to 0 tasks
then back. Bit stumped.
