On 04/28/2015 11:54 AM, Dick Davies wrote:
Thanks Ian.

Digging around the cgroup, there are 3 processes in there:

* the mesos-executor
* the shell script marathon starts the app with
* the actual command to run the task (a perl app in this case)
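
For anyone wanting to repeat that check, here is a minimal sketch of listing the processes in a cgroup. It assumes a cgroup v1 directory containing a cgroup.procs file; the function name cgroup_pids and any directory you pass in are illustrative, not something taken from Mesos.

```python
from pathlib import Path


def cgroup_pids(cgroup_dir):
    """Return the PIDs listed in <cgroup_dir>/cgroup.procs, sorted ascending.

    cgroup_dir is an assumption: on a real slave it would be the
    container's directory under the memory cgroup hierarchy.
    """
    procs_file = Path(cgroup_dir) / "cgroup.procs"
    # The file holds one PID per line; split() also copes with a trailing newline.
    return sorted(int(pid) for pid in procs_file.read_text().split())
```

Calling cgroup_pids() on the container's cgroup directory should return one PID for each of the three processes above; /proc/<pid>/cmdline then tells you which is which.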

We've been having discussions about various aspects of memory management. It needs to be enhanced both at the mesos-cluster level and the framework (scheduler?) level, right above the myriad of processes that start|idle|stop.


In fact, if you look at "hwloc" [1], there is a movement to abstract
resource classifications, particularly of memory/cache/registers, in such a way as to make sense both in a heterogeneous environment and within arch-processor families that have different resources mixed into the processor chipset. Furthermore, gcc-5.1 has full support for RDMA and generic access to GPU-based resources, so that is further reason to expand the use of cgroups and allow folks running these clusters to tune performance directly via cgroup settings while a cluster is up and running.


I really hate to be the 'old-fashioned computer scientist' in this group,
but I think that the role and usage of 'cgroups' is going to have to
be expanded greatly as a solution to the dynamic memory management needs of both the cluster(s) and the frameworks. This problem is not going away, and I see no serious alternative to expanding cgroup use.


[1] http://www.open-mpi.org/projects/hwloc/


hth,
James




The line of code you mention is never run in our case, because it's
wrapped in the conditional I'm talking about!

All I see is cpu.shares being set and then memory.soft_limit_in_bytes.


On 28 April 2015 at 17:47, Ian Downes <idow...@twitter.com> wrote:
The line of code you cite is so the hard limit is not decreased on a running
container because we can't (easily) reclaim anonymous memory from running
processes. See the comment above the code.

The info->pid.isNone() is for when cgroup is being configured (see the
update() call at the end of MemIsolatorProcess::prepare()), i.e., before any
processes are added to the cgroup.

The limit > currentLimit.get() ensures the limit is only increased.
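
As a rough illustration of the two tests described above (a sketch of the logic as explained in this message, not the Mesos source; the function and parameter names are made up for the example):

```python
def should_write_hard_limit(pid_is_none, new_limit, current_limit):
    """Decide whether the hard limit would be written.

    pid_is_none   -- True while the cgroup is still being prepared,
                     i.e. no process has been added to it yet
    new_limit     -- requested limit in bytes
    current_limit -- limit currently set on the cgroup, in bytes
    """
    # Write during preparation, or when the limit would only grow.
    # A running container never has its hard limit lowered.
    return pid_is_none or new_limit > current_limit


# Hypothetical numbers: a 512 MiB request against a huge default limit.
REQUESTED = 512 * 2 ** 20
HUGE_DEFAULT = 2 ** 63 - 1  # assumed signed 64-bit maximum

during_prepare = should_write_hard_limit(True, REQUESTED, HUGE_DEFAULT)
while_running = should_write_hard_limit(False, REQUESTED, HUGE_DEFAULT)
```

With these inputs during_prepare is True and while_running is False: once a process is in the cgroup and the current limit is already enormous, a smaller requested limit fails both tests and is never written.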

The memory limit defaults to the maximum for the data type; I guess that's
the ridiculous 8 EB. It should be set to what the initial memory allocation
was for the container so this is not expected. Can you look in the slave
logs for when the container was created for the log line on:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L393

Ian

On Tue, Apr 28, 2015 at 7:42 AM, Dick Davies <d...@hellooperator.net> wrote:

Been banging my head against this for a while now.

mesos 0.21.0, marathon 0.7.5, centos 6 servers.

When I enable cgroups (flags are: --cgroups_limit_swap
--isolation=cgroups/cpu,cgroups/mem) the memory limits I'm setting
are reflected in memory.soft_limit_in_bytes but not in
memory.limit_in_bytes or memory.memsw.limit_in_bytes.
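
A minimal sketch of how the three files could be compared in one go, assuming a cgroup v1 memory directory for the container. The helper names and the idea of comparing against a single expected value are assumptions made for illustration only.

```python
from pathlib import Path

# The three cgroup v1 memory control files mentioned above.
LIMIT_FILES = (
    "memory.soft_limit_in_bytes",
    "memory.limit_in_bytes",
    "memory.memsw.limit_in_bytes",
)


def read_memory_limits(cgroup_dir):
    """Map each limit file name to its value in bytes, or None if the file is absent."""
    limits = {}
    for name in LIMIT_FILES:
        path = Path(cgroup_dir) / name
        limits[name] = int(path.read_text().strip()) if path.exists() else None
    return limits


def unenforced_limits(limits, expected_bytes):
    """Return the files whose value is missing or larger than expected_bytes."""
    return [
        name
        for name, value in limits.items()
        if value is None or value > expected_bytes
    ]
```

For the situation described here, unenforced_limits() would be expected to report the two hard-limit files while leaving the soft limit out.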


Upshot is our runaway task eats all RAM and swap on the server
until the OOM steps in and starts firing into the crowd.

This line of code seems to never lower a hard limit:


https://github.com/apache/mesos/blob/master/src/slave/containerizer/isolators/cgroups/mem.cpp#L382

which means both of those tests must be true, right?

the current limit is insanely high (8192 PB if I'm reading it right) - how
would I make info->pid.isNone() be true?
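
On the size of that number: a quick arithmetic check, assuming the reported limit is at or near the signed 64-bit maximum (the exact default varies by kernel, so treat the constant below as an assumption) and that "PB" is read as binary pebibytes.

```python
PIB = 2 ** 50  # bytes per pebibyte
EIB = 2 ** 60  # bytes per exbibyte


def describe_limit(limit_bytes):
    """Express a byte count in PiB and EiB."""
    return {"PiB": limit_bytes / PIB, "EiB": limit_bytes / EIB}


# Assumed value: the largest signed 64-bit integer.
INT64_MAX = 2 ** 63 - 1
sizes = describe_limit(INT64_MAX)
```

Rounded, that comes to 8192 PiB, which is the same quantity as 8 EiB, so the reading above is right.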

Have tried restarting the slave, scaling the marathon apps to 0 tasks
then back. Bit stumped.



