Problems with OOM

2014-09-26 Thread Stephan Erb
Hi everyone, I am having issues with the cgroups isolation of Mesos. It seems like tasks are prevented from allocating more memory than their limit. However, they are never killed. * My scheduled task allocates memory in a tight loop. According to 'ps', once its memory requirements are ex
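
A minimal Python sketch of the kind of test task described above. It is not Stephan's actual program. The function name `allocate_in_tight_loop` and the `max_mb` cap are hypothetical additions; the cap only keeps the sketch safe to run outside a cgroup. Under a working memory limit the kernel should kill the process before the cap is reached.

```python
import sys


def allocate_in_tight_loop(max_mb: int, chunk_mb: int = 1) -> int:
    """Grow memory chunk by chunk, stopping at max_mb; return bytes allocated.

    Hypothetical reproduction of a task that keeps allocating in a loop.
    """
    chunk_size = chunk_mb * 1024 * 1024
    limit = max_mb * 1024 * 1024
    chunks = []  # keep references so nothing is freed
    total = 0
    while total + chunk_size <= limit:
        buf = bytearray(chunk_size)
        # Write one byte per 4 KiB page so the pages are really resident.
        for offset in range(0, chunk_size, 4096):
            buf[offset] = 1
        chunks.append(buf)
        total += chunk_size
    return total


if __name__ == "__main__":
    cap = int(sys.argv[1]) if len(sys.argv) > 1 else 64
    allocated = allocate_in_tight_loop(cap)
    print(f"allocated {allocated // (1024 * 1024)} MiB without being killed")
```

Launched as a task with a memory limit below the cap (e.g. `python hog.py 2048` against a limit of a few hundred MB), the final line should never be printed if the cgroup isolation is working.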

Re: Problems with OOM

2014-09-26 Thread Tomas Barton
Just to make sure, all slaves are running with: --isolation='cgroups/cpu,cgroups/mem' Is there something suspicious in mesos slave logs? On 26 September 2014 13:20, Stephan Erb wrote: > Hi everyone, > > I am having issues with the cgroups isolation of Mesos. It seems like > tasks are prevente

Re: Problems with OOM

2014-09-26 Thread Tom Arnfeld
I'm not sure if this is at all related to the issue you're seeing, but we ran into this fun issue (or at least this seems to be the cause), helpfully documented in this blog article: http://blog.nitrous.io/2014/03/10/stability-and-a-linux-oom-killer-bug.html. TLDR: OOM killer getting into an infinite

Re: Problems with OOM

2014-09-26 Thread Stephan Erb
@Tomas: I am currently only running a single slave in a VM. It uses the isolator and the logs are clean. @Tom: Thanks for the interesting hint! I will look into it. Best Regards, Stephan On Fr 26 Sep 2014 16:53:22 CEST, Tom Arnfeld wrote: I'm not sure if this is at all related to the issue you're

Re: Problems with OOM

2014-09-27 Thread CCAAT
Hello one and all, From my research, the most significant point of using Mesos is to use "container" in lieu of a VM configuration [1]. I'd be curious about any informative points that illuminate this issue. I guess the main point is that for Mesos to "be all it can be" we're talking about "container

Re: Problems with OOM

2014-09-27 Thread CCAAT
On 09/26/14 06:20, Stephan Erb wrote: Hi everyone, I am having issues with the cgroups isolation of Mesos. It seems like tasks are prevented from allocating more memory than their limit. However, they are never killed. I am running Aurora and Mesos 0.20.1 using the cgroups isolation on Debian

Re: Problems with OOM

2014-10-06 Thread Stephan Erb
Hello, I am still facing the same issue: * My process keeps allocating memory until all available system memory is used, but it is never killed. Its sandbox is limited to x00 MB but it ends up using several GB. * There is no OOM or cgroup-related entry in dmesg (besides the initializat
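
When a task seems to outgrow its limit without being killed, one way to see what the kernel thinks is to read the cgroup v1 memory-controller files of the task's cgroup. A hedged Python sketch: the helper names and the default path `/sys/fs/cgroup/memory/mesos` are assumptions (pass the real container cgroup directory); the `memory.memsw.*` files only exist when swap accounting is enabled. A nonzero `memory.failcnt` means usage has hit `memory.limit_in_bytes` at least once, even if no OOM kill followed.

```python
import sys
from pathlib import Path

# cgroup v1 memory-controller files; memsw.* appear only with swap accounting.
FILES = (
    "memory.limit_in_bytes",
    "memory.usage_in_bytes",
    "memory.failcnt",
    "memory.memsw.limit_in_bytes",
    "memory.memsw.usage_in_bytes",
)


def read_memory_cgroup(cgroup_dir: str) -> dict:
    """Return the integer value of each listed file present in cgroup_dir."""
    stats = {}
    for name in FILES:
        path = Path(cgroup_dir) / name
        if path.exists():
            stats[name] = int(path.read_text().strip())
    return stats


def describe(stats: dict) -> list:
    """Turn raw values into short human-readable notes."""
    notes = []
    if stats.get("memory.failcnt", 0) > 0:
        notes.append("usage has hit memory.limit_in_bytes at least once")
    if "memory.memsw.limit_in_bytes" not in stats:
        notes.append("no memsw files: swap accounting appears to be off")
    return notes


if __name__ == "__main__":
    # Hypothetical default; use the task's actual memory cgroup directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "/sys/fs/cgroup/memory/mesos"
    stats = read_memory_cgroup(target)
    for key, value in stats.items():
        print(f"{key}: {value}")
    for note in describe(stats):
        print("note:", note)
```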

Re: Problems with OOM

2014-10-07 Thread Stephan Erb
Ok, here is something odd. My kernel is booted using "cgroup_enable=memory swapaccount=1" in order to enable cgroup accounting. The log for starting a new container: I1007 11:38:25.881882 3698 slave.cpp:1222] Queuing task '1412674695525-www-data-test-ipython-1-1ecf0bba-6989-4b5c-b800-717914b5
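
Since the behaviour depends on those boot flags, a quick sanity check is to read `/proc/cmdline` on the slave host. This Python sketch (the helper names are hypothetical) only parses the command line for `cgroup_enable=memory` and `swapaccount=1`; it does not verify that the running kernel actually honours them.

```python
from pathlib import Path

# Boot parameters mentioned in the thread for memory cgroup + swap accounting.
REQUIRED = {"cgroup_enable": "memory", "swapaccount": "1"}


def parse_cmdline(cmdline: str) -> dict:
    """Map each boot parameter key to the list of values it was given."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params.setdefault(key, []).append(value)
    return params


def missing_memory_flags(cmdline: str) -> list:
    """Return the required key=value flags absent from the command line."""
    params = parse_cmdline(cmdline)
    return [f"{key}={value}" for key, value in REQUIRED.items()
            if value not in params.get(key, [])]


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        missing = missing_memory_flags(cmdline_path.read_text())
        print("missing boot flags:", missing or "none")
```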

Re: Problems with OOM

2014-10-07 Thread Stephan Erb
Seems like there is a workaround: I can emulate my desired configuration to prevent swap usage, by disabling swap on the host and starting the slave without "--cgroups_limit_swap". Then everything works as expected, i.e., a misbehaving task is killed immediately. However, I still don't know wh
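
The workaround assumes the host really has no active swap. On Linux, `/proc/swaps` lists a header line followed by one line per active swap area, so an empty body means swap is off (for example after `swapoff -a`). A minimal Python sketch with a hypothetical helper name:

```python
from pathlib import Path


def active_swap_areas(proc_swaps: str) -> list:
    """Return the swap device/file names listed in /proc/swaps-style text."""
    lines = proc_swaps.strip().splitlines()
    # First line is the header (Filename Type Size Used Priority);
    # each following non-empty line describes one active swap area.
    return [line.split()[0] for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    if swaps.exists():
        areas = active_swap_areas(swaps.read_text())
        if areas:
            print("swap is active on:", ", ".join(areas))
        else:
            print("no active swap areas")
```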

Re: Problems with OOM

2014-10-07 Thread CCAAT
On 10/07/14 06:50, Stephan Erb wrote: Seems like there is a workaround: I can emulate my desired configuration to prevent swap usage, by disabling swap on the host and starting the slave without "--cgroups_limit_swap". Then everything works as expected, i.e., a misbehaving task is killed immediat