Just to make sure, all slaves are running with: --isolation='cgroups/cpu,cgroups/mem'
Is there something suspicious in the Mesos slave logs?

On 26 September 2014 13:20, Stephan Erb <stephan....@blue-yonder.com> wrote:

> Hi everyone,
>
> I am having issues with the cgroups isolation of Mesos. It seems like
> tasks are prevented from allocating more memory than their limit. However,
> they are never killed.
>
>    - My scheduled task allocates memory in a tight loop. According to
>    'ps', once its memory requirements are exceeded it is not killed, but ends
>    up in the state D ("uninterruptible sleep (usually IO)").
>    - The task is still considered running by Mesos.
>    - There is no indication of an OOM in dmesg.
>    - There is neither an OOM notice nor any other output related to the
>    task in the slave log.
>    - According to htop, the system load is increased, with a significant
>    portion of CPU time spent within the kernel. Commonly the load is so high
>    that all ZooKeeper connections time out.
>
> I am running Aurora and Mesos 0.20.1 using the cgroups isolation on Debian
> 7 (kernel 3.2.60-1+deb7u3).
>
> Sorry for the somewhat unspecific error description. Still, does anyone
> have an idea what might be wrong here?
>
> Thanks and Best Regards,
> Stephan
>
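
One more thing that might be worth ruling out for the symptoms described above (task stuck in state D, no OOM in dmesg): the state of the task's memory cgroup. Below is a minimal sketch, assuming a cgroup v1 memory hierarchy, that reads the `memory.oom_control` file and reports whether the OOM killer is disabled and whether the cgroup is currently at its limit. The helper names (`parse_oom_control`, `describe`) and the example path are hypothetical; the actual location depends on how the cgroup hierarchy is mounted on the slave. This is only a diagnostic aid, not a statement about what Mesos does internally.

```python
import sys
from pathlib import Path


def parse_oom_control(text):
    """Parse 'key value' lines as found in a cgroup v1 memory.oom_control file."""
    state = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <integer>" lines; ignore anything else.
        if len(parts) == 2 and parts[1].isdigit():
            state[parts[0]] = int(parts[1])
    return state


def describe(state):
    """Turn the parsed flags into a short human-readable summary."""
    disabled = state.get("oom_kill_disable") == 1
    under_oom = state.get("under_oom") == 1
    if disabled and under_oom:
        return "OOM killer disabled and cgroup at its limit: tasks may block instead of being killed"
    if disabled:
        return "OOM killer disabled for this cgroup"
    if under_oom:
        return "cgroup is under OOM with the OOM killer enabled"
    return "no OOM condition reported"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical example argument (adjust to the actual mount point and container):
    #   /sys/fs/cgroup/memory/mesos/<container-id>/memory.oom_control
    print(describe(parse_oom_control(Path(sys.argv[1]).read_text())))
```

If the file shows `oom_kill_disable 1` together with `under_oom 1`, tasks waiting for memory rather than being killed would be consistent with what 'ps' reports; if both are 0, the cause is more likely elsewhere (for example in the kernel version in use).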