Hi Gerard,

As others have mentioned, I believe you're hitting MESOS-1688. Can you upgrade to the latest Mesos release (0.21.1) and let us know if it resolves your problem?
Thanks,
Tim

On Tue, Jan 27, 2015 at 10:39 AM, Sam Bessalah <samkiller....@gmail.com> wrote:
> Hi Gerard,
> Isn't this the same issue as this one?
> https://issues.apache.org/jira/browse/MESOS-1688
>
> On Mon, Jan 26, 2015 at 9:17 PM, Gerard Maas <gerard.m...@gmail.com> wrote:
>
>> Hi,
>>
>> We are observing with some regularity that our Spark jobs, running as
>> Mesos frameworks, hoard resources and never release them, resulting in
>> resource starvation for all jobs running on the Mesos cluster.
>>
>> For example, this is a job with spark.cores.max = 4 and
>> spark.executor.memory = "3g":
>>
>> ID                   Framework     Host                  CPUs  Mem
>> …5050-16506-1146497  FooStreaming  dnode-4.hdfs.private  7     13.4 GB
>> …5050-16506-1146495  FooStreaming  dnode-0.hdfs.private  1     6.4 GB
>> …5050-16506-1146491  FooStreaming  dnode-5.hdfs.private  7     11.9 GB
>> …5050-16506-1146449  FooStreaming  dnode-3.hdfs.private  7     4.9 GB
>> …5050-16506-1146247  FooStreaming  dnode-1.hdfs.private  0.5   5.9 GB
>> …5050-16506-1146226  FooStreaming  dnode-2.hdfs.private  3     7.9 GB
>> …5050-16506-1144069  FooStreaming  dnode-3.hdfs.private  1     8.7 GB
>> …5050-16506-1133091  FooStreaming  dnode-5.hdfs.private  1     1.7 GB
>> …5050-16506-1133090  FooStreaming  dnode-2.hdfs.private  5     5.2 GB
>> …5050-16506-1133089  FooStreaming  dnode-1.hdfs.private  6.5   6.3 GB
>> …5050-16506-1133088  FooStreaming  dnode-4.hdfs.private  1     251 MB
>> …5050-16506-1133087  FooStreaming  dnode-0.hdfs.private  6.4   6.8 GB
>>
>> The only way to release the resources is to manually find the process in
>> the cluster and kill it. The jobs are most often streaming jobs, but batch
>> jobs show this behavior as well. We run more streaming jobs than batch, so
>> the stats are biased.
>>
>> Any ideas what's going on here? Hopefully some very ugly bug that has
>> already been fixed and that will urge us to upgrade our infra?
>>
>> Mesos 0.20 + Marathon 0.7.4 + Spark 1.1.0
>>
>> -kr, Gerard.
>
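For anyone trying to reproduce this, here is a minimal sketch of a Spark 1.x driver setting the two limits Gerard mentions (spark.cores.max and spark.executor.memory). Only those two settings come from the thread; the app name and Mesos master URL below are placeholders, not values from Gerard's cluster:

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder app name and master URL; only the two resource
    // settings are taken from the thread.
    val conf = new SparkConf()
      .setAppName("FooStreaming")
      .setMaster("mesos://zk://zk-host:2181/mesos") // placeholder
      .set("spark.cores.max", "4")        // cap on total cores for the app
      .set("spark.executor.memory", "3g") // memory per executor

    val sc = new SparkContext(conf)

With spark.cores.max = 4, the table above showing a single framework holding 7 CPUs on several hosts at once is clearly inconsistent with the configured cap, which is what points at the un-released executors described in MESOS-1688.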