Hi Gerard,

As others have mentioned, I believe you're hitting MESOS-1688. Can you
upgrade to the latest Mesos release (0.21.1) and let us know if it resolves
your problem?
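
If you want to confirm which version the master is actually running before
and after the upgrade, it's reported in the master's /master/state.json
endpoint. A quick sketch in Scala (the master host below is a placeholder
for yours):

    import scala.io.Source

    // Fetch the master's state JSON and pull out its "version" field.
    // Host and port are placeholders for your Mesos master.
    val state = Source
      .fromURL("http://mesos-master.example.com:5050/master/state.json")
      .mkString
    val versionRe = """"version"\s*:\s*"([^"]+)"""".r
    versionRe.findFirstMatchIn(state).foreach(m =>
      println("Mesos version: " + m.group(1)))

(If the field isn't present on your build, running mesos-master --version
on the master box should tell you as well.)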
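
Also, for reference, the caps from your mail would map onto the SparkConf
roughly like this. This is only a sketch, assuming coarse-grained Mesos
mode (where spark.cores.max is honored as a hard cap); the master URL and
app name are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of the resource caps described in the mail below.
    val conf = new SparkConf()
      .setMaster("mesos://mesos-master.example.com:5050") // placeholder URL
      .setAppName("FooStreaming")
      .set("spark.mesos.coarse", "true")   // assumption: coarse-grained mode
      .set("spark.cores.max", "4")         // total cores across the cluster
      .set("spark.executor.memory", "3g")  // memory per executor
    val sc = new SparkContext(conf)

Given that your table below shows single nodes holding 7 CPUs for one
framework, a misconfigured cap alone wouldn't explain it, which is another
reason MESOS-1688 looks likely.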

Thanks,

Tim

On Tue, Jan 27, 2015 at 10:39 AM, Sam Bessalah <samkiller....@gmail.com>
wrote:

> Hi Gerard,
> isn't this the same issue as this?
> https://issues.apache.org/jira/browse/MESOS-1688
>
> On Mon, Jan 26, 2015 at 9:17 PM, Gerard Maas <gerard.m...@gmail.com>
> wrote:
>
>> Hi,
>>
>> We are observing with certain regularity that our Spark  jobs, as Mesos
>> framework, are hoarding resources and not releasing them, resulting in
>> resource starvation to all jobs running on the Mesos cluster.
>>
>> For example:
>> This is a job that has spark.cores.max = 4 and spark.executor.memory="3g"
>>
>> ID                   Framework     Host                  CPUs  Mem
>> …5050-16506-1146497  FooStreaming  dnode-4.hdfs.private  7     13.4 GB
>> …5050-16506-1146495  FooStreaming  dnode-0.hdfs.private  1     6.4 GB
>> …5050-16506-1146491  FooStreaming  dnode-5.hdfs.private  7     11.9 GB
>> …5050-16506-1146449  FooStreaming  dnode-3.hdfs.private  7     4.9 GB
>> …5050-16506-1146247  FooStreaming  dnode-1.hdfs.private  0.5   5.9 GB
>> …5050-16506-1146226  FooStreaming  dnode-2.hdfs.private  3     7.9 GB
>> …5050-16506-1144069  FooStreaming  dnode-3.hdfs.private  1     8.7 GB
>> …5050-16506-1133091  FooStreaming  dnode-5.hdfs.private  1     1.7 GB
>> …5050-16506-1133090  FooStreaming  dnode-2.hdfs.private  5     5.2 GB
>> …5050-16506-1133089  FooStreaming  dnode-1.hdfs.private  6.5   6.3 GB
>> …5050-16506-1133088  FooStreaming  dnode-4.hdfs.private  1     251 MB
>> …5050-16506-1133087  FooStreaming  dnode-0.hdfs.private  6.4   6.8 GB
>> The only way to release the resources is to manually find the process on
>> the cluster and kill it. The jobs showing this behavior are often
>> streaming jobs, but batch jobs show it too; we run more streaming than
>> batch jobs, so the stats are biased.
>> Any ideas about what's going on here? Hopefully it's some bad, ugly bug
>> that has already been fixed and that will urge us to upgrade our infra?
>>
>> Mesos 0.20 + Marathon 0.7.4 + Spark 1.1.0
>>
>> -kr, Gerard.
>>
>
>
