On the master node, I see this printed over and over in the mesos-master.WARNING log file:

W0615 06:06:51.211262 8672 hierarchical_allocator_process.hpp:589] Using the default value of 'refuse_seconds' to create the refused resources filter because the input value is negative
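As far as I understand it, that warning is the allocator falling back to its default because the offer reply it got from some framework carried a negative refuse_seconds in its Filters. Just to illustrate the mechanism (this is not Spark's or Chronos' actual code, and the 10-second value is made up), a decline with an explicit non-negative refuse_seconds in the Mesos Java API would look roughly like:

    import org.apache.mesos.Protos;
    import org.apache.mesos.SchedulerDriver;

    // Inside Scheduler.resourceOffers(driver, offers): decline offers we can't
    // use, asking the master not to re-offer these resources for the next 10
    // seconds. A negative refuse_seconds here is what makes the allocator log
    // the WARNING above and fall back to its default.
    Protos.Filters filters = Protos.Filters.newBuilder()
            .setRefuseSeconds(10)
            .build();
    for (Protos.Offer offer : offers) {
        driver.declineOffer(offer.getId(), filters);
    }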
Here's what I see in the master INFO file:

I0616 12:10:55.040024 8674 http.cpp:478] HTTP request for '/master/state.json'
I0616 12:10:55.425833 8669 master.cpp:3843] Sending 1 offers to framework 20150511-140547-189138442-5051-8667-0831 (Savings) at scheduler-5a5e99d4-5e16-4a48-94d5-86f751615a04@10.6.71.203:47979
I0616 12:10:55.438303 8669 master.cpp:3843] Sending 1 offers to framework 20150304-134212-222692874-5051-2300-0054 (chronos-2.3.2_mesos-0.20.1-SNAPSHOT) at scheduler-c8f2acc2-d16e-44d5-b54f-7f88d3ab39a2@10.6.70.11:57549
I0616 12:10:55.441295 8669 master.cpp:3843] Sending 1 offers to framework 20150511-140547-189138442-5051-8667-0838 (Savings) at scheduler-8b4389df-109e-49f5-8064-dd263fbec9fe@10.6.71.202:53346
I0616 12:10:55.442204 8669 master.cpp:2344] Processing reply for offers: [ 20150511-140547-189138442-5051-8667-O9282037 ] on slave 20150511-140547-189138442-5051-8667-S4 at slave(1)@10.6.71.203:5151 (secasdb01-2) for framework 20150511-140547-189138442-5051-8667-0831 (Savings) at scheduler-5a5e99d4-5e16-4a48-94d5-86f751615a04@10.6.71.203:47979
I0616 12:10:55.443111 8669 master.cpp:2344] Processing reply for offers: [ 20150511-140547-189138442-5051-8667-O9282038 ] on slave 20150304-134111-205915658-5051-1595-S0 at slave(1)@10.6.71.206:5151 (secasdb01-3) for framework 20150304-134212-222692874-5051-2300-0054 (chronos-2.3.2_mesos-0.20.1-SNAPSHOT) at scheduler-c8f2acc2-d16e-44d5-b54f-7f88d3ab39a2@10.6.70.11:57549
I0616 12:10:55.444875 8671 hierarchical_allocator_process.hpp:563] Recovered mem(*):5305; disk(*):4744; ports(*):[25001-30000] (total allocatable: mem(*):5305; disk(*):4744; ports(*):[25001-30000]) on slave 20150511-140547-189138442-5051-8667-S4 from framework 20150511-140547-189138442-5051-8667-0831
I0616 12:10:55.445121 8669 master.cpp:2344] Processing reply for offers: [ 20150511-140547-189138442-5051-8667-O9282039 ] on slave 20150511-140547-189138442-5051-8667-S2 at slave(1)@10.6.71.202:5151 (secasdb01-1) for framework 20150511-140547-189138442-5051-8667-0838 (Savings) at scheduler-8b4389df-109e-49f5-8064-dd263fbec9fe@10.6.71.202:53346
I0616 12:10:55.445971 8670 hierarchical_allocator_process.hpp:563] Recovered mem(*):6329; disk(*):5000; ports(*):[25001-30000] (total allocatable: mem(*):6329; disk(*):5000; ports(*):[25001-30000]) on slave 20150304-134111-205915658-5051-1595-S0 from framework 20150304-134212-222692874-5051-2300-0054
I0616 12:10:55.446185 8674 hierarchical_allocator_process.hpp:563] Recovered mem(*):4672; disk(*):4488; ports(*):[25001-25667, 25669-30000] (total allocatable: mem(*):4672; disk(*):4488; ports(*):[25001-25667, 25669-30000]) on slave 20150511-140547-189138442-5051-8667-S2 from framework 20150511-140547-189138442-5051-8667-0838

There are two Savings jobs and one Weather job, and they're all hung right now (all started from Chronos).
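For reference, here's a throwaway sketch that pulls the same /master/state.json the UI (and that HTTP log line) hits, so you can eyeball each framework's "name" and "resources" fields and see who is actually holding the CPU. The host and port are taken from the frameworks-tab links below and are an assumption about this setup; exact field names can vary a bit between Mesos versions.

    import java.io.InputStream;
    import java.net.URL;
    import java.util.Scanner;

    public class MasterState {
        public static void main(String[] args) throws Exception {
            // Same endpoint the Mesos web UI polls; adjust host/port for your master.
            URL url = new URL("http://intmesosmaster01:5051/master/state.json");
            try (InputStream in = url.openStream();
                 Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
                // Each entry under "frameworks" reports its name and the resources
                // currently charged to it, which should show whether Chronos is
                // sitting on offers it never releases.
                System.out.println(s.hasNext() ? s.next() : "");
            }
        }
    }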
Here's what the frameworks tab looks like in Mesos:

ID | Host | User | Name | Active Tasks | CPUs | Mem | Max Share | Registered | Re-Registered
…5051-8667-0840 <http://intmesosmaster01:5051/#/frameworks/20150511-140547-189138442-5051-8667-0840> | secasdb01-1 | mesos | Weather | 0 | 0 | 0 B | 0% | 4 hours ago | -
…5051-8667-0838 <http://intmesosmaster01:5051/#/frameworks/20150511-140547-189138442-5051-8667-0838> | secasdb01-1 | mesos | Savings | 0 | 0 | 0 B | 0% | 4 hours ago | -
…5051-8667-0831 <http://intmesosmaster01:5051/#/frameworks/20150511-140547-189138442-5051-8667-0831> | secasdb01-2 | mesos | Savings | 0 | 0 | 0 B | 0% | 7 hours ago | -
…5051-8667-0804 <http://intmesosmaster01:5051/#/frameworks/20150511-140547-189138442-5051-8667-0804> | secasdb01-1 | mesos | AlertConsumer | 1 | 3 | 1.0 GB | 50% | 20 hours ago | -
…5051-2300-0090 <http://intmesosmaster01:5051/#/frameworks/20150304-134212-222692874-5051-2300-0090> | intMesosMaster02 | mesos | marathon | 1 | 0.5 | 128 MB | 8.333% | a month ago | a month ago
…5051-2300-0054 <http://intmesosmaster01:5051/#/frameworks/20150304-134212-222692874-5051-2300-0054> | intMesosMaster01 | root | chronos-2.3.2_mesos-0.20.1-SNAPSHOT | 3 | 2.5 | 3.0 GB | 41.667% | a month ago | a month ago

It seems that the Chronos framework has reserved all the remaining CPU in the cluster but not given it to the jobs that need it (Savings and Weather). AlertConsumer is a Marathon job that's always running and is working fine.

On 16 June 2015 at 04:32, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Did you look inside all logs? Mesos logs and executor logs?
>
> Thanks
> Best Regards
>
> On Mon, Jun 15, 2015 at 7:09 PM, Gary Ogden <gog...@gmail.com> wrote:
>
>> My Mesos cluster has 1.5 CPU and 17GB free. If I set:
>>
>> conf.set("spark.mesos.coarse", "true");
>> conf.set("spark.cores.max", "1");
>>
>> in the SparkConf object, the job will run in the mesos cluster fine.
>>
>> But if I comment out those settings above so that it defaults to fine
>> grained, the task never finishes. It just shows as 0 for everything in the
>> mesos frameworks (# of tasks, cpu, memory are all 0). There's nothing in
>> the log files anywhere as to what's going on.
>>
>> Thanks
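For completeness, the fine-grained run I described in the quoted message is just the same SparkConf without those two settings, roughly like below. The master URL and app name here are placeholders, not the real values from my jobs.

    import org.apache.spark.SparkConf;

    // Without spark.mesos.coarse / spark.cores.max, Spark falls back to Mesos
    // fine-grained mode and launches each Spark task as its own Mesos task out
    // of the offers the framework accepts.
    SparkConf conf = new SparkConf()
            .setAppName("Savings")                        // placeholder app name
            .setMaster("mesos://intmesosmaster01:5051");  // placeholder master URL
    // The coarse-grained variant (the one that works for me) just adds:
    // conf.set("spark.mesos.coarse", "true");
    // conf.set("spark.cores.max", "1");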