@Tom, one more question: how long do your tasks run? If the task run time is
very short, e.g. 100ms, the resources are returned to the allocator when the
task finishes and will not be offered again until the next allocation cycle.
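
For very short tasks it can also help to lower the master's allocation
interval so that freed resources are re-offered sooner. A minimal sketch (the
flag is the standard master flag; the 500ms value is only an illustration):

  mesos-master --allocation_interval=500ms ...   # keep your other flags unchanged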

----
Da (Klaus), Ma (马达) | PMP® | Advisory Software Engineer
Platform OpenSource Technology, STG, IBM GCG
+86-10-8245 4084 | klaus1982...@gmail.com | http://k82.me

On Tue, Feb 23, 2016 at 10:25 AM, Guangya Liu <gyliu...@gmail.com> wrote:

> Hi Tom,
>
> I saw that the two frameworks with roles are consuming most of the
> resources, so I think you could run some more tests after removing those two
> frameworks.
>
> Another thing I want to mention is that the DRF allocator may have some
> issues when there are many frameworks; the community is trying to improve
> this with several projects, such as 'Optimistic Offers' (MESOS-1607) and
> 'Quota Enhancement' (MESOS-1791).
>
> Known allocator issues include the following:
> https://issues.apache.org/jira/browse/MESOS-4302
> https://issues.apache.org/jira/browse/MESOS-3202 <- you may want to look at
> this one in detail.
> https://issues.apache.org/jira/browse/MESOS-3078
>
> Hope this helps.
>
> Thanks,
>
> Guangya
>
>
> On Tue, Feb 23, 2016 at 1:53 AM, Tom Arnfeld <t...@duedil.com> wrote:
>
>> Hi Guangya,
>>
>> Most of the agents do not have a role, so they use the default wildcard
>> role for resources. Also none of the frameworks have a role, therefore they
>> fall into the wildcard role too.
>>
>> Frameworks are being offered resources *up to a certain level of
>> fairness* but no further. The issue appears to be inside the allocator,
>> relating to how it decides how many resources each framework should get
>> within the wildcard ('*') role in relation to fairness.
>>
>> We seem to have circumvented the problem in the allocator by creating two
>> *completely new* roles and putting *one framework in each*. No agents have
>> these roles assigned to any resources, but by doing this we seem to have got
>> around the bug in the allocator that’s causing strange fairness allocations,
>> which results in no offers being sent.
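>>
>> In case it helps with reproducing the workaround, a rough sketch (the role
>> names here are hypothetical; on 0.23.x roles are static, so the master has
>> to be restarted with the new roles, and each framework then registers with
>> its FrameworkInfo.role set accordingly):
>>
>>   mesos-master --roles="framework_a_role,framework_b_role" ...   # plus your existing flags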
>>
>> I’m going to look into defining a reproducible test case for this
>> scheduling situation to coax the allocator into behaving this way in a test
>> environment.
>>
>> Tom.
>>
>> On 22 Feb 2016, at 15:39, Guangya Liu <gyliu...@gmail.com> wrote:
>>
>> If none of the frameworks has a role, then no framework can consume
>> reserved resources, so I think that at least frameworks
>> 20160219-164457-67375276-5050-28802-0014 and
>> 20160219-164457-67375276-5050-28802-0015 should have a role.
>>
>> Can you please share some detail on the following (the commands sketched
>> below may help):
>> 1) The master start command, or the master HTTP endpoint output for flags.
>> 2) All slave start commands, or each slave's HTTP endpoint output for flags.
>> 3) The master HTTP endpoint output for state.
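>>
>> A minimal way to collect these (endpoint paths per the standard Mesos HTTP
>> API; the hostnames below are placeholders):
>>
>>   # Master flags and full state (flags are part of state.json on 0.23.x)
>>   curl -s "http://master-ip:5050/master/state.json" | jq '.flags' > master-flags.json
>>   curl -s "http://master-ip:5050/master/state.json" | jq . > master-state.json
>>
>>   # Per-agent flags (repeat for each slave)
>>   curl -s "http://slave-ip:5051/state.json" | jq '.flags' > slave-flags.json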
>>
>> Thanks,
>>
>> Guangya
>>
>> On Mon, Feb 22, 2016 at 10:57 PM, Tom Arnfeld <t...@duedil.com> wrote:
>>
>>> Ah yes, sorry, my mistake: there are a couple of agents with a *dev* role,
>>> and only one or two frameworks connect to the cluster with that role, but
>>> not very often. Whether they’re connected or not doesn’t seem to cause any
>>> change in allocation behaviour.
>>>
>>> No other agents have roles.
>>>
>>> 974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating
>>> ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave
>>> 20160112-174949-84152492-5050-19807-S316 to framework
>>> 20160219-164457-67375276-5050-28802-0014
>>>
>>> This agent should have another 9.5 CPUs reserved for some role, and no
>>> framework is configured to use resources from this role, so the resources
>>> in this role are being wasted.  I think the following agents may also have
>>> some reserved resources configured:
>>> 20160112-174949-84152492-5050-19807-S317,
>>> 20160112-174949-84152492-5050-19807-S322, and possibly more.
>>>
>>>
>>> I don’t think that’s correct; this is likely to be an offer for a slave
>>> where 9 CPUs are currently allocated to an executor.
>>>
>>> I can verify via the agent configuration and HTTP endpoints that most of
>>> the agents do not have a role, and none of the frameworks do.
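>>>
>>> (For anyone following along, one way to check an agent's role setup is its
>>> state endpoint; the hostname is a placeholder and field names may differ
>>> slightly between versions. Role-reserved resources show up role-qualified,
>>> e.g. cpus(dev) instead of cpus(*).)
>>>
>>>   curl -s "http://slave-ip:5051/state.json" | jq '.flags.resources, .flags.default_role'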
>>>
>>> Tom.
>>>
>>> On 22 Feb 2016, at 14:09, Guangya Liu <gyliu...@gmail.com> wrote:
>>>
>>> Hi Tom,
>>>
>>> I think that your cluster must have some role/weight configuration,
>>> because I can see that at least two agents have the "dev" role configured.
>>>
>>> 56 1363 I0219 18:08:26.284010 28810 hierarchical.hpp:1025] Filtered
>>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
>>> slave 20160112-165226-67375276-5050-22401-S300 for framework
>>> 20160219-164457-67375276-5050-28802-0015
>>> 57 1364 I0219 18:08:26.284162 28810 hierarchical.hpp:941] Allocating
>>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
>>> slave 20160112-165226-67375276-5050-22401-S300 to framework
>>> 20160219-164457-67375276-5050-28802-0014
>>> 58 1365 I0219 18:08:26.286725 28810 hierarchical.hpp:1025] Filtered
>>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
>>> slave 20160112-165226-67375276-5050-22401-S303 for framework
>>> 20160219-164457-67375276-5050-28802-0015
>>> 59 1366 I0219 18:08:26.286875 28810 hierarchical.hpp:941] Allocating
>>> ports(dev):[3000-5000]; cpus(dev):10; mem(dev):63488; disk(dev):153600 on
>>> slave 20160112-165226-67375276-5050-22401-S303 to framework
>>> 20160219-164457-67375276-5050-28802-0014
>>>
>>> Also, I think that frameworks 20160219-164457-67375276-5050-28802-0014
>>> and 20160219-164457-67375276-5050-28802-0015 may have a high weight,
>>> because I saw that framework 20160219-164457-67375276-5050-28802-0014 got
>>> offers from 26 agents at 18:08:26.
>>>
>>> Another point is that some other agents may also have a role configured
>>> while no framework registers with that role, which means those agents'
>>> statically reserved resources can never be allocated.
>>>
>>> I searched for 20160112-174949-84152492-5050-19807-S316 in the log and
>>> found that it was allocating the following resources to a framework:
>>>
>>> 974 2420 I0219 18:08:37.504587 28808 hierarchical.hpp:941] Allocating
>>> ports(*):[3000-5000]; cpus(*):0.5; mem(*):16384; disk(*):51200 on slave
>>> 20160112-174949-84152492-5050-19807-S316 to framework
>>> 20160219-164457-67375276-5050-28802-0014
>>>
>>> This agent should have another 9.5 CPUs reserved for some role, and no
>>> framework is configured to use resources from this role, so the resources
>>> in this role are being wasted.  I think the following agents may also have
>>> some reserved resources configured:
>>> 20160112-174949-84152492-5050-19807-S317,
>>> 20160112-174949-84152492-5050-19807-S322, and possibly more.
>>>
>>> So I would suggest that you check the master and each slave's start
>>> command to see how roles are configured. You can also check this via the
>>> command: curl "http://master-ip:5050/master/state.json" 2>/dev/null | jq .
>>> (note the trailing dot at the end of the command) to get every slave's
>>> resource status: reserved, used, and total resources, etc.
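>>>
>>> For example, something like the following pulls out just the per-agent
>>> resource view (field names as they appear in 0.23-era state.json; they may
>>> differ slightly in other versions):
>>>
>>>   curl -s "http://master-ip:5050/master/state.json" | \
>>>     jq '.slaves[] | {id, hostname, total: .resources, used: .used_resources}'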
>>>
>>> Thanks,
>>>
>>> Guangya
>>>
>>>
>>> On Mon, Feb 22, 2016 at 5:16 PM, Tom Arnfeld <t...@duedil.com> wrote:
>>>
>>>> No roles, no reservations.
>>>>
>>>> We're using the default filter options with all frameworks and default
>>>> allocation interval.
>>>>
>>>> On 21 Feb 2016, at 08:10, Guangya Liu <gyliu...@gmail.com> wrote:
>>>>
>>>> Hi Tom,
>>>>
>>>> I traced agent "20160112-165226-67375276-5050-22401-S199" and found that
>>>> it keeps being declined by many frameworks: as soon as a framework gets
>>>> it, the framework declines it immediately. Do any of your frameworks have
>>>> special offer-filtering logic?
>>>>
>>>> Also, I would like to know more about your cluster (the checks sketched
>>>> below may help):
>>>> 1) What is the role of each framework, and what is the weight of each
>>>> role?
>>>> 2) Do you start all agents without any reservations?
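>>>>
>>>> One way to answer both from the HTTP endpoints (hostnames are
>>>> placeholders; the roles/weights flags only appear if they were set, and
>>>> static reservations show up as role-qualified resources such as
>>>> cpus(dev)):
>>>>
>>>>   curl -s "http://master-ip:5050/master/state.json" | jq '.flags.roles, .flags.weights'
>>>>   curl -s "http://slave-ip:5051/state.json" | jq '.flags.resources'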
>>>>
>>>> Thanks,
>>>>
>>>> Guangya
>>>>
>>>> On Sun, Feb 21, 2016 at 9:23 AM, Klaus Ma <klaus1982...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Tom,
>>>>>
>>>>> What's the allocation interval? Can you try reducing the frameworks'
>>>>> filter timeout (the refuse_seconds in the Filters a framework passes
>>>>> when declining offers)?
>>>>>
>>>>> According to the log there are ~12 frameworks on a cluster with ~42
>>>>> agents; the filter duration is 5 sec, and there are ~60 filter events per
>>>>> second (e.g. 65 at 18:08:34). For example, framework
>>>>> 20160219-164457-67375276-5050-28802-0015 only got resources from 6 agents
>>>>> and filtered the other 36 agents at 18:08:35 (egrep "Alloca|Filtered"
>>>>> mesos-master.log | grep "20160219-164457-67375276-5050-28802-0015" |
>>>>> grep "18:08:35").
>>>>>
>>>>> Thanks
>>>>> Klaus
>>>>>
>>>>> ------------------------------
>>>>> From: t...@duedil.com
>>>>> Subject: Re: Mesos sometimes not allocating the entire cluster
>>>>> Date: Sat, 20 Feb 2016 16:36:54 +0000
>>>>> To: user@mesos.apache.org
>>>>>
>>>>> Hi Guangya,
>>>>>
>>>>> Indeed we have about ~45 agents. I’ve attached the log from the master…
>>>>>
>>>>>
>>>>>
>>>>> Hope there’s something here that highlights the issue; we can’t find
>>>>> anything that we can’t explain.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Tom.
>>>>>
>>>>> On 19 Feb 2016, at 03:02, Guangya Liu <gyliu...@gmail.com> wrote:
>>>>>
>>>>> Hi Tom,
>>>>>
>>>>> After the patch is applied, there is no need to restart the frameworks,
>>>>> only the Mesos master.
>>>>>
>>>>> One question: from your log it seems your cluster has at least 36
>>>>> agents, right? I am asking because if there are more frameworks than
>>>>> agents, frameworks with a low weight may sometimes not be able to get
>>>>> resources.
>>>>>
>>>>> Can you please enable GLOG_v=2 on the Mesos master for a while and put
>>>>> the log somewhere for us to check? (Do not enable it for long, as you
>>>>> will be flooded with log messages.) That kind of log may shed some light
>>>>> on your problem.
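>>>>>
>>>>> A sketch of how to do that (GLOG_v is the standard glog environment
>>>>> variable; the /logging/toggle endpoint is a libprocess facility, so
>>>>> please double-check it is available on your version before relying on
>>>>> it):
>>>>>
>>>>>   # Option 1: restart the master with verbose logging
>>>>>   GLOG_v=2 mesos-master ...   # existing flags unchanged
>>>>>
>>>>>   # Option 2: raise verbosity at runtime for a bounded period
>>>>>   curl -s -X POST "http://master-ip:5050/logging/toggle?level=2&duration=10mins"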
>>>>>
>>>>> Separately, there is another ticket that tracks an allocator performance
>>>>> issue; it may not help you much, but you can still take a look:
>>>>> https://issues.apache.org/jira/browse/MESOS-4694
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Guangya
>>>>>
>>>>> On Fri, Feb 19, 2016 at 2:19 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>>>>
>>>>> Hi Ben,
>>>>>
>>>>> We've rolled that patch out (applied over 0.23.1) on our production
>>>>> cluster and have seen little change; the master is still not sending any
>>>>> offers to those frameworks. We did this upgrade online, so would there be
>>>>> any reason the fix wouldn't have helped (other than it not being the
>>>>> cause)? Would we need to restart the frameworks (so they get new IDs) to
>>>>> see the effect?
>>>>>
>>>>> It's not that the master never sends them offers; it sends them up to a
>>>>> certain point for different types of frameworks (all using libmesos), but
>>>>> then no more, regardless of how much free resource is available... the
>>>>> free resources are offered to some frameworks, but not all. Is there any
>>>>> way for us to do more introspection into the state of the master /
>>>>> allocator to try and debug? Right now we're at a bit of a loss as to
>>>>> where to start diving in...
>>>>>
>>>>> Much appreciated as always,
>>>>>
>>>>> Tom.
>>>>>
>>>>> On 18 February 2016 at 10:21, Tom Arnfeld <t...@duedil.com> wrote:
>>>>>
>>>>> Hi Ben,
>>>>>
>>>>> I've only just seen your email! Really appreciate the reply, that's
>>>>> certainly an interesting bug and we'll try that patch and see how we get 
>>>>> on.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Tom.
>>>>>
>>>>> On 29 January 2016 at 19:54, Benjamin Mahler <bmah...@apache.org>
>>>>> wrote:
>>>>>
>>>>> Hi Tom,
>>>>>
>>>>> I suspect you may be tripping the following issue:
>>>>> https://issues.apache.org/jira/browse/MESOS-4302
>>>>>
>>>>> Please have a read through this and see if it applies here. You may
>>>>> also be able to apply the fix to your cluster to see if that helps things.
>>>>>
>>>>> Ben
>>>>>
>>>>> On Wed, Jan 20, 2016 at 10:19 AM, Tom Arnfeld <t...@duedil.com> wrote:
>>>>>
>>>>> Hey,
>>>>>
>>>>> I've noticed some interesting behaviour recently when we have lots of
>>>>> different frameworks connected to our Mesos cluster at once, all using a
>>>>> variety of different shares. Some of the frameworks don't get offered
>>>>> more resources (for long periods of time, hours even), leaving the
>>>>> cluster under-utilised.
>>>>>
>>>>> Here's an example state where we see this happen..
>>>>>
>>>>> Framework 1 - 13% (user A)
>>>>> Framework 2 - 22% (user B)
>>>>> Framework 3 - 4% (user C)
>>>>> Framework 4 - 0.5% (user C)
>>>>> Framework 5 - 1% (user C)
>>>>> Framework 6 - 1% (user C)
>>>>> Framework 7 - 1% (user C)
>>>>> Framework 8 - 0.8% (user C)
>>>>> Framework 9 - 11% (user D)
>>>>> Framework 10 - 7% (user C)
>>>>> Framework 11 - 1% (user C)
>>>>> Framework 12 - 1% (user C)
>>>>> Framework 13 - 6% (user E)
>>>>>
>>>>> In this example, there's another ~30% of the cluster that is
>>>>> unallocated, and it stays like this for a significant amount of time
>>>>> until something changes, perhaps another user joining and allocating the
>>>>> rest... chunks of this spare resource are offered to some of the
>>>>> frameworks, but not all of them.
>>>>>
>>>>> I had always assumed that when lots of frameworks were involved, the
>>>>> frameworks that keep accepting resources indefinitely would eventually
>>>>> consume the remaining resources, as every other framework had rejected
>>>>> the offers.
>>>>>
>>>>> Could someone elaborate a little on how the DRF allocator / sorter
>>>>> handles this situation, is this likely to be related to the different 
>>>>> users
>>>>> being used? Is there a way to mitigate this?
>>>>>
>>>>> We're running version 0.23.1.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Tom.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Guangya Liu (刘光亚)
>>>>> Senior Software Engineer
>>>>> DCOS and OpenStack Development
>>>>> IBM Platform Computing
>>>>> Systems and Technology Group
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Guangya Liu (刘光亚)
>>>> Senior Software Engineer
>>>> DCOS and OpenStack Development
>>>> IBM Platform Computing
>>>> Systems and Technology Group
>>>>
>>>>
>>>
>>>
>>> --
>>> Guangya Liu (刘光亚)
>>> Senior Software Engineer
>>> DCOS and OpenStack Development
>>> IBM Platform Computing
>>> Systems and Technology Group
>>>
>>>
>>>
>>
>>
>> --
>> Guangya Liu (刘光亚)
>> Senior Software Engineer
>> DCOS and OpenStack Development
>> IBM Platform Computing
>> Systems and Technology Group
>>
>>
>>
>
