Hi Tom,

I traced the agent "20160112-165226-67375276-5050-22401-S199" and found
that it keeps being declined by many frameworks: as soon as a framework
receives an offer for it, the framework declines it immediately. Do any of
your frameworks have special offer-filtering logic?
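To see why such an agent can stay idle, here is a minimal sketch of per-(framework, agent) decline filters. It assumes a simplified model in which each decline installs a filter that expires after a refuse timeout (for example the 5 seconds mentioned later in this thread); `OfferFilters` and `eligible_frameworks` are hypothetical names, and this is not the Mesos allocator's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OfferFilters:
    """Hypothetical model of decline filters, keyed by (framework, agent).

    Assumption: after a framework declines an agent's resources with a refuse
    timeout, the allocator skips that agent for that framework until then.
    """
    expires_at: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def decline(self, framework: str, agent: str, now: float,
                refuse_seconds: float) -> None:
        # Record when this framework may be offered the agent again.
        self.expires_at[(framework, agent)] = now + refuse_seconds

    def is_filtered(self, framework: str, agent: str, now: float) -> bool:
        until = self.expires_at.get((framework, agent))
        if until is None:
            return False
        if now >= until:
            # The filter has expired; forget it so the agent is offerable again.
            del self.expires_at[(framework, agent)]
            return False
        return True


def eligible_frameworks(frameworks: List[str], agent: str,
                        filters: OfferFilters, now: float) -> List[str]:
    """Frameworks that could still be offered `agent` at time `now`."""
    return [f for f in frameworks if not filters.is_filtered(f, agent, now)]


# If every framework declines the agent immediately, nobody is eligible until
# the filters expire, and the cycle repeats on the next round of declines.
AGENT = "20160112-165226-67375276-5050-22401-S199"
frameworks = ["fw-1", "fw-2", "fw-3"]  # hypothetical framework IDs
filters = OfferFilters()
for fw in frameworks:
    filters.decline(fw, AGENT, now=0.0, refuse_seconds=5.0)
print(eligible_frameworks(frameworks, AGENT, filters, now=1.0))  # []
print(eligible_frameworks(frameworks, AGENT, filters, now=5.0))  # ['fw-1', 'fw-2', 'fw-3']
```

In this model, a shorter refuse timeout only shortens each idle window; if every framework still declines immediately, the agent stays unused.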

I'd also like to know more about your cluster:
1) What is the role of each framework, and what is the weight of each role?
2) Do you start all agents without any reservations?
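The roles and weights matter because Mesos's hierarchical allocator sorts roles first and frameworks within each role second. A minimal sketch of the role ordering, assuming each role's dominant share (its largest fraction of any one resource type) is divided by the role's weight, defaulting to 1.0; `sort_roles` and the numbers in the example are hypothetical:

```python
from typing import Dict, List


def dominant_share(used: Dict[str, float], total: Dict[str, float]) -> float:
    """Largest fraction of any single resource type that `used` represents."""
    return max((used.get(r, 0.0) / total[r] for r in total if total[r] > 0),
               default=0.0)


def sort_roles(role_usage: Dict[str, Dict[str, float]],
               weights: Dict[str, float],
               total: Dict[str, float]) -> List[str]:
    """Order roles by weighted dominant share, lowest (offered first) first.

    Assumption for this sketch: weighted share = dominant share / weight, so a
    larger weight lets a role hold more resources before it sorts later.
    """
    def weighted_share(role: str) -> float:
        return dominant_share(role_usage[role], total) / weights.get(role, 1.0)

    # Tie-break on the role name so the order is deterministic.
    return sorted(role_usage, key=lambda role: (weighted_share(role), role))


total = {"cpus": 100.0, "mem": 1000.0}
usage = {
    "A": {"cpus": 20.0, "mem": 100.0},  # dominant share 0.2 (cpus)
    "B": {"cpus": 10.0, "mem": 300.0},  # dominant share 0.3 (mem)
    "C": {"cpus": 5.0, "mem": 50.0},    # dominant share 0.05
}
print(sort_roles(usage, {}, total))          # ['C', 'A', 'B']
print(sort_roles(usage, {"B": 4.0}, total))  # ['C', 'B', 'A']
```

With equal weights, a role whose frameworks already hold a large dominant share keeps sorting late; raising its weight moves it earlier.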



On Sun, Feb 21, 2016 at 9:23 AM, Klaus Ma <klaus1982...@gmail.com> wrote:

> Hi Tom,
> What's the allocation interval? Can you try reducing the frameworks'
> filter timeout?
> According to the log, there are ~12 frameworks on a cluster with ~42
> agents; the filter duration is 5 sec, and ~60 offers are filtered each
> second (e.g. 65 at 18:08:34). For example, framework
> 20160219-164457-67375276-5050-28802-0015 got resources from only 6 agents
> and filtered the other 36 agents at 18:08:35 (egrep "Alloca|Filtered"
> mesos-master.log | grep "20160219-164457-67375276-5050-28802-0015" | grep
> "18:08:35").
> Thanks
> Klaus
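For longer time windows, Klaus's egrep/grep pipeline can be turned into per-second counts. A rough Python equivalent, assuming the master writes glog-style lines whose second whitespace-separated field starts with HH:MM:SS (for example `I0220 18:08:34.123456 ...`); `count_per_second` is a hypothetical helper, and the keyword should match whatever your log actually contains:

```python
import re
from collections import Counter
from typing import Iterable

# Assumed glog-style prefix: "<severity><MMDD> HH:MM:SS.micros ..."
TIMESTAMP = re.compile(r"^\S+\s+(\d{2}:\d{2}:\d{2})")


def count_per_second(lines: Iterable[str], keyword: str,
                     framework_id: str = "") -> Counter:
    """Count lines containing `keyword` (and `framework_id`, if given) per second."""
    counts: Counter = Counter()
    for line in lines:
        if keyword not in line or framework_id not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example usage (log file name as in the thread):
# with open("mesos-master.log") as log:
#     per_second = count_per_second(
#         log, "Filtered", "20160219-164457-67375276-5050-28802-0015")
# print(per_second.most_common(5))
```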
> ------------------------------
> From: t...@duedil.com
> Subject: Re: Mesos sometimes not allocating the entire cluster
> Date: Sat, 20 Feb 2016 16:36:54 +0000
> To: user@mesos.apache.org
> Hi Guangya,
> Indeed, we have ~45 agents. I’ve attached the log from the master…
> Hope there’s something here that highlights the issue; we can’t find
> anything that we can’t explain.
> Cheers,
> Tom.
> On 19 Feb 2016, at 03:02, Guangya Liu <gyliu...@gmail.com> wrote:
> Hi Tom,
> After applying the patch, there is no need to restart the frameworks, only
> the Mesos master.
> One question: from your log, it seems your cluster has at least 36
> agents, right? I'm asking because if there are more frameworks than
> agents, frameworks with low weight may sometimes not be able to get
> resources.
> Can you please enable GLOG_v=2 on the Mesos master for a while and put the
> log somewhere for us to check? (Do not leave it enabled for long, as the
> log will be flooded with messages.) These log messages may help us
> diagnose your problem.
> There is also ongoing work to fix another allocator performance issue; it
> may not help you much, but you can still take a look:
> https://issues.apache.org/jira/browse/MESOS-4694
> Thanks,
> Guangya
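Guangya's point about frameworks outnumbering agents can be illustrated with a simplified allocation round. This sketch assumes each agent's free resources go to a single framework per round, chosen by lowest share divided by weight among frameworks that have not filtered that agent, and treats every agent as the same size; `allocation_round` and its parameters are hypothetical. Under these assumptions a round over N agents serves at most N frameworks, so the lowest-priority ones get nothing that round:

```python
from typing import AbstractSet, Dict, List, Tuple


def allocation_round(agents: List[str],
                     shares: Dict[str, float],
                     weights: Dict[str, float],
                     agent_share: float,
                     filtered: AbstractSet[Tuple[str, str]] = frozenset()
                     ) -> Dict[str, str]:
    """Simulate one simplified round; returns {agent: framework offered it}.

    Assumptions (not Mesos code): each agent is worth `agent_share` of the
    cluster, frameworks are ordered by share / weight (default weight 1.0),
    and an agent offered to one framework is unavailable to the rest.
    """
    shares = dict(shares)  # work on a copy; the caller's dict is untouched
    offers: Dict[str, str] = {}
    for agent in agents:
        eligible = [f for f in shares if (f, agent) not in filtered]
        if not eligible:
            continue  # every framework has filtered this agent: it stays idle
        chosen = min(eligible,
                     key=lambda f: (shares[f] / weights.get(f, 1.0), f))
        offers[agent] = chosen
        shares[chosen] += agent_share  # its share grows, so it sorts later
    return offers


frameworks = {f"fw-{i}": 0.0 for i in range(1, 6)}  # 5 frameworks
agents = ["agent-1", "agent-2", "agent-3"]           # only 3 agents
print(allocation_round(agents, frameworks, {}, agent_share=1 / 3))
# {'agent-1': 'fw-1', 'agent-2': 'fw-2', 'agent-3': 'fw-3'}
```

Here fw-4 and fw-5 receive nothing in the round; whether they catch up later depends on how shares and filters change between rounds.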
> On Fri, Feb 19, 2016 at 2:19 AM, Tom Arnfeld <t...@duedil.com> wrote:
> Hi Ben,
> We've rolled that patch out (applied over 0.23.1) on our production
> cluster and have seen little change; the master is still not sending any
> offers to those frameworks. We did this upgrade online, so would there be
> any reason the fix wouldn't have helped (other than it not being the
> cause)? Would we need to restart the frameworks (so they get new IDs) to
> see the effect?
> It's not that the master never sends them offers; it does, up to a
> certain point, for different types of frameworks (all using libmesos), but
> then no more, regardless of how much free resource is available. The free
> resources are offered to some frameworks, but not all. Is there any way
> for us to do more introspection into the state of the master/allocator to
> try and debug? Right now we're at a bit of a loss as to where to start
> diving in...
> Much appreciated as always,
> Tom.
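One place to start introspecting is to compare each framework's share of the cluster against the offers it receives. A minimal sketch that reads the master's state JSON, assuming the master serves `/master/state.json` and that framework entries carry `id`, `role` and `used_resources` while agent entries under `slaves` carry `resources`; check these field names against your own master's output. `fetch_state`, `framework_shares` and the host name are hypothetical:

```python
import json
from typing import Dict, List, Tuple
from urllib.request import urlopen

RESOURCE_KEYS = ("cpus", "mem", "disk")


def fetch_state(master: str) -> dict:
    """Fetch the master's state JSON (assumed endpoint: /master/state.json)."""
    with urlopen(f"http://{master}/master/state.json") as resp:
        return json.load(resp)


def framework_shares(state: dict) -> List[Tuple[str, str, float]]:
    """Return (framework id, role, dominant share), lowest share first.

    Assumed fields: frameworks[*].id/role/used_resources, slaves[*].resources.
    """
    total: Dict[str, float] = {k: 0.0 for k in RESOURCE_KEYS}
    for agent in state.get("slaves", []):
        for k in RESOURCE_KEYS:
            total[k] += float(agent.get("resources", {}).get(k, 0.0))
    rows = []
    for fw in state.get("frameworks", []):
        used = fw.get("used_resources", {})
        share = max((float(used.get(k, 0.0)) / total[k]
                     for k in RESOURCE_KEYS if total[k] > 0), default=0.0)
        rows.append((fw.get("id", "?"), fw.get("role", "*"), share))
    return sorted(rows, key=lambda row: row[2])


# Example usage (hypothetical host):
# for fid, role, share in framework_shares(fetch_state("master-host:5050")):
#     print(f"{fid}  role={role}  dominant_share={share:.3f}")
```

Frameworks that sort near the top of this list yet still receive no offers would point towards filters rather than DRF ordering.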
> On 18 February 2016 at 10:21, Tom Arnfeld <t...@duedil.com> wrote:
> Hi Ben,
> I've only just seen your email! Really appreciate the reply; that's
> certainly an interesting bug, and we'll try that patch and see how we get on.
> Cheers,
> Tom.
> On 29 January 2016 at 19:54, Benjamin Mahler <bmah...@apache.org> wrote:
> Hi Tom,
> I suspect you may be tripping the following issue:
> https://issues.apache.org/jira/browse/MESOS-4302
> Please have a read through this and see if it applies here. You may also
> be able to apply the fix to your cluster to see if that helps things.
> Ben
> On Wed, Jan 20, 2016 at 10:19 AM, Tom Arnfeld <t...@duedil.com> wrote:
> Hey,
> I've noticed some interesting behaviour recently when we have lots of
> different frameworks connected to our Mesos cluster at once, all using a
> variety of different shares. Some of the frameworks don't get offered more
> resources for long periods of time (hours, even), leaving the cluster
> underutilised.
> Here's an example state where we see this happen:
> Framework 1 - 13% (user A)
> Framework 2 - 22% (user B)
> Framework 3 - 4% (user C)
> Framework 4 - 0.5% (user C)
> Framework 5 - 1% (user C)
> Framework 6 - 1% (user C)
> Framework 7 - 1% (user C)
> Framework 8 - 0.8% (user C)
> Framework 9 - 11% (user D)
> Framework 10 - 7% (user C)
> Framework 11 - 1% (user C)
> Framework 12 - 1% (user C)
> Framework 13 - 6% (user E)
> In this example, there's another ~30% of the cluster that is unallocated,
> and it stays like this for a significant amount of time until something
> changes, perhaps another user joins and allocates the rest. Chunks of
> this spare resource are offered to some of the frameworks, but not all of
> them.
> I had always assumed that when lots of frameworks were involved,
> eventually the frameworks that would keep accepting resources indefinitely
> would consume the remaining resource, as every other framework had rejected
> the offers.
> Could someone elaborate a little on how the DRF allocator/sorter handles
> this situation? Is this likely to be related to the frameworks running as
> different users? Is there a way to mitigate this?
> We're running version 0.23.1.
> Cheers,
> Tom.
> --
> Guangya Liu (刘光亚)
> Senior Software Engineer
> DCOS and OpenStack Development
> IBM Platform Computing
> Systems and Technology Group

Guangya Liu (刘光亚)
Senior Software Engineer
DCOS and OpenStack Development
IBM Platform Computing
Systems and Technology Group
