Re: Rack awareness support for Mesos

Joris Van Remoortere Thu, 16 Jun 2016 17:01:11 -0700

@Fan,

In the community meeting a question was raised around which frameworks
might be ready to use this.
Can you provide some more context for immediate use cases on the framework
side?


—
*Joris Van Remoortere*
Mesosphere

On Wed, Jun 15, 2016 at 5:04 PM, james <gar...@verizon.net> wrote:

> @Joris,
>
>
> OK. Now I understand where you are coming from. As soon as I get some
> time, I'll join that design discussion. Thanks for the clarifications.
>
> James
>
>
>
>
>
> On 06/15/2016 02:45 AM, Joris Van Remoortere wrote:
>
>>         Since your interest is in the determination of the values, as
>>         opposed to their propagation, I would just urge that you keep in
>>         mind that we may (as a project) not want to support this
>>         information as the current string attributes.
>>
>>
>>     Huh? Why not? If the attributes change, why can't this sub-project
>>     just change with those changing string attributes? Maybe some
>>     elaboration how this might not naturally be able to evolve is a
>>     warranted detail of discussion?
>>
>>
>> Sorry, I should clarify what I meant by support. By support I mean that
>> we may not want to promise that those values will be there (support as a
>> feature), and what schemas are mangled into the random strings that we
>> currently call attributes. I did not mean that we wouldn't allow users
>> to inject their own values if they wanted to. We just wouldn't control
>> the standard or schema as a project and therefore couldn't support it.
>>
>> Any random collection of strings that has previously had no reserved
>> keywords is notoriously difficult to build new schemas in.
>> This is why we may want to instead introduce a typed structure that is
>> dedicated to fault domain information. This:
>>
>>   * Prevents us from colliding with current users' attributes.
>>   * Allows us to have more control over the types (YAY) and ranges of
>>     values.
>>   * Allows us to introduce explicit structure such as dependency or
>>     hierarchy.
>>
>> The fact that users have already encoded information in attributes is
>> not a reason for us to limit ourselves to that scope when better
>> structures may be available. This is why we shouldn't assume that the
>> project will *provide support for* (as opposed to allow users to) using
>> attributes.
>>
>> As you said, it is their prerogative to join the design discussion to
>> ensure that any formalized structure or schema we introduce is one that
>> they are agreeable with.
>>
>>
>>
>> —
>> *Joris Van Remoortere*
>> Mesosphere
>>
>> On Tue, Jun 14, 2016 at 6:31 PM, james <gar...@verizon.net> wrote:
>>
>>     On 06/14/2016 08:14 AM, Joris Van Remoortere wrote:
>>
>>             On the condition of being compatible with existing frameworks,
>>             which already rely on parsing attributes for rack information.
>>
>>         There is currently nothing in Mesos that specifies the format or
>>         structure for rack information in attributes.
>>         The fact that operators / frameworks have decided to add this
>>         information out of band is their problem to solve.
>>         We don't need to be backwards compatible with something we never
>>         published to begin with. This is why it's ok for us to consider
>>         adding a typed form of failure domain information that is separate
>>         from the typeless string attributes.
>>
>>
>>     True. But you have to start somewhere, knowing that the schema and
>>     codes will morph over time to maintain relevance and usefulness. In
>>     that vein, if folks have established interesting and useful
>>     parameters for this work, then it is most beneficial that those
>>     methods and codes are considered carefully. AKA: speak up now.
>>     Diversity and inclusion are keenly beneficial, where practical.
>>
>>
>>         Since your interest is in the determination of the values, as
>>         opposed to their propagation, I would just urge that you keep in
>>         mind that we may (as a project) not want to support this
>>         information as the current string attributes.
>>
>>
>>     Huh? Why not? If the attributes change, why can't this sub-project
>>     just change with those changing string attributes? Maybe some
>>     elaboration how this might not naturally be able to evolve is a
>>     warranted detail of discussion?
>>
>>
>>     I would venture that both 'determination of the values and
>>     propagation (delays)' are inherently important in a cluster of many
>>     things: hardware, resources, frameworks, security codes, etc.
>>     The author and others seem to be keenly aware that a tight focus is
>>     not going to work at this stage, so a broad appeal to a multitude of
>>     needs is best.
>>     And in fact, until some idea is proven to be useless or too difficult to
>>     implement, the bigger the tent, the more useful the codes that
>>     define this project/idea become.  Personally, I'm very excited that
>>     someone has stepped up in this area; hoping they keep an open mind
>>     and flexibility geared toward multiplicative usage, in the future.
>>     Most mature hardware folks who build ideas into robust systems do
>>     exactly that, to motivate a multiplicative usage for organizing
>>     hardware, performance and state metrics, and timing signals,
>>     gregariously. All of this is routine semantics from a hardware
>>     perspective.
>>
>>     At some point, folks will realize that kernel configuration, testing
>>     and tweaks are critical to cluster performance, regardless of the
>>     codes running on top of the cluster. So this project could easily use
>>     cgroups and such to achieve robustness in many areas of need.
>>
>>
>>     Like it or not, large amounts of hardware need schema, planning and
>>     architectural robustness to stay pristinely available, so that
>>     software efficiency can be anywhere near optimal in deployment. This
>>     really becomes critical when the mix of different CPU types, GPUs and
>>     RAM is to be considered in future deployments, regardless of whether
>>     you outsource or run your own cluster. Hardware vendors are going to
>>     want to sell their products to as wide a customer base as possible,
>>     and customers are going to demand seamless management for expansion
>>     of resources. Furthermore, as a consultant, my experience is that
>>     much of the future market is going to demand outsourced, hybrid and
>>     in-house options as a fundamental tenet of cluster resource adoption.
>>
>>     hth,
>>     James
>>
>>
>>         *Joris Van Remoortere*
>>         Mesosphere
>>
>>         On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan...@intel.com> wrote:
>>
>>
>>
>>              On 2016/6/14 20:32, Joris Van Remoortere wrote:
>>
>>                       #1. Stick with attributes for rack awareness
>>
>>                  I don't think this is the right approach; however, there
>>                  seem to be 2 components to this discussion:
>>
>>                  1. How the values are presented (Attributes vs. a new
>>                  type-aware structure)
>>                  2. How the values are determined (scripts vs. automation
>>                  vs. modules)
>>
>>                  It seems you are more interested in working on #2. If
>>                  that's the case, please make sure that you don't assume
>>                  anything about #1, as not everyone agrees that we will
>>                  use the existing attributes in the future.
>>
>>
>>              On the condition of being compatible with existing frameworks,
>>              which already rely on parsing attributes for rack information.
>>
>>              Quotes from my original statements:
>>              > For compatibility with existing frameworks, I tend to be ok
>>              > with using attributes to convey the rack information
>>
>>              By all means, no matter what internal structures are used,
>>              current behavior should be honored. By the way, I'm also
>>              thinking about #1, but it's too early to bring up the details
>>              before the ticket gets ACCEPTED.
>>
>>              Anyway, I'm always open to all kinds of discussion. Thanks for
>>              your comments, Joris!
>>
>>                  For #2, you should focus on an API (module or script
>>                  results) that will support all the different methods the
>>                  community wants to use to generate this data.
>>
>>                  As you mentioned, updating the values for a running agent
>>                  is not straightforward. A lot of design work will need to
>>                  go into how these values are propagated to frameworks that
>>                  have made assumptions about them, and which values are
>>                  allowed to change vs. not.
>>
>>                  —
>>                  *Joris Van Remoortere*
>>                  Mesosphere
>>
>>                  On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey
>>                  <aca...@ilm.com> wrote:
>>
>>                       #3 would be very helpful for us. Also related:
>>
>>         https://issues.apache.org/jira/browse/MESOS-3059
>>
>>                       --
>>
>>                       Aaron Carey
>>                       Production Engineer - Cloud Pipeline
>>                       Industrial Light & Magic
>>                       London
>>                       020 3751 9150
>>
>>                       ________________________________________
>>                       From: Du, Fan [fan...@intel.com]
>>                       Sent: 14 June 2016 07:24
>>                       To: user@mesos.apache.org; d...@mesos.apache.org
>>                       Cc: Joris Van Remoortere; vinodk...@apache.org
>>
>>
>>                       Subject: Re: Rack awareness support for Mesos
>>
>>                       Hi everyone
>>
>>                       Let me summarize the discussion about Rack awareness
>>                       in the community so far. First, thanks for all the
>>                       comments, advice, and challenges! :)
>>
>>                       #1. Stick with attributes for rack awareness
>>
>>                       For compatibility with existing frameworks, I tend to
>>                       be ok with using attributes to convey the rack
>>                       information, but with the goal of doing it
>>                       automatically, keeping it easy to maintain, and using
>>                       a good attributes schema. This brings up the question
>>                       below, where the controversy starts.
>>
>>                       #2. Scripts vs programmatic way
>>
>>                       Both can be used to set attributes. I've made my
>>                       arguments in the Jira and the design doc, so I'm not
>>                       going to argue more here. But please take a look at
>>                       the earlier discussion in MESOS-3366, which allows
>>                       resources/attributes discovery.
>>
>>                       A module implementing the *slaveAttributesDecorator*
>>                       hook will work like a charm here in a static way. We
>>                       still need to justify updating attributes.
>>
>>                       #3. Allow updating attributes
>>                       Several cases need to be covered here:
>>
>>                       a). Mesos runs inside VMs or containers, where live
>>                       migration happens, so rack information needs to be
>>                       updated.
>>
>>                       b). LLDP packets are broadcast at an interval of
>>                       10s~30s (a vendor-specific implementation), and rack
>>                       information is usually stored in the LLDP daemon to
>>                       be queried. In the worst cases (a freshly rebooted
>>                       node, or a daemon restart), the Mesos slave has to
>>                       wait 10s~30s for valid rack information before
>>                       registering with the master. Allowing attributes to
>>                       be updated will mitigate this problem.
>>
>>                       c). Framework affinity
>>
>>                       Framework X prefers to run on the same nodes as
>>                       another framework Y. For example, it's desirable for
>>                       Shark or Spark-SQL to reside on the *worker* node
>>                       where Alluxio (formerly Tachyon) runs, to gain a
>>                       performance boost, as in the SPARK-6707 ticket
>>                       message {tachyon=true;us-east-1=false}
>>
>>                       If a framework could advertise agent attributes in
>>                       the ResourcesOffer process, awesome!
>>
>>
>>                       #4. Rearrange agents in a more scalable manner, like
>>                       on a per-rack basis
>>
>>                       Randomly offering agents' resources to frameworks
>>                       does not improve data locality; imagine the
>>                       likelihood of a framework getting resources
>>                       underneath the same rack at the scale of 30000+
>>                       nodes. Moreover, the time to randomly shuffle the
>>                       agents also grows.
>>
>>                       How about rearranging the agents on a per-rack basis?
>>                       A minor change to the way resources are allocated
>>                       would fix this.
>>
>>
>>                       I might not see the whole picture here, so comments
>>                       are welcome!
>>
>>
>>                       On 2016/6/6 17:17, Du, Fan wrote:
>>                        > Hi, Mesos folks
>>                        >
>>                        > I’ve been thinking about Mesos rack awareness
>>                        > support for a while. It’s a common interest for
>>                        > lots of data center applications to provide data
>>                        > locality, fault tolerance and better task
>>                        > placement. I created MESOS-5545 to track the
>>                        > story, and here is the initial design doc [1] to
>>                        > support rack awareness in Mesos.
>>                        >
>>                        > Looking forward to hearing any comments from end
>>                        > users and other developers,
>>                        >
>>                        > Thanks!
>>                        >
>>                        > [1]:
>>                        >
>>
>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>                        >
>>
>>
>>
>>
>>
>>
>
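
The contrast debated in this thread, free-form string attributes versus a
dedicated typed fault-domain structure, can be illustrated with a minimal
sketch. Nothing below is a Mesos API: the `rack:r1;zone:z1` convention, the
`parse_attributes` helper and the `FaultDomain` class are hypothetical names
chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def parse_attributes(text: str) -> Dict[str, str]:
    """Parse a free-form 'key:value;key:value' string (hypothetical convention)."""
    result: Dict[str, str] = {}
    for item in text.split(";"):
        key, sep, value = item.partition(":")
        if sep:  # skip fragments that have no ':' separator
            result[key.strip()] = value.strip()
    return result


@dataclass(frozen=True)
class FaultDomain:
    """Hypothetical typed fault-domain node with an explicit parent link."""
    level: str
    name: str
    parent: Optional["FaultDomain"] = None

    def path(self) -> List[str]:
        """Return the hierarchy from the outermost domain down to this one."""
        parts: List[str] = []
        node: Optional["FaultDomain"] = self
        while node is not None:
            parts.append(f"{node.level}={node.name}")
            node = node.parent
        return list(reversed(parts))


# String form: every consumer must agree on the keys and re-parse the text.
attrs = parse_attributes("rack:r1;zone:z1")

# Typed form: the rack/zone hierarchy is explicit and cannot collide with
# unrelated user-defined attributes.
zone = FaultDomain(level="zone", name="z1")
rack = FaultDomain(level="rack", name="r1", parent=zone)
print(attrs)
print(rack.path())
```

In the string form, the meaning of "rack" and "zone" lives only in each
consumer's parsing code, whereas the typed form carries the hierarchy in the
structure itself, which is the trade-off the thread is weighing.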
