Re: Rack awareness support for Mesos

Du, Fan Fri, 17 Jun 2016 00:57:00 -0700


On 2016/6/17 7:59, Joris Van Remoortere wrote:

@Fan,

In the community meeting a question was raised around which frameworks
might be ready to use this.
Can you provide some more context for immediate use cases on the
framework side?


Hi Joris

Thanks for the bridging!

Frameworks capable of topology-aware replication strategies will benefit
here. For how topology-aware replication works, please refer to the
section "Hadoop Rack awareness - Why?" in [1]; the same methodology
applies to other frameworks too.
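
As a rough illustration of topology-aware replication, here is a minimal
Python sketch in the style of the well-known HDFS default placement (first
replica on the writer's node, the other two on a single remote rack). The
function name and the node/rack labels are made up for illustration; this is
not code from Mesos, Hadoop, or any framework mentioned in this thread.

```python
from collections import defaultdict


def place_replicas(writer, node_racks):
    """Pick three nodes: the writer plus two nodes on one remote rack.

    node_racks maps a node name to its rack name (both hypothetical labels).
    """
    local_rack = node_racks[writer]

    # Group nodes by rack, in a deterministic order.
    by_rack = defaultdict(list)
    for node, rack in sorted(node_racks.items()):
        by_rack[rack].append(node)

    remote_racks = [r for r in sorted(by_rack) if r != local_rack]
    if not remote_racks:
        raise ValueError("need at least two racks for rack-aware placement")

    # Prefer a remote rack that can hold two replicas on distinct nodes.
    remote = next((r for r in remote_racks if len(by_rack[r]) >= 2),
                  remote_racks[0])
    second = by_rack[remote][0]

    same_rack = [n for n in by_rack[remote] if n != second]
    if same_rack:
        third = same_rack[0]
    else:
        # Fall back to any node that has not been chosen yet.
        others = [n for n in sorted(node_racks) if n not in (writer, second)]
        if not others:
            raise ValueError("need at least three nodes")
        third = others[0]

    return [writer, second, third]
```

With such a placement, losing a whole rack still leaves at least one replica
reachable, which is the property a rack-aware framework is after.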

For a POC, we can start from SPARK-6707 [2] with the new rack awareness
interface. For the topology-aware replication case, I think
dcos-cassandra-service [3] is also a good place to start implementing
rack awareness, because that repo is actively developed.

I believe there are plenty of use cases here, so if anyone has more use
cases that show this feature is useful, feel free to chime in.


Thanks!

[1]: http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/

[2]: https://issues.apache.org/jira/browse/SPARK-6707
[3]: https://github.com/mesosphere/dcos-cassandra-service

—
*Joris Van Remoortere*
Mesosphere

On Wed, Jun 15, 2016 at 5:04 PM, james <gar...@verizon.net> wrote:

    @Joris,


    OK. Now I understand where you are coming from. As soon as I get
    some time, I'll join that design discussion. Thanks for the
    clarifications.

    James





    On 06/15/2016 02:45 AM, Joris Van Remoortere wrote:

                 Since your interest is in the determination of the values,
                 as opposed to their propagation, I would just urge that you
                 keep in mind that we may (as a project) not want to support
                 this information as the current string attributes.


             Huh? Why not? If the attributes change, why can't this
             sub-project just change with those changing string attributes?
             Maybe some elaboration how this might not naturally be able to
             evolve is a warranted detail of discussion?


        Sorry, I should clarify what I meant by support. By support I mean
        that we may not want to promise that those values will be there
        (support as a feature), and what schemas are mangled into the random
        strings that we currently call attributes. I did not mean that we
        wouldn't allow users to inject their own values if they wanted to.
        We just wouldn't control the standard or schema as a project and
        therefore couldn't support it.

        Any random collection of strings that has previously had no reserved
        keywords is notoriously difficult to build new schemas in.
        This is why we may want to instead introduce a typed structure that
        is dedicated to fault domain information. This:

           * Prevents us from colliding with current users' attributes.
           * Allows us to have more control over the types (YAY) and ranges
             of values.
           * Allows us to introduce explicit structure such as dependency or
             hierarchy.
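
To make the idea of a typed structure with explicit hierarchy concrete, here
is a small Python sketch. The class name, field names and level names
("region", "zone", "rack") are purely hypothetical and are not a proposed
Mesos schema; the point is only that a typed, hierarchical value can be
compared without parsing free-form strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultDomain:
    """One level of a hypothetical fault-domain hierarchy."""
    level: str                              # e.g. "region", "zone", "rack"
    name: str
    parent: Optional["FaultDomain"] = None  # enclosing domain, if any

    def path(self):
        """Return ((level, name), ...) from the outermost domain inwards."""
        parts = []
        node = self
        while node is not None:
            parts.append((node.level, node.name))
            node = node.parent
        return tuple(reversed(parts))


def shared_levels(a, b):
    """Count how many levels, starting at the root, two domains share."""
    count = 0
    for x, y in zip(a.path(), b.path()):
        if x != y:
            break
        count += 1
    return count
```

For example, two racks under the same zone share two levels (region and
zone), which a scheduler could read as "same zone, different rack".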

        The fact that users have already encoded information in attributes
        is not a reason for us to limit ourselves to that scope when better
        structures may be available. This is why we shouldn't assume that
        the project will *provide support for* (as opposed to allow users
        to) using attributes.

        As you said, it is their prerogative to join the design discussion
        to ensure that any formalized structure or schema we introduce is
        one that they are agreeable with.



        —
        *Joris Van Remoortere*
        Mesosphere

        On Tue, Jun 14, 2016 at 6:31 PM, james <gar...@verizon.net> wrote:

             On 06/14/2016 08:14 AM, Joris Van Remoortere wrote:

                     On the condition of compatibility with existing
                     frameworks which already rely on parsing attributes for
                     rack information.

                 There is currently nothing in Mesos that specifies the format
                 or structure for rack information in attributes.
                 The fact that operators / frameworks have decided to add this
                 information out of band is their problem to solve.
                 We don't need to be backwards compatible with something we
                 never published to begin with. This is why it's ok for us to
                 consider adding a typed form of failure domain information
                 that is separate from the typeless string attributes.
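
For readers unfamiliar with what "parsing attributes for rack information"
looks like in practice, here is a hedged Python sketch. It assumes the
semicolon-separated key:value text form accepted by the agent's --attributes
flag, and it assumes the operator chose the key name "rack"; as noted above,
nothing in Mesos specifies that key, so the name is only a convention.

```python
def parse_attributes(raw):
    """Parse 'key:value;key:value' text into a dict of strings."""
    attrs = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"malformed attribute: {item!r}")
        attrs[key.strip()] = value.strip()
    return attrs


def rack_of(raw, key="rack"):
    """Return the rack value, or None if the assumed key is absent."""
    return parse_attributes(raw).get(key)
```

A framework that looks for "rack" silently gets nothing from an agent whose
operator wrote "rack_id", which is the kind of out-of-band coupling being
discussed here.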


             True. But you have to start somewhere, know that the schema and
             codes will morph over time to maintain relevance and usefulness.
             In that vein, if folks have established interesting and useful
             parameters for this work, then it is most beneficial that those
             methods and codes are considered carefully. AKA:: speak up now.
             Diversity and inclusion are keenly beneficial, where practical.


                 Since your interest is in the determination of the values,
                 as opposed to their propagation, I would just urge that you
                 keep in mind that we may (as a project) not want to support
                 this information as the current string attributes.


             Huh? Why not? If the attributes change, why can't this
             sub-project just change with those changing string attributes?
             Maybe some elaboration how this might not naturally be able to
             evolve is a warranted detail of discussion?


             I would venture that both 'determination of the values and
             propagation (delays)' are inherently important in a cluster of
             many things:: hardware, resources, frameworks, security codes,
             etc etc. The author and others seem to be keenly aware that a
             tight focus is not going to work, at this stage, so a broad
             appeal to a multitude of needs is best.
             And in fact, until some idea is proven to be useless or too
             difficult to implement, the bigger the tent, the more useful the
             codes that define this project/idea become. Personally, I'm very
             excited that someone has stepped up in this area; hoping they
             keep an open mind and flexibility geared toward multiplicative
             usage, in the future. Most mature hardware folks who build ideas
             into robust systems do exactly that, to motivate a multiplicative
             usage for organizing hardware, performance and state metrics, and
             timing signals, gregariously. All of this is routine semantics
             from a hardware perspective.

             At some point, folks will realize that kernel configuration,
             testing and tweaks are critical to cluster performance,
             regardless of the codes running on top of the cluster. So this
             project could easily use cgroups and such to achieve robustness
             in many areas of need.


             Like it or not, large amounts of hardware need schema, planning
             and architectural robustness to be kept pristinely available, if
             software efficiency is to be anywhere near optimal in deployment.
             This really becomes critical when the mix of different CPU types,
             GPUs and RAM is to be considered in future deployments,
             regardless of whether you outsource or run your own cluster.
             Hardware vendors are going to want to sell their products to as
             wide a customer base as possible, and customers are going to
             demand seamless management for expansion of resources.
             Furthermore, as a consultant, my experience is that much of the
             future market is going to demand outsourced, hybrid and in-house
             options as a fundamental tenet of cluster resource adoption.

             hth,
             James


                 *Joris Van Remoortere*
                 Mesosphere

                 On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan...@intel.com> wrote:



                      On 2016/6/14 20:32, Joris Van Remoortere wrote:

                               #1. Stick with attributes for rack awareness

                          I don't think this is the right approach; however,
                          there seem to be 2 components to this discussion:

                          1. How the values are presented (Attributes vs. a
                             new type-aware structure)
                          2. How the values are determined (scripts vs.
                             automation vs. modules)

                          It seems you are more interested in working on #2.
                          If that's the case, please make sure that you don't
                          assume anything about #1, as not everyone agrees
                          that we will use the existing attributes in the
                          future.


                      On the condition of compatibility with existing
                      frameworks which already rely on parsing attributes for
                      rack information.

                      Quoting my original statement:
                      > For compatibility with existing frameworks, I tend to
                      > be ok with using attributes to convey the rack
                      > information

                      By all means, no matter what internal structures are
                      used, current behavior should be honored. By the way,
                      I'm also thinking about #1, but it's too early to bring
                      up the details before the ticket gets ACCEPTED.

                      Anyway, I'm always open to all kinds of discussion.
                      Thanks for your comments, Joris!

                          For #2, you should focus on an API (module or script
                          results) that will support all the different methods
                          the community wants to use to generate this data.

                          As you mentioned, updating the values for a running
                          agent is not straightforward. A lot of design work
                          will need to go into how these values are propagated
                          to frameworks that have made assumptions about them,
                          and which values are allowed to change vs. not.

                          —
                          *Joris Van Remoortere*
                          Mesosphere

                          On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <aca...@ilm.com> wrote:

                               #3 would be very helpful for us. Also related:
                               https://issues.apache.org/jira/browse/MESOS-3059

                               --

                               Aaron Carey
                               Production Engineer - Cloud Pipeline
                               Industrial Light & Magic
                               London
                               020 3751 9150

                               ________________________________________
                               From: Du, Fan [fan...@intel.com]
                               Sent: 14 June 2016 07:24
                               To: user@mesos.apache.org; d...@mesos.apache.org
                               Cc: Joris Van Remoortere; vinodk...@apache.org


                               Subject: Re: Rack awareness support for Mesos

                               Hi everyone

                               Let me summarize the discussion about rack
                               awareness in the community so far. First,
                               thanks for all the comments, advice and
                               challenges! :)

                               #1. Stick with attributes for rack awareness

                               For compatibility with existing frameworks, I
                               tend to be ok with using attributes to convey
                               the rack information, but with the goal of
                               doing it automatically, in a way that is easy
                               to maintain and has a good attribute schema.
                               This brings up the question below, where the
                               controversy starts.

                               #2. Scripts vs programmatic way

                               Both can be used to set attributes. I've made
                               my arguments in the Jira and the design doc, so
                               I'm not going to argue more here. But please
                               take a look at the earlier discussion in
                               MESOS-3366, which allows resources/attributes
                               discovery.

                               A module implementing the
                               *slaveAttributesDecorator* hook will work like
                               a charm here in a static way. Updating
                               attributes still needs to be justified.

                               #3. Allow updating attributes
                               Several cases need to be covered here:

                               a). Mesos runs inside VMs or containers, where
                               live migration happens, so rack information
                               needs to be updated.

                               b). LLDP packets are broadcast at an interval
                               of 10s~30s, depending on the vendor-specific
                               implementation, and rack information is usually
                               stored in the LLDP daemon to be queried. In the
                               worst case (a freshly rebooted node, or a
                               daemon restart), the Mesos slave has to wait
                               10s~30s for valid rack information before
                               registering with the master. Allowing attributes
                               to be updated will mitigate this problem.
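
The waiting behaviour in case b) can be sketched as a bounded poll. This is
only an illustration in Python: the `query` callable stands in for whatever
asks the LLDP daemon for rack information (a hypothetical helper, not a real
API), and the timeout and interval values are assumptions based on the
10s~30s figure above.

```python
import time


def wait_for_rack(query, timeout=30.0, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll query() until it returns rack information or the timeout passes.

    Returns the rack value, or None if nothing valid arrived in time.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        rack = query()
        if rack:
            return rack
        if clock() >= deadline:
            return None
        sleep(interval)
```

If attributes could be updated after registration, the agent could register
as soon as this returns None and fill in the rack value later, instead of
blocking startup for the full interval.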

                               c). Framework affinity

                               Framework X prefers to run on the same nodes as
                               another framework Y. For example, it's desirable
                               for Shark or Spark-SQL to reside on the *worker*
                               nodes where Alluxio (formerly Tachyon) runs, to
                               gain a performance boost, as in the SPARK-6707
                               ticket message {tachyon=true;us-east-1=false}

                               If a framework could advertise agent attributes
                               in the ResourcesOffer process, awesome!
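
A framework-side check against such attributes could look like the following
Python sketch. The brace-and-semicolon syntax is copied from the ticket
message quoted above and is treated here as plain key=value text; the
function names are made up, and this is not the parser any framework
actually uses.

```python
def parse_constraints(text):
    """Parse '{key=value;key=value}' into a dict of strings."""
    body = text.strip().strip("{}")
    constraints = {}
    for item in body.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed constraint: {item!r}")
        constraints[key.strip()] = value.strip()
    return constraints


def offer_matches(constraints, agent_attributes):
    """True if every constraint equals the agent's advertised attribute."""
    return all(agent_attributes.get(key) == value
               for key, value in constraints.items())
```

A framework would decline offers for which offer_matches returns False and
accept those from agents that advertise the co-located service.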


                               #4. Rearrange agents in a more scalable manner,
                               like on a per-rack basis

                               Randomly offering agents' resources to
                               frameworks does not improve data locality:
                               imagine the likelihood of a framework getting
                               resources underneath the same rack at the scale
                               of 30000+ nodes. Moreover, the time to randomly
                               shuffle the agents also grows.

                               How about rearranging the agents on a per-rack
                               basis? Together with a minor change to the way
                               resources are allocated, this will fix the
                               problem.
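
One way to picture the per-rack arrangement is the Python sketch below. It is
not the Mesos allocator: the function names and the (agent, rack) pairs are
hypothetical, and the policy (try to satisfy a request inside one rack, then
spill over to the largest racks) is just one possible reading of the
proposal.

```python
from collections import defaultdict


def group_by_rack(agents):
    """Group (agent_id, rack) pairs into {rack: [agent_id, ...]}."""
    racks = defaultdict(list)
    for agent_id, rack in agents:
        racks[rack].append(agent_id)
    return dict(racks)


def pick_agents(agents, count):
    """Choose `count` agents, keeping them on a single rack when possible."""
    racks = group_by_rack(agents)

    # First choice: one rack that can serve the whole request.
    for rack in sorted(racks):
        if len(racks[rack]) >= count:
            return racks[rack][:count]

    # Otherwise fill from the largest racks to touch as few racks as we can.
    chosen = []
    for rack in sorted(racks, key=lambda r: (-len(racks[r]), r)):
        chosen.extend(racks[rack][:count - len(chosen)])
        if len(chosen) == count:
            return chosen
    raise ValueError("not enough agents for the request")
```

Grouping once and walking racks in order also avoids reshuffling the full
agent list on every allocation cycle.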


                               I might not see the whole picture here, so
                               comments are welcome!


                               On 2016/6/6 17:17, Du, Fan wrote:
                                > Hi, Mesos folks
                                >
                                > I’ve been thinking about Mesos rack awareness
                                > support for a while. It’s a common interest
                                > for lots of data center applications to
                                > provide data locality, fault tolerance and
                                > better task placement. I created MESOS-5545
                                > to track the story, and here is the initial
                                > design doc [1] to support rack awareness in
                                > Mesos.
                                >
                                > Looking forward to hearing any comments from
                                > end users and other developers.
                                >
                                > Thanks!
                                >
                                > [1]:
                                > https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
                                >
