Re: Rack awareness support for Mesos

Du, Fan Tue, 14 Jun 2016 06:03:25 -0700


On 2016/6/14 20:32, Joris Van Remoortere wrote:

    #1. Stick with attributes for rack awareness

I don't think this is the right approach; however, there seem to be 2
components to this discussion:

1. How the values are presented (Attributes vs. a new type-aware structure)
2. How the values are determined (scripts vs. automation vs. modules)

It seems you are more interested in working on #2. If that's the case,
please make sure that you don't assume anything about #1, as not
everyone agrees that we will use the existing attributes in the future.

On the condition of being compatible with existing frameworks, which already rely on parsing attributes for rack information.
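
To illustrate what "parsing attributes for rack information" looks like on the framework side, here is a minimal sketch. It assumes the usual `key:value;key:value` attribute string and assumes the rack is published under a key named `rack`; that key name is only a convention, not something Mesos enforces, and the helper names are made up for this example.

```python
def parse_attributes(text):
    """Parse an attribute string such as 'rack:r1;kernel:2.6.44' into a dict."""
    attributes = {}
    for pair in text.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled semicolons
        # Split on the first ':' only, so values may themselves contain ':'.
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"malformed attribute: {pair!r}")
        attributes[key.strip()] = value.strip()
    return attributes


def rack_of(attribute_text, key="rack"):
    """Return the rack name, or None when no rack attribute is present.

    The default key 'rack' is an assumed convention for this sketch.
    """
    return parse_attributes(attribute_text).get(key)
```

Any new structure for rack information would have to keep code of this kind working, which is the compatibility concern here.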


Quoting my original statement:
> For compatibility with existing frameworks, I tend to be ok with using
> attributes to convey the rack information

By all means, no matter which internal structures are used, current behavior should be honored. BTW, I'm also thinking about #1, but it's too early to bring up the details before the ticket is ACCEPTED.

Anyway, I'm always open to all kinds of discussion. Thanks for your comments, Joris!

For #2, you should focus on an API (module or script results) that will
support all the different methods the community wants to use to generate
this data.

As you mentioned, updating the values for a running agent is not
straightforward. A lot of design work will need to go into how these
values are propagated to frameworks that have made assumptions about
them, and which values are allowed to change vs. not.

—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <aca...@ilm.com
<mailto:aca...@ilm.com>> wrote:

    #3 would be very helpful for us. Also related:

    https://issues.apache.org/jira/browse/MESOS-3059

    --

    Aaron Carey
    Production Engineer - Cloud Pipeline
    Industrial Light & Magic
    London
    020 3751 9150

    ________________________________________
    From: Du, Fan [fan...@intel.com <mailto:fan...@intel.com>]
    Sent: 14 June 2016 07:24
    To: user@mesos.apache.org <mailto:user@mesos.apache.org>;
    d...@mesos.apache.org <mailto:d...@mesos.apache.org>
    Cc: Joris Van Remoortere; vinodk...@apache.org
    <mailto:vinodk...@apache.org>
    Subject: Re: Rack awareness support for Mesos

    Hi everyone

    Let me summarize the discussion about rack awareness in the community so
    far. First, thanks for all the comments, advice and challenges! :)

    #1. Stick with attributes for rack awareness

    For compatibility with existing frameworks, I tend to be ok with using
    attributes to convey the rack information, but with the goal of doing it
    automatically, in a way that is easy to maintain and uses a good
    attribute schema. This raises the question below, where the controversy
    starts.

    #2. Scripts vs programmatic way

    Both can be used to set attributes. I've made my arguments in the Jira
    and the design doc, so I'm not going to argue more here. But please take
    a look at the earlier discussion in MESOS-3366, which allows
    resources/attributes discovery.

    A module implementing the *slaveAttributesDecorator* hook will work like
    a charm here in a static way, but updating attributes still needs to be
    justified.

    #3. Allow updating attributes
    Several cases need to be covered here:

    a). Mesos runs inside VMs or containers, where live migration happens, so
    rack information needs to be updated.

    b). LLDP packets are broadcast at an interval of 10s~30s (this is
    vendor specific), and rack information is usually stored in the LLDP
    daemon to be queried. In the worst case (a freshly rebooted node, or a
    daemon restart), the Mesos slave has to wait 10s~30s for valid rack
    information before registering with the master. Allowing attributes to
    be updated will mitigate this problem.

    c). Framework affinity

    Framework X prefers to run on the same nodes as another framework Y.
    For example, it's desirable for Shark or Spark-SQL to reside on the
    *worker* nodes where Alluxio (formerly Tachyon) runs, to gain a
    performance boost, as in the SPARK-6707 ticket message
    {tachyon=true;us-east-1=false}

    If a framework could advertise agent attributes in the ResourcesOffer
    process, awesome!
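
Case b) above can be sketched as a bounded wait: poll for rack information for a limited time, and if nothing arrives, register without it and rely on a later attribute update. This is only an illustration under assumptions. `query_rack` is a hypothetical callable standing in for whatever asks the LLDP daemon, and the timeout and interval values are placeholders.

```python
import time


def wait_for_rack(query_rack, timeout=30.0, interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll query_rack() until it returns a rack name or the timeout expires.

    query_rack is a hypothetical callable that returns None while the
    LLDP daemon has no neighbor data yet. clock and sleep are injectable
    so the loop can be exercised without real delays.
    """
    deadline = clock() + timeout
    while True:
        rack = query_rack()
        if rack is not None:
            return rack
        if clock() >= deadline:
            # Give up for now: the caller registers without a rack and
            # depends on attribute updating to fill it in later.
            return None
        sleep(interval)
```

Without attribute updating, the only safe choice is to block registration for the full interval; with it, the timeout can be short.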


    #4. Rearrange agents in a more scalable manner, e.g. on a per-rack basis

    Randomly offering agents' resources to frameworks does not improve data
    locality: imagine the likelihood of a framework getting resources
    under the same rack at a scale of 30000+ nodes. Moreover, the time to
    randomly shuffle the agents also grows.

    How about rearranging the agents on a per-rack basis? Together with a
    minor change to the way resources are allocated, this would fix the
    problem.
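
The per-rack arrangement in #4 can be sketched as follows. This is not the Mesos allocator, only an illustration of the idea: agents are indexed by rack once, and a request for several agents is served from a single rack when one is large enough. The function names and the `(agent_id, rack)` input shape are assumptions made for the example.

```python
from collections import defaultdict


def group_by_rack(agents):
    """Map rack name -> list of agent ids.

    agents is an iterable of (agent_id, rack) pairs; rack names are
    assumed to be strings.
    """
    racks = defaultdict(list)
    for agent_id, rack in agents:
        racks[rack].append(agent_id)
    return dict(racks)


def pick_agents_same_rack(agents, count):
    """Return (rack, agent_ids) with `count` agents from one rack, else None."""
    # Racks are visited in sorted name order to keep the result deterministic.
    for rack, members in sorted(group_by_rack(agents).items()):
        if len(members) >= count:
            return rack, members[:count]
    return None
```

With a random shuffle over all agents, landing several offers under the same rack is a matter of luck; with the index, it is a lookup.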


    I might not see the whole picture here, so comments are welcome!


    On 2016/6/6 17:17, Du, Fan wrote:
     > Hi, Mesos folks
     >
     > I’ve been thinking about Mesos rack awareness support for a while.
     > It’s a common interest for lots of data center applications to provide
     > data locality, fault tolerance and better task placement. I created
     > MESOS-5545 to track the story, and here is the initial design doc [1]
     > to support rack awareness in Mesos.
     >
     > Looking forward to hearing any comments from end users and other
     > developers.
     >
     > Thanks!
     >
     > [1]:
     > https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
     >
