On 2016/6/6 23:48, Jörg Schad wrote:
Hi,
thanks for your idea and design doc!
Just a few thoughts:
a) The scheduling part would be implemented in a framework scheduler and
not in Mesos core, right?

I'm not sure which level of scheduling you are referring to.
If you mean the "Future" section of the proposal, that is the Mesos allocation logic.
How to use the rack information to implement advanced features (fault tolerance,
data locality) is up to the framework's scheduling logic.

b) As mentioned by James, this needs to be very flexible (and not
necessarily based on network structure),

The proposed network topology detection is modular, so it can accommodate Ethernet,
InfiniBand, or other network implementations. And yes, a user can statically
configure /etc/mesos/rack_id to shape the logical network topology
easily.
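
To make that concrete, here is a rough sketch (not from the design doc; the
helper path and fallback order are assumptions) of how detection could prefer
the static file and only then probe the network:

# Hypothetical sketch: a rack-id helper that prefers an operator-supplied
# static file and only then falls back to a pluggable network probe.
import os
import subprocess

STATIC_RACK_FILE = "/etc/mesos/rack_id"  # static override from the proposal

def detect_rack_id():
    # 1. Operator override: a statically configured rack id wins.
    if os.path.isfile(STATIC_RACK_FILE):
        with open(STATIC_RACK_FILE) as f:
            rack_id = f.read().strip()
        if rack_id:
            return rack_id

    # 2. Otherwise ask a pluggable detector (LLDP for Ethernet, something
    #    else for InfiniBand, ...). The helper script below is made up.
    try:
        out = subprocess.run(["/usr/libexec/mesos/detect-rack"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip() or "unknown"
    except (OSError, subprocess.CalledProcessError):
        return "unknown"

if __name__ == "__main__":
    print(detect_rack_id())

The agent could then surface the result to frameworks, e.g. as an agent
attribute (--attributes=rack_id:<value>), which is also how the label
approach below would be fed today.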


AFAIK people are using labels
on the agents to identify different fault domains, which can then be
interpreted by the framework scheduler. Maybe it would make sense (instead
of identifying the network structure) to come up with a common label
naming scheme which can be understood by all/different frameworks.
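
For comparison, a minimal sketch of what that label-based convention looks
like from a framework scheduler's side; the rack_id attribute name and the
offer layout here are assumptions, not an agreed scheme:

# Hypothetical sketch: naive rack anti-affinity in a framework scheduler,
# driven by an agreed-upon agent attribute/label such as "rack_id".
from collections import defaultdict

def group_offers_by_rack(offers):
    """Bucket resource offers by the agent's rack_id attribute."""
    racks = defaultdict(list)
    for offer in offers:
        # "attributes" stands in for agent attributes/labels surfaced in
        # the offer; the exact shape depends on the framework API used.
        rack = offer.get("attributes", {}).get("rack_id", "unknown")
        racks[rack].append(offer)
    return racks

def pick_offers_for_replicas(offers, replicas):
    """Place each replica on a different rack, skipping unlabeled agents."""
    chosen = []
    for rack, rack_offers in group_offers_by_rack(offers).items():
        if len(chosen) == replicas:
            break
        if rack != "unknown":
            chosen.append(rack_offers[0])
    return chosen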

I'm not convinced that labels are the right approach here.
Based on what information would the agents be labeled? IMO, a cluster operator
still needs something like LLDP to discover the network topology,
and every operator would have to do that on their own; it's better
to abstract that logic inside Mesos and provide a common interface to
frameworks.

Honestly speaking, I don't follow the argument for labels here.
The proposal is designed to do this *automatically* to reduce maintenance effort.
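
As an illustration of the automatic path, here is a sketch that derives a
rack id from LLDP neighbor data. It assumes lldpd's lldpctl is installed and
that the top-of-rack switch name is a usable rack identifier; neither
assumption comes from the design doc, and the output key names may differ
between lldpd versions:

# Hypothetical sketch: use the directly attached (top-of-rack) switch name,
# as reported by LLDP, as the rack identifier.
import subprocess

def rack_id_from_lldp():
    try:
        out = subprocess.run(["lldpctl", "-f", "keyvalue"],
                             capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None

    for line in out.stdout.splitlines():
        key, _, value = line.partition("=")
        # e.g. "lldp.eth0.chassis.name=tor-rack-23"
        if key.endswith(".chassis.name") and value:
            return value.strip()
    return None

if __name__ == "__main__":
    print(rack_id_from_lldp() or "unknown")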

Looking forward to your thoughts on this!

On Mon, Jun 6, 2016 at 3:27 PM, james <gar...@verizon.net> wrote:

    Hello,


    @Stephen:: I guess Stephen is bringing up the 'security' aspect of
    who gets access to the information, particularly cluster/cloud
    devops, customers, or interlopers...?


    @Fan:: As a consultant, most of my customers either have or are
    planning hybrid installations, where some codes run on a local
    cluster and 'the cloud' is used for dynamic load requirements. I would
    think your proposed scheme needs to be very flexible, whether applied
    to a campus or Metropolitan Area Network, if not
    massively distributed around the globe. What about different resource
    types (racks of arm64, GPU-centric hardware, DSPs, FPGAs, etc.)?
    Hardware diversity brings many
    benefits to the cluster/cloud capabilities.


    This also raises the question of hardware management (boot/config/online)
    for the various hardware, such as what is built into CoreOS. Are several
    applications going to be supported? Standards track? Or just Mesos/DC/OS
    centric?


    TIMING DATA:: This is the main issue I see. Once you start 'vectoring
    in resources' you need to add timing (latency) data to encourage robust
    and diversified use of this data. For HPC, this could be very
    valuable for RDMA-heavy algorithms, where memory-constrained
    workloads need not only knowledge of additional nearby memory
    resources, but also
    the approximate (based on previously collected data) latency and
    bandwidth constraints for using those additional resources.
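
    To sketch that idea (field names and units below are purely
    illustrative, not part of the proposal), the per-agent topology info
    could carry rough latency/bandwidth estimates next to the rack id:

    # Hypothetical sketch: topology info extended with measured latency and
    # bandwidth estimates, so schedulers can weigh "nearby" resources.
    from dataclasses import dataclass, field
    from typing import Dict

    @dataclass
    class TopologyInfo:
        rack_id: str
        # Rolling estimates keyed by peer rack id; units are illustrative.
        latency_us: Dict[str, float] = field(default_factory=dict)
        bandwidth_gbps: Dict[str, float] = field(default_factory=dict)

        def cost_to(self, peer_rack: str) -> float:
            """Crude placement cost: higher latency / lower bandwidth = worse."""
            lat = self.latency_us.get(peer_rack, 1000.0)  # pessimistic default
            bw = self.bandwidth_gbps.get(peer_rack, 1.0)
            return lat / bw

    # Example: an RDMA-heavy workload picking the cheapest nearby rack.
    local = TopologyInfo("rack-23",
                         latency_us={"rack-24": 8.0, "rack-40": 45.0},
                         bandwidth_gbps={"rack-24": 100.0, "rack-40": 25.0})
    best = min(["rack-24", "rack-40"], key=local.cost_to)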


    Great idea. I do like it very much.

    hth,
    James



    On 06/06/2016 05:06 AM, Stephen Gran wrote:

        Hi,

        This looks potentially interesting.  How does it work in a
        public cloud
        deployment scenario?  I assume you would just have to disable this
        feature, or not enable it?

        Cheers,

        On 06/06/16 10:17, Du, Fan wrote:

            Hi, Mesos folks

            I’ve been thinking about Mesos rack awareness support for a
            while.

            It’s a common interest for lots of data center applications
            to provide data locality, fault tolerance, and better task
            placement. I created MESOS-5545 to track the story, and
            here is the initial design doc [1] to support rack
            awareness in Mesos.

            Looking forward to hearing any comments from end users and
            other developers.

            Thanks!

            [1]:
            https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing



