We're going with HAProxy on every node + haproxy-marathon-bridge (since we're leveraging Marathon). We deployed mesos-dns, but it didn't seem to make sense to have both solutions.

-craig
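[For context, the bridge approach Craig picked boils down to polling Marathon and rewriting the haproxy config. A minimal re-implementation sketch, for illustration only (this is not the actual haproxy-marathon-bridge script), against Marathon's /v2/tasks endpoint; the Marathon URL is a placeholder:]

    # Sketch of what haproxy-marathon-bridge does, re-implemented for
    # illustration (not the actual script): poll Marathon's /v2/tasks
    # endpoint and emit one haproxy "listen" block per service port.
    # The Marathon URL below is a placeholder.
    import collections
    import json
    import urllib.request

    MARATHON = "http://marathon.example.com:8080"  # placeholder endpoint

    def fetch_tasks():
        with urllib.request.urlopen(MARATHON + "/v2/tasks") as resp:
            return json.load(resp)["tasks"]

    def render_haproxy_config(tasks):
        # group running tasks by (app, servicePort); each servicePort in a
        # task pairs with the host port at the same index
        backends = collections.defaultdict(list)
        for t in tasks:
            for service_port, host_port in zip(t.get("servicePorts", []),
                                               t.get("ports", [])):
                backends[(t["appId"], service_port)].append((t["host"], host_port))
        lines = []
        for (app_id, service_port), servers in sorted(backends.items()):
            name = app_id.strip("/").replace("/", "_")
            lines += [f"listen {name}-{service_port}",
                      f"  bind 0.0.0.0:{service_port}",
                      "  mode tcp",
                      "  balance leastconn"]
            lines += [f"  server {name}-{i} {host}:{port} check"
                      for i, (host, port) in enumerate(servers, 1)]
        return "\n".join(lines) + "\n"

    if __name__ == "__main__":
        # in the real setup this would be written to the haproxy config
        # path on every node, followed by a graceful haproxy reload
        print(render_haproxy_config(fetch_tasks()))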
On Wed, Apr 1, 2015 at 12:33 PM, Adam Shannon <adam.shan...@banno.com> wrote:

David,

Smartstack was one of the inspirations we used to decide how we wanted to build out service discovery. The one thing we decided on was that we wanted the haproxy instances to be the front-line load balancers (the ones directly open to the internet).

One thing the Smartstack post doesn't mention is how their internal dashboards are routed and accessed. We are able to create backends, propagated out to all instances of haproxy, that proxy for dashboards. This allows us to scale those just as easily as any other service we run.

On Wed, Apr 1, 2015 at 10:51 AM, David Kesler <dkes...@yodle.com> wrote:

That approach sounds similar to Smartstack (http://nerds.airbnb.com/smartstack-service-discovery-cloud/).

From: Adam Shannon [mailto:adam.shan...@banno.com]
Sent: Wednesday, April 01, 2015 10:58 AM
To: mesos-users
Subject: Re: Current State of Service Discovery

I figured I would comment on how Banno is setting up service discovery with mesos. We've built everything around docker containers, plus a wrapper around them we call "sidecar" that handles service discovery, basic process supervision, and hot reloads of the underlying app config.

Basically, sidecar wraps an existing docker image (with FROM) and runs the underlying command while monitoring it. Sidecar can also render templates that are written to the filesystem (in the container). While writing a template, sidecar tracks the ports and config values allocated and used, and it uses that information to add watches into zookeeper (where we store per-app overrides of the default config options).

The nicest thing we've found is that we're able to use a large range of ports per mesos slave. Further, because our port information is stored in zookeeper, any other app (and therefore sidecar) we write can look up host/port info for the services it needs.

From the host/port information in zookeeper, we've created a sidecar for haproxy on which backends can be created, tied to the app names of services registered in zookeeper. This allows haproxy to query (and watch) for all instances of an app and proxy them from the known host/port of haproxy. When changes occur (firing watches from zookeeper), the haproxy-sidecar instances reload with the updates.

We're still working to get this all fully deployed to production (and then open sourced), but it seems to combine some of the best features of other public options.
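[A rough sketch of the watch-and-reload pattern Adam describes, using the kazoo ZooKeeper client. Banno's actual schema isn't public, so the znode layout (/services/<app> with children holding "host:port"), the ensemble address, and the reload command are all assumptions:]

    # Hypothetical sketch of the ZooKeeper-watch pattern described above:
    # watch the registered instances of an app and rebuild the haproxy
    # backend when they change. Znode layout, ensemble address, and the
    # reload command are assumptions, not Banno's actual design.
    import subprocess
    import time

    from kazoo.client import KazooClient

    APP = "hivethrift"  # assumed app name

    def rewrite_haproxy_backend(app, servers):
        lines = [f"listen {app}-10000", "  bind 0.0.0.0:10000",
                 "  mode tcp", "  balance leastconn"]
        lines += [f"  server {app}-{i} {hp} check"
                  for i, hp in enumerate(servers, 1)]
        with open(f"/etc/haproxy/{app}.cfg", "w") as f:
            f.write("\n".join(lines) + "\n")
        subprocess.run(["systemctl", "reload", "haproxy"])  # assumed reload hook

    zk = KazooClient(hosts="zk1.example.com:2181")  # assumed ensemble
    zk.start()

    # kazoo re-registers the watch after every event, so this fires once at
    # registration and again on every membership change
    @zk.ChildrenWatch(f"/services/{APP}")
    def on_instances_changed(children):
        # each child znode is assumed to hold "host:port" for one instance
        servers = [zk.get(f"/services/{APP}/{c}")[0].decode() for c in children]
        rewrite_haproxy_backend(APP, servers)

    while True:
        time.sleep(60)  # keep the process alive; watches fire on kazoo's threads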
On Wed, Apr 1, 2015 at 9:05 AM, John Omernik <j...@omernik.com> wrote:

I have been researching service discovery on Mesos quite a bit lately and, given my background, may be making assumptions that don't apply to a Mesos datacenter. I've read through the docs and come up with two main approaches to service discovery, both of which appear to have strengths and weaknesses. I want to describe what I've seen here, along with the challenges as I understand them, so that any misconceptions I have can be corrected.

Basically, I see two main approaches to service discovery on Mesos. You have the mesos-dns package (https://github.com/mesosphere/mesos-dns), which is DNS-based service discovery, and you have HAProxy-based discovery, represented by both the haproxy-marathon-bridge script (https://github.com/mesosphere/marathon/blob/master/bin/haproxy-marathon-bridge) and the Bamboo project (https://github.com/QubitProducts/bamboo).

HAProxy

With the HAProxy method, as I see it, you basically install HAProxy on every node. The two projects mentioned above query Marathon to determine where services are running, then rewrite the haproxy config on every node so that every node listens on a specific port. From there, that port is forwarded, via the configured balancing algorithm, to the actual node/port combinations where the service is running.

So, let's use the example of a Hive thrift server running in a docker container on port 10000. Say you have a 5-node cluster: node1, node2, etc. You spin that container up with instances = 3 in Marathon, and Marathon/docker run one container on node3 and two on node2. Port 10000 inside each container is bridged to an available port on the physical node. Perhaps one instance on node2 gets 30000 and the other gets 30001, and node3's instance is tied to port 30000. So now you have three instances exposed at:

    node2:30000 -> dockercontainer:10000
    node2:30001 -> dockercontainer:10000
    node3:30000 -> dockercontainer:10000

With the HAProxy setup, each node would get this in its local haproxy config:

    listen hivethrift-10000
      bind 0.0.0.0:10000
      mode tcp
      option tcplog
      balance leastconn
      server hivethrift-3 node2:30000 check
      server hivethrift-2 node2:30001 check
      server hivethrift-1 node3:30000 check

This would allow you to connect to any node in your cluster on port 10000 and be served by one of the three containers running your Hive thrift server.

Pretty neat? However, there are some challenges here:

1. You now have a total of 65536 ports for your datacenter. This method is port-only: your whole cluster listens on a port, and that port is dedicated to one service. This actually makes sense in some ways: if you think of Mesos as a cluster operating system, the limitation of TCP/UDP is that each kernel has only that many ports; there is no cluster TCP or UDP, just TCP and UDP. That is still a lot of ports, but you do need to be aware of the limitation and manage your ports, especially since that number isn't really the total number available: some of those 65536 are reserved for cluster operations and/or things like HDFS.

2. You are now adding a hop to your traffic, which could affect sensitive applications. At least with the haproxy-marathon-bridge script, the settings for each application are static, fixed by the script (an improvement here would be to allow timeout settings and other haproxy options to be set per application and managed somewhere; that may be what Bamboo offers, I just haven't dug in yet). The glaring issue I found was specifically with the Hive thrift service. You connect, you run some queries, all is well. However, if you submit a long query (longer than the default 50000 ms timeout), there may not be any packets actually transferred in that time. The client is fine with this and the server is fine with this, but haproxy sees no packets within its timeout period, decides the connection is dead, and closes it, and then you get problems. I would imagine thrift isn't the only service where situations like this occur. I need to do more research on how to get around this; there may be some hope in Hive 1.1.0 with thrift keep-alives, but not every application service will have that option in the pipeline.
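[For what it's worth, haproxy does expose knobs for exactly this case. A hand-tuned version of the listen block above could raise the idle timeouts and enable TCP keep-alives for long-lived thrift connections; the values are illustrative, and per-app management of these settings is precisely what the bridge script lacks:]

    listen hivethrift-10000
      bind 0.0.0.0:10000
      mode tcp
      option tcplog
      balance leastconn
      # raise the idle timeouts well past the longest expected query;
      # these bound how long haproxy tolerates a silent connection
      timeout client 1h
      timeout server 1h
      # TCP keep-alives toward the client and the server also keep an
      # idle-looking connection from being reaped
      option clitcpka
      option srvtcpka
      server hivethrift-3 node2:30000 check
      server hivethrift-2 node2:30001 check
      server hivethrift-1 node3:30000 check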
Mesos-DNS

This project came to my attention this week, and I am looking to get it installed today to get hands-on time with it. Basically, it's a binary that queries the mesos-master and generates A records, whose hostnames are based on the framework names, and SRV records, based on the assigned ports.

This is where I get confused. I can see the A records being useful, but your entire network would have to be able to use mesos-dns (including non-mesos systems); otherwise, how would a client know how to resolve a .mesos domain name? Perhaps there should be a way to integrate mesos-dns as the authoritative zone for .mesos in your standard enterprise DNS servers. That also avoids the configuration burden of adding DNS settings to all the nodes. I need to research DNS a bit more, but couldn't you set it up, say in BIND, so that any requests in .mesos are forwarded to the mesos-dns service, with the answer sent back to the client through your standard DNS? Wouldn't this be preferable to setting the .mesos name server as the first DNS server and having THAT forward off to your standard enterprise DNS servers?
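[That forwarding setup is straightforward in BIND. A minimal sketch of the named.conf fragment, assuming mesos-dns is reachable at a placeholder address and answering on its default port 53:]

    // named.conf fragment: hand anything under .mesos to mesos-dns.
    // 10.0.0.10 is a placeholder for the host running mesos-dns.
    zone "mesos" {
        type forward;
        forward only;
        forwarders { 10.0.0.10 port 53; };
    };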
Another issue I see with DNS is that it works well for hostnames, but what about ports? Yes, I see there are SRV records that will return the ports, but how would they even be used? Consider the Hive thrift example above. We could assume Hive thrift runs on port 10000 on all nodes in the cluster and use that port, but then you run into the same issues as HAProxy. You can't really specify a port via DNS in a JDBC connection URL, can you? How do you get applications that want to connect to an integer port to do a DNS lookup to resolve the port? Or are we back to having one cluster and 65536 ports for all the services you could want on it, basically hard-coding the ports? That also loses flexibility from a docker port-bridging perspective: in my HAProxy example above, all the docker containers would have had to expose port 10000, which would have caused a conflict on node2.

Summary

So while I have a nice long email here, it seems I am either missing something critical about how service discovery could work with a Mesos cluster, or there are still some pretty big difficulties to overcome for an enterprise. HAProxy seems cool and works well, except for those long-running TCP connections like thrift; I am at a loss for how to handle that. Mesos-DNS is neat too, except for the port conflicts that would occur if you used native ports on nodes; and if you didn't use native ports (Mesos random ports), how do your applications know which port to connect to? (Yes, it's in the SRV record, but how do you make apps aware that they should look up a DNS record for a port?)

Am I missing something? How are others handling these issues?
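[One answer to that last question is a small lookup shim in the client: resolve the SRV record mesos-dns publishes and feed the host/port into the connection URL. A sketch with dnspython, where the record name follows mesos-dns's _app._protocol.framework.domain convention, but the specific app and framework names are assumptions:]

    # Sketch: resolve the SRV record mesos-dns publishes for a task and
    # build a connection URL from it. Requires dnspython; the app and
    # framework names here ("hivethrift", "marathon") are assumptions.
    import random

    import dns.resolver

    def lookup_service(srv_name):
        answers = dns.resolver.resolve(srv_name, "SRV")
        # an SRV answer carries a target host and a port; pick one at
        # random, ignoring priority/weight for brevity
        rr = random.choice(list(answers))
        return str(rr.target).rstrip("."), rr.port

    host, port = lookup_service("_hivethrift._tcp.marathon.mesos")
    print(f"jdbc:hive2://{host}:{port}/default")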