I figured I would comment on how Banno is setting up service discovery
with Mesos. We've built everything around Docker containers, plus a
wrapper around them that we call "sidecar", which handles service
discovery, basic process supervision, and hot reloads of the underlying
app config.

Basically, sidecar wraps an existing Docker image (with FROM) and runs
the underlying command while monitoring it. Sidecar can also render
config templates, which are written to the filesystem inside the
container. While rendering a template, sidecar tracks the ports and
config values that were allocated and used, and it uses that information
to add watches in ZooKeeper (where we store per-app overrides of the
default config options).
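
Roughly, the watch side looks something like the sketch below (the znode
paths, app name, and config file layout are made up for illustration;
kazoo is just one ZooKeeper client you could use):

# Minimal sketch: watch a ZooKeeper znode of per-app config overrides
# and rewrite the app's config file inside the container when the znode
# changes. Paths and znode layout are illustrative, not sidecar's real ones.
import json
from kazoo.client import KazooClient

APP = "hivethrift"                          # illustrative app name
OVERRIDE_ZNODE = "/config/%s/overrides" % APP
CONFIG_PATH = "/etc/%s/app.conf" % APP      # file the template renders to

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()
zk.ensure_path(OVERRIDE_ZNODE)

@zk.DataWatch(OVERRIDE_ZNODE)
def on_override_change(data, stat):
    # kazoo re-registers the watch for us; this fires once at startup
    # and again on every change to the znode.
    overrides = json.loads(data.decode("utf-8")) if data else {}
    with open(CONFIG_PATH, "w") as f:
        for key, value in sorted(overrides.items()):
            f.write("%s=%s\n" % (key, value))
    # A real sidecar would now signal or restart the wrapped process.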

The nicest thing we've found is that we're able to use a large range of
ports per Mesos slave. Further, because the port information is stored
in ZooKeeper, any other app we write (and therefore its sidecar) can
look up host/port info for any service it needs.
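
A lookup is then just a read of those znodes; something like the
following sketch (the /services/<app> layout and the "host:port" znode
data are assumptions for illustration):

# Sketch of a host/port lookup, assuming each instance registers an
# ephemeral znode under /services/<app> whose data is "host:port".
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

def instances_of(app):
    """Return a list of (host, port) for every registered instance."""
    base = "/services/%s" % app
    endpoints = []
    for child in zk.get_children(base):
        data, _stat = zk.get("%s/%s" % (base, child))
        host, port = data.decode("utf-8").split(":")
        endpoints.append((host, int(port)))
    return endpoints

print(instances_of("hivethrift"))
# e.g. [("node2", 30000), ("node2", 30001), ("node3", 30000)]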

From the host/port information in ZooKeeper we've built a sidecar for
haproxy that creates backends tied to the app names of services
registered in ZooKeeper. This lets haproxy query (and watch) for all
instances of an app and proxy to them from haproxy's well-known
host/port. When changes occur (and watches fire in ZooKeeper), the
haproxy-sidecar instances reload with the updates.
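
Roughly, the haproxy-sidecar loop is: watch the registrations, rewrite
the haproxy config, then do a graceful reload. A sketch (again, the
znode layout, file paths, and reload invocation are illustrative, and it
assumes haproxy is already running and has written its pid file):

import subprocess
from kazoo.client import KazooClient

APP = "hivethrift"               # illustrative app name
SERVICE_PORT = 10000             # well-known port haproxy listens on
CFG = "/etc/haproxy/haproxy.cfg"
PIDFILE = "/var/run/haproxy.pid"

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

def write_config(backends):
    # Render a single listen block; backends are "host:port" strings.
    lines = [
        "listen %s-%d" % (APP, SERVICE_PORT),
        "  bind 0.0.0.0:%d" % SERVICE_PORT,
        "  mode tcp",
        "  balance leastconn",
    ]
    for i, endpoint in enumerate(backends, 1):
        lines.append("  server %s-%d %s check" % (APP, i, endpoint))
    with open(CFG, "w") as f:
        f.write("\n".join(lines) + "\n")

def reload_haproxy():
    # "-sf <pid>" starts a new haproxy process and lets the old one
    # finish its existing connections before exiting.
    with open(PIDFILE) as f:
        old_pid = f.read().strip()
    subprocess.check_call(
        ["haproxy", "-f", CFG, "-p", PIDFILE, "-sf", old_pid])

@zk.ChildrenWatch("/services/%s" % APP)
def on_instances_change(children):
    # Fires whenever instances register or disappear.
    backends = []
    for child in children:
        data, _stat = zk.get("/services/%s/%s" % (APP, child))
        backends.append(data.decode("utf-8"))   # "host:port"
    write_config(backends)
    reload_haproxy()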

We're still working to get this all fully deployed to production (and then
open sourced), but it seems to combine some of the best features of other
public options.

On Wed, Apr 1, 2015 at 9:05 AM, John Omernik <j...@omernik.com> wrote:

> I have been researching service discovery on Mesos quite a bit lately and,
> due to my background, may be making assumptions that don't apply to a Mesos
> datacenter. I've read through the docs and have come up with two main
> approaches to service discovery, both of which appear to have strengths and
> weaknesses. I wanted to describe what I've seen here, as well as the
> challenges as I understand them, so that any misconceptions I may have can
> be corrected.
>
> Basically, I see two main approaches to service discovery on Mesos.
> You have the mesos-dns package (https://github.com/mesosphere/mesos-dns),
> which is DNS-based service discovery, and then you have HAProxy-based
> discovery (represented by both the haproxy-marathon-bridge script (
> https://github.com/mesosphere/marathon/blob/master/bin/haproxy-marathon-bridge)
> and the Bamboo project (https://github.com/QubitProducts/bamboo)).
>
> HAProxy
>
> With the HAProxy method, as I see it, you basically install HAProxy on
> every node. The two projects mentioned above query Marathon to determine
> where the services are running, and then rewrite the haproxy config on
> every node so that essentially every node listens on a specific port; from
> there, that port is forwarded, round robin, to the actual node/port
> combinations where the services are running.
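>
> For concreteness, a rough sketch of that query step (the Marathon host
> and output format are illustrative; this assumes Marathon's /v2/tasks
> endpoint and Python's requests library):
>
> # Ask Marathon for all running tasks, then group host:port pairs by
> # app -- roughly what the bridge script and Bamboo do under the hood.
> import requests
>
> MARATHON = "http://marathon.example.com:8080"   # illustrative host
>
> resp = requests.get("%s/v2/tasks" % MARATHON,
>                     headers={"Accept": "application/json"})
> resp.raise_for_status()
>
> backends = {}
> for task in resp.json()["tasks"]:
>     app = task["appId"].lstrip("/")      # e.g. "hivethrift"
>     for port in task["ports"]:           # host ports Mesos assigned
>         backends.setdefault(app, []).append((task["host"], port))
>
> for app, endpoints in backends.items():
>     print(app, endpoints)
> # e.g. hivethrift [("node2", 30000), ("node2", 30001), ("node3", 30000)]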
>
> So, let's use the example of a Hive Thrift server running in a Docker
> container on port 10000.  Let's say you have a 5 node cluster: node1,
> node2, etc. You spin that container up with instances = 3 in Marathon, and
> Marathon/Docker run the containers on node2, on node3, and another on
> node2.  There is a bridged port to 10000 inside each container that is
> tied to an available port on the physical node. Perhaps one instance on
> node2 gets 30000 and the other gets 30001, while node3's instance is tied
> to port 30000.  So now you have 3 instances exposed at
>
> node2:30000  -> dockercontainer:10000
> node2:30001 -> dockercontainer:10000
> node3:30000 -> dockercontainer:10000
>
> With the Haproxy setup, each node would get this in its local haproxy
> config:
>
> listen hivethrift-10000
>   bind 0.0.0.0:10000
>   mode tcp
>   option tcplog
>   balance leastconn
>   server hivethrift-3 node2:30000 check
>   server hivethrift-2 node2:30001 check
>   server hivethrift-1 node3:30000 check
>
> This would allow you to connect to any node in your cluster, on port 10000
> and be served one of the three containers running your hive thrift server.
>
> Pretty neat? However, there are some challenges here:
>
> 1. You now have a total of 65536 ports for your data center. This method
> is port-only: basically your whole cluster listens on a port, and that port
> is dedicated to one service.  This actually makes sense in some ways,
> because if you think of Mesos as a cluster operating system, the
> limitations of TCP/UDP are such that each kernel only has that many ports.
> There isn't a cluster TCP or UDP, just TCP and UDP.  That is still a lot of
> ports; however, you do need to be aware of the limitation and manage your
> ports, especially since that number isn't really the total number of
> available ports: some ports in that 65536 are reserved for cluster
> operations and/or things like HDFS.
>
> 2. You are now essentially adding a hop to your traffic that could affect
> sensitive applications.  At least with the haproxy-marathon-bridge script,
> the settings for each application are static in the script (an improvement
> here would be to allow timeout settings and other haproxy options to be set
> per application and managed somewhere; I think that may be what Bamboo
> offers, I just haven't dug in yet).  The glaring issue I found was
> specifically with the Hive Thrift service.  You connect, you run some
> queries, all is well. However, if you submit a long query (longer than the
> default 50000 ms timeout), there may not be any packets actually
> transferred in that time.  The client is fine with this, the server is fine
> with this, but haproxy sees no packets within its timeout period, decides
> the connection is dead, closes it, and then you get problems.  I would
> imagine Thrift isn't the only service where situations like this occur.  I
> need to do more research on how to get around this; there may be some hope
> in Hive 1.1.0 with Thrift keep-alives, but not every application service
> will have that option in the pipeline.
>
> Mesos-DNS
>
> This project came to my attention this week, and I am looking to get it
> installed today to get hands-on time with it.  Basically, it's a binary
> that queries the mesos-master and builds A records (hostnames based on the
> framework names) and SRV records based on the assigned ports.
>
> This is where I get confused. I can see the A records being useful;
> however, your entire network would have to be able to use mesos-dns
> (including non-Mesos systems).  Otherwise, how would a client know to
> connect to a .mesos domain name? Perhaps there should be a way to integrate
> mesos-dns as the authoritative zone for .mesos in your standard enterprise
> DNS servers. That would also avoid the configuration issue of having to add
> DNS servers to all the nodes.  I need to research DNS a bit more, but
> couldn't you set up, say in BIND, that any requests for .mesos are
> forwarded to the mesos-dns service, and the answers then sent through your
> standard DNS back to the client?  Wouldn't this be preferable to setting
> the .mesos name server as the first DNS server and having THAT forward off
> to your standard enterprise DNS servers?
>
> Another issue I see with DNS is that it works well for hostnames, but what
> about ports? Yes, I see there are SRV records that will return the ports,
> but how would that even be used?  Consider the Hive Thrift service example
> above.  We could assume Hive Thrift runs on port 10000 on all nodes in the
> cluster and use that port, but then you run into the same issues as
> haproxy. You can't really specify a port via DNS in a JDBC connection URL,
> can you?  How do you get applications that want to connect to an integer
> port to do a DNS lookup to resolve that port? Or are we back to: you have
> one cluster, and you get 65536 ports for all the services you could want on
> that cluster, basically hard-coding the ports? That also loses flexibility
> from a Docker port-bridging perspective, in that in my haproxy example
> above, all the Docker containers would have to expose port 10000, which
> would have caused a conflict on node2.
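>
> To make that concrete, an SRV-aware client would have to do something
> like the sketch below (the record name is my guess at how mesos-dns
> names the service, and dnspython is just one way to do the lookup):
>
> # Resolve the SRV record for the service, then connect to the returned
> # host/port. Ordinary clients (e.g. a JDBC URL) never do this lookup,
> # which is exactly the problem.
> import dns.resolver
>
> answers = dns.resolver.query("_hivethrift._tcp.marathon.mesos", "SRV")
> for rr in answers:
>     host = str(rr.target).rstrip(".")
>     print("connect to %s:%d" % (host, rr.port))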
>
>
>
> Summary
>
> So while I have a nice long email here, it seems I am either missing
> something critical in how service discovery could work with a Mesos
> cluster, or there are still some pretty big difficulties that we need to
> overcome for an enterprise. Haproxy seems cool and seems to work well,
> except for those "long running TCP connections" like Thrift; I am at a loss
> how to handle that. Mesos-DNS is neat too, except for the port conflicts
> etc. that would occur if you used native ports on nodes; and if you didn't
> use native ports (Mesos random ports), how do your applications know which
> port to connect to (yes, it's in the SRV record, but how do you make apps
> aware that they need to look up a DNS record for a port)?
>
> Am I missing something? How are others handling these issues?


-- 
Adam Shannon | Software Engineer | Banno | Jack Henry
206 6th Ave Suite 1020 | Des Moines, IA 50309 | Cell: 515.867.8337
