That approach sounds similar to Smartstack 
(http://nerds.airbnb.com/smartstack-service-discovery-cloud/).

From: Adam Shannon [mailto:adam.shan...@banno.com]
Sent: Wednesday, April 01, 2015 10:58 AM
To: mesos-users
Subject: Re: Current State of Service Discovery

I figured I would comment on how Banno is setting up service discovery with 
Mesos. We've built everything around Docker containers, plus a wrapper around 
them that we call "sidecar", which handles service discovery, basic process 
supervision, and hot reloads of the underlying app config.

Basically sidecar wraps an existing Docker image (with FROM) and runs the 
underlying command, but monitors it. Sidecar can also render configuration 
templates that get written to the filesystem inside the container. While 
rendering a template, sidecar tracks the ports and config values that were 
allocated and used, and it uses that information to add watches in ZooKeeper 
(where we store per-app overrides of the default config options).
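
Roughly, the registration side of that pattern looks something like this 
(a simplified Python/kazoo sketch, not our actual code; the znode layout, 
host, and port are just for illustration):

from kazoo.client import KazooClient

# Illustrative layout: /services/<app>/<instance> -> "host:port"
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

app_path = "/services/hivethrift"
zk.ensure_path(app_path)

# Ephemeral + sequential: the registration disappears automatically
# if the container (and its sidecar) goes away.
zk.create(app_path + "/instance-", b"node2:30000",
          ephemeral=True, sequence=True)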

The nicest thing we've found is that we're able to use a large range of ports 
per Mesos slave. Further, because our port information is stored in ZooKeeper, 
any other app (and therefore sidecar) we write can look up host/port info for 
any service it needs.

Building on the host/port information in ZooKeeper, we've created a sidecar for 
HAProxy that creates backends tied to the app names of services registered in 
ZooKeeper. This allows HAProxy to query (and watch) all instances of an app and 
proxy to them from HAProxy's well-known host/port. When changes occur (and 
ZooKeeper fires the watches), the haproxy-sidecar instances are able to reload 
with the updates.
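
The watch-and-reload loop on the haproxy-sidecar side looks roughly like this 
(again a simplified Python/kazoo sketch rather than our actual code, and it 
only writes the one listen block rather than a full HAProxy config):

from kazoo.client import KazooClient
import subprocess

zk = KazooClient(hosts="zk1:2181")
zk.start()

@zk.ChildrenWatch("/services/hivethrift")
def on_change(children):
    # Re-read each instance's host:port and rewrite the listen block.
    servers = [zk.get("/services/hivethrift/" + c)[0].decode()
               for c in children]
    lines = ["listen hivethrift-10000",
             "  bind 0.0.0.0:10000",
             "  mode tcp",
             "  balance leastconn"]
    lines += ["  server hive-%d %s check" % (i, s)
              for i, s in enumerate(servers)]
    with open("/etc/haproxy/haproxy.cfg", "w") as f:
        f.write("\n".join(lines) + "\n")
    # Ask haproxy to reload its config (the reload mechanism varies by setup).
    subprocess.call(["systemctl", "reload", "haproxy"])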

We're still working to get this all fully deployed to production (and then open 
sourced), but it seems to combine some of the best features of other public 
options.

On Wed, Apr 1, 2015 at 9:05 AM, John Omernik 
<j...@omernik.com> wrote:
I have been researching service discovery on Mesos quite a bit lately, and due 
to my background, I may be making assumptions that don't apply to a Mesos 
datacenter. I've read through the docs and have come up with two main 
approaches to service discovery, both of which appear to have strengths and 
weaknesses. I wanted to describe what I've seen here, along with the challenges 
as I understand them, in the hope that any misconceptions I have will be 
corrected.

Basically, I see two main approaches to service discovery on Mesos. You have 
the mesos-dns package (https://github.com/mesosphere/mesos-dns), which is 
DNS-based service discovery, and then you have HAProxy-based discovery, which 
can be represented by both the haproxy-marathon-bridge script 
(https://github.com/mesosphere/marathon/blob/master/bin/haproxy-marathon-bridge)
 and the Bamboo project (https://github.com/QubitProducts/bamboo).

HAProxy

With the HAProxy method, as I see it, you basically install HAProxy on every 
node. The two above-mentioned projects query Marathon to determine where the 
services are running, and then rewrite the HAProxy config on every node so that 
basically every node listens on a specific port; from there, that port is 
forwarded, via round robin, to the actual node/port combinations where the 
services are running.

So, let's use the example of a Hive Thrift server running in a Docker container 
on port 10000. Let's say you have a 5-node cluster: node1, node2, etc. You spin 
that container up with instances = 3 in Marathon, and Marathon/Docker run the 
container twice on node2 and once on node3. Port 10000 inside the container is 
bridged to an available port on the physical node. Perhaps one instance on 
node2 gets 30000 and the other instance gets 30001, and node3's instance is 
tied to port 30000. So now you have three instances exposed at:

node2:30000 -> dockercontainer:10000
node2:30001 -> dockercontainer:10000
node3:30000 -> dockercontainer:10000

With the HAProxy setup, each node would get this in its local haproxy config:

listen hivethrift-10000
  bind 0.0.0.0:10000
  mode tcp
  option tcplog
  balance leastconn
  server hivethrift-3 node2:30000 check
  server hivethrift-2 node2:30001 check
  server hivethrift-1 node3:30000 check

This would allow you to connect to any node in your cluster on port 10000 and 
be served by one of the three containers running your Hive Thrift server.
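
For example, a client could then point at any node in the cluster, e.g. with 
beeline (standard HiveServer2 JDBC URL, node1 taken from the example above):

beeline -u jdbc:hive2://node1:10000/default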

Pretty neat, right? However, there are some challenges here:

1. You now have a total of 65536 ports for your datacenter. This method is 
port-only: basically your whole cluster listens on a port, and that port is 
dedicated to one service. This actually makes sense in some ways, because if 
you think of Mesos as a cluster operating system, the limitation of TCP/UDP is 
that each kernel has only that many ports. There isn't a cluster TCP or UDP, 
just TCP and UDP. That is still a lot of ports, but you do need to be aware of 
the limitation and manage your ports, especially since that number isn't really 
the total number of available ports: some of those 65536 ports are reserved for 
cluster operations and/or things like HDFS.

2. You are now essentially adding a hop to your traffic, which could affect 
sensitive applications. At least with the haproxy-marathon-bridge script, the 
settings for each application are static, coming from the script (an 
improvement here would be to allow timeout settings and other HAProxy options 
to be set per application and managed somewhere; I think that may be what 
Bamboo offers, I just haven't dug in yet). The glaring issue I found was 
specifically with the Hive Thrift service. You connect, you run some queries, 
all is well. However, if you submit a long query (longer than the default 
50000 ms timeout), there may not be any packets actually transferred in that 
time. The client is OK with this, the server is OK with this, but HAProxy sees 
no packets within its timeout period, decides the connection is dead, closes 
it, and then you get problems. I would imagine Thrift isn't the only service 
where situations like this occur. I need to do more research on how to get 
around this; there may be some hope in Hive 1.1.0 with Thrift keep-alives, but 
not every application service will have that option in the pipeline.
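
For example, if the bridge script (or Bamboo) allowed per-application 
overrides, something like this for the Thrift listener might help; the timeout 
and keepalive values here are guesses on my part, and I haven't tested this:

listen hivethrift-10000
  bind 0.0.0.0:10000
  mode tcp
  option tcplog
  balance leastconn
  timeout client 4h
  timeout server 4h
  option clitcpka
  option srvtcpka
  server hivethrift-3 node2:30000 check
  server hivethrift-2 node2:30001 check
  server hivethrift-1 node3:30000 check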

Mesos-DNS

This project came to my attention this week, and I am looking to get it 
installed today to get hands-on time with it. Basically, it's a binary that 
queries the Mesos master and generates A records (hostnames based on the 
framework names) and SRV records (based on the assigned ports).

This is where I get confused. I can see the A records being useful, but you 
would have to have your entire network (including non-Mesos systems) be able 
to use mesos-dns. Otherwise, how would a client know to connect to a .mesos 
domain name? Perhaps there should be a way to integrate mesos-dns as the 
authoritative zone for .mesos in your standard enterprise DNS servers. That 
would also avoid the configuration hassle of having to add DNS services to all 
the nodes. I need to research DNS a bit more, but couldn't you set up, say in 
BIND, that any requests for .mesos are forwarded to the mesos-dns service, with 
the responses then sent back through your standard DNS to the client? Wouldn't 
this be preferable to setting the mesos-dns server as the first DNS server and 
then having THAT forward off to your standard enterprise DNS servers?
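
Something like this in named.conf is what I have in mind (the mesos-dns 
address here is a placeholder, and I haven't actually tried this yet):

zone "mesos" {
  type forward;
  forward only;
  forwarders { 10.0.0.100 port 53; };
};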

Another issue I see with DNS is that it works well for hostnames, but what 
about ports? Yes, I see there are SRV records that will return the ports, but 
how would that even be used? Consider the Hive Thrift service example above. 
We could assume Hive Thrift would run on port 10000 on all nodes in the 
cluster and use that port, but then you run into the same issues as with 
HAProxy. You can't really specify a port via DNS in a JDBC connection URL, can 
you? How do you get applications that want to connect to an integer port to do 
a DNS lookup to resolve that port? Or are we back to: you have one cluster, and 
you get 65536 ports for all the services you could want on that cluster, 
basically hard-coding the ports? This also loses flexibility from a Docker 
port-bridging perspective, in that in my HAProxy example above, all the Docker 
containers would have to expose port 10000, which would have caused a conflict 
on node2.
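
To be concrete about what an SRV lookup would involve if an application were 
modified to do it, here's a small Python sketch using dnspython (the record 
name follows the mesos-dns task/framework naming as I understand it, which I 
still need to verify):

import dns.resolver

# e.g. _<task>._tcp.<framework>.mesos
answers = dns.resolver.query("_hivethrift._tcp.marathon.mesos", "SRV")
for a in answers:
    # Each answer carries the target host and the assigned port,
    # e.g. "node2.mesos 30000".
    print(str(a.target).rstrip("."), a.port)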



Summary

So while I have a nice long email here, it seems I am either missing something 
critical in how service discovery could work with a Mesos cluster, or there are 
still some pretty big difficulties that we need to overcome for an enterprise. 
HAProxy seems cool and works well except for those "long-running TCP 
connections" like Thrift; I am at a loss for how to handle that. Mesos-DNS is 
neat too, except for the port conflicts, etc., that would occur if you used 
native ports on the nodes; and if you didn't use native ports (Mesos-assigned 
random ports), how do your applications know which port to connect to? (Yes, 
it's in the SRV record, but how do you make apps aware that they need to look 
up a DNS record to get a port?)

Am I missing something? How are others handling these issues?





--
Adam Shannon | Software Engineer | Banno | Jack Henry
206 6th Ave Suite 1020 | Des Moines, IA 50309 | Cell: 515.867.8337
