We're going with HAProxy on every node + haproxy-marathon-bridge (since we're leveraging Marathon). We deployed mesos-dns, but it didn't seem to make sense to have both solutions.

-craig
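[For context, the bridge approach Craig picked boils down to polling Marathon and rewriting the haproxy config. A minimal re-implementation sketch, for illustration only (this is not the actual haproxy-marathon-bridge script), against Marathon's /v2/tasks endpoint; the Marathon URL is a placeholder:]

    # Sketch of what haproxy-marathon-bridge does, re-implemented for
    # illustration (not the actual script): poll Marathon's /v2/tasks
    # endpoint and emit one haproxy "listen" block per service port.
    # The Marathon URL below is a placeholder.
    import collections
    import json
    import urllib.request

    MARATHON = "http://marathon.example.com:8080"  # placeholder endpoint

    def fetch_tasks():
        with urllib.request.urlopen(MARATHON + "/v2/tasks") as resp:
            return json.load(resp)["tasks"]

    def render_haproxy_config(tasks):
        # group running tasks by (app, servicePort); each servicePort in a
        # task pairs with the host port at the same index
        backends = collections.defaultdict(list)
        for t in tasks:
            for service_port, host_port in zip(t.get("servicePorts", []),
                                               t.get("ports", [])):
                backends[(t["appId"], service_port)].append((t["host"], host_port))
        lines = []
        for (app_id, service_port), servers in sorted(backends.items()):
            name = app_id.strip("/").replace("/", "_")
            lines += [f"listen {name}-{service_port}",
                      f"  bind 0.0.0.0:{service_port}",
                      "  mode tcp",
                      "  balance leastconn"]
            lines += [f"  server {name}-{i} {host}:{port} check"
                      for i, (host, port) in enumerate(servers, 1)]
        return "\n".join(lines) + "\n"

    if __name__ == "__main__":
        # in the real setup this would be written to the haproxy config
        # path on every node, followed by a graceful haproxy reload
        print(render_haproxy_config(fetch_tasks()))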
On Wed, Apr 1, 2015 at 12:33 PM, Adam Shannon <adam.shan...@banno.com> wrote:

David,

Smartstack was one of the inspirations we used to decide how we wanted to build out service discovery. The one thing we decided on was that we wanted the haproxy instances to be the front-line load balancers (the ones directly open to the internet).

One thing the Smartstack post doesn't mention is how their internal dashboards are routed and accessed. We are able to create backends, propagated out to all instances of haproxy, that proxy for dashboards. This allows us to scale those just as easily as any other service we run.

On Wed, Apr 1, 2015 at 10:51 AM, David Kesler <dkes...@yodle.com> wrote:

That approach sounds similar to Smartstack (http://nerds.airbnb.com/smartstack-service-discovery-cloud/).

From: Adam Shannon [mailto:adam.shan...@banno.com]
Sent: Wednesday, April 01, 2015 10:58 AM
To: mesos-users
Subject: Re: Current State of Service Discovery

I figured I would comment on how Banno is setting up service discovery with mesos. We've built everything around docker containers, plus a wrapper around them we call "sidecar" that handles service discovery, basic process supervision, and hot reloads of the underlying app config.

Basically, sidecar wraps an existing docker image (with FROM) and runs the underlying command while monitoring it. Sidecar can also render templates that are written to the filesystem (in the container). While writing a template, sidecar tracks the ports and config values allocated and used, and it uses that information to add watches into zookeeper (where we store per-app overrides of the default config options).

The nicest thing we've found is that we're able to use a large range of ports per mesos slave. Further, because our port information is stored in zookeeper, any other app (and therefore sidecar) we write can look up host/port info for the services it needs.

From the host/port information in zookeeper, we've created a sidecar for haproxy on which backends can be created, tied to the app names of services registered in zookeeper. This allows haproxy to query (and watch) for all instances of an app and proxy them from the known host/port of haproxy. When changes occur (firing watches from zookeeper), the haproxy-sidecar instances reload with the updates.

We're still working to get this all fully deployed to production (and then open sourced), but it seems to combine some of the best features of other public options.
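[A rough sketch of the watch-and-reload pattern Adam describes, using the kazoo ZooKeeper client. Banno's actual schema isn't public, so the znode layout (/services/<app> with children holding "host:port"), the ensemble address, and the reload command are all assumptions:]

    # Hypothetical sketch of the ZooKeeper-watch pattern described above:
    # watch the registered instances of an app and rebuild the haproxy
    # backend when they change. Znode layout, ensemble address, and the
    # reload command are assumptions, not Banno's actual design.
    import subprocess
    import time

    from kazoo.client import KazooClient

    APP = "hivethrift"  # assumed app name

    def rewrite_haproxy_backend(app, servers):
        lines = [f"listen {app}-10000", "  bind 0.0.0.0:10000",
                 "  mode tcp", "  balance leastconn"]
        lines += [f"  server {app}-{i} {hp} check"
                  for i, hp in enumerate(servers, 1)]
        with open(f"/etc/haproxy/{app}.cfg", "w") as f:
            f.write("\n".join(lines) + "\n")
        subprocess.run(["systemctl", "reload", "haproxy"])  # assumed reload hook

    zk = KazooClient(hosts="zk1.example.com:2181")  # assumed ensemble
    zk.start()

    # kazoo re-registers the watch after every event, so this fires once at
    # registration and again on every membership change
    @zk.ChildrenWatch(f"/services/{APP}")
    def on_instances_changed(children):
        # each child znode is assumed to hold "host:port" for one instance
        servers = [zk.get(f"/services/{APP}/{c}")[0].decode() for c in children]
        rewrite_haproxy_backend(APP, servers)

    while True:
        time.sleep(60)  # keep the process alive; watches fire on kazoo's threads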
On Wed, Apr 1, 2015 at 9:05 AM, John Omernik <j...@omernik.com> wrote:

I have been researching service discovery on Mesos quite a bit lately and, given my background, may be making assumptions that don't apply to a Mesos datacenter. I've read through the docs and come up with two main approaches to service discovery, both of which appear to have strengths and weaknesses. I want to describe what I've seen here, along with the challenges as I understand them, so that any misconceptions I have can be corrected.

Basically, I see two main approaches to service discovery on Mesos. You have the mesos-dns package (https://github.com/mesosphere/mesos-dns), which is DNS-based service discovery, and you have HAProxy-based discovery, represented by both the haproxy-marathon-bridge script (https://github.com/mesosphere/marathon/blob/master/bin/haproxy-marathon-bridge) and the Bamboo project (https://github.com/QubitProducts/bamboo).

HAProxy

With the HAProxy method, as I see it, you basically install HAProxy on every node. The two projects mentioned above query Marathon to determine where services are running, then rewrite the haproxy config on every node so that every node listens on a specific port. From there, that port is forwarded, via the configured balancing algorithm, to the actual node/port combinations where the service is running.

So, let's use the example of a Hive thrift server running in a docker container on port 10000. Say you have a 5-node cluster: node1, node2, etc. You spin that container up with instances = 3 in Marathon, and Marathon/docker run one container on node3 and two on node2. Port 10000 inside each container is bridged to an available port on the physical node. Perhaps one instance on node2 gets 30000 and the other gets 30001, and node3's instance is tied to port 30000. So now you have three instances exposed at:

    node2:30000 -> dockercontainer:10000
    node2:30001 -> dockercontainer:10000
    node3:30000 -> dockercontainer:10000

With the HAProxy setup, each node would get this in its local haproxy config:

    listen hivethrift-10000
      bind 0.0.0.0:10000
      mode tcp
      option tcplog
      balance leastconn
      server hivethrift-3 node2:30000 check
      server hivethrift-2 node2:30001 check
      server hivethrift-1 node3:30000 check

This would allow you to connect to any node in your cluster on port 10000 and be served by one of the three containers running your Hive thrift server.

Pretty neat? However, there are some challenges here:

1. You now have a total of 65536 ports for your datacenter. This method is port-only: your whole cluster listens on a port, and that port is dedicated to one service. This actually makes sense in some ways: if you think of Mesos as a cluster operating system, the limitation of TCP/UDP is that each kernel has only that many ports; there is no cluster TCP or UDP, just TCP and UDP. That is still a lot of ports, but you do need to be aware of the limitation and manage your ports, especially since that number isn't really the total number available: some of those 65536 are reserved for cluster operations and/or things like HDFS.

2. You are now adding a hop to your traffic, which could affect sensitive applications. At least with the haproxy-marathon-bridge script, the settings for each application are static, fixed by the script (an improvement here would be to allow timeout settings and other haproxy options to be set per application and managed somewhere; that may be what Bamboo offers, I just haven't dug in yet). The glaring issue I found was specifically with the Hive thrift service. You connect, you run some queries, all is well. However, if you submit a long query (longer than the default 50000 ms timeout), there may not be any packets actually transferred in that time. The client is fine with this and the server is fine with this, but haproxy sees no packets within its timeout period, decides the connection is dead, and closes it, and then you get problems. I would imagine thrift isn't the only service where situations like this occur. I need to do more research on how to get around this; there may be some hope in Hive 1.1.0 with thrift keep-alives, but not every application service will have that option in the pipeline.
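[For what it's worth, haproxy does expose knobs for exactly this case. A hand-tuned version of the listen block above could raise the idle timeouts and enable TCP keep-alives for long-lived thrift connections; the values are illustrative, and per-app management of these settings is precisely what the bridge script lacks:]

    listen hivethrift-10000
      bind 0.0.0.0:10000
      mode tcp
      option tcplog
      balance leastconn
      # raise the idle timeouts well past the longest expected query;
      # these bound how long haproxy tolerates a silent connection
      timeout client 1h
      timeout server 1h
      # TCP keep-alives toward the client and the server also keep an
      # idle-looking connection from being reaped
      option clitcpka
      option srvtcpka
      server hivethrift-3 node2:30000 check
      server hivethrift-2 node2:30001 check
      server hivethrift-1 node3:30000 check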
Mesos-DNS

This project came to my attention this week, and I am looking to get it installed today to get hands-on time with it. Basically, it's a binary that queries the mesos-master and generates A records, whose hostnames are based on the framework names, and SRV records, based on the assigned ports.

This is where I get confused. I can see the A records being useful, but your entire network would have to be able to use mesos-dns (including non-mesos systems); otherwise, how would a client know how to resolve a .mesos domain name? Perhaps there should be a way to integrate mesos-dns as the authoritative zone for .mesos in your standard enterprise DNS servers. That also avoids the configuration burden of adding DNS settings to all the nodes. I need to research DNS a bit more, but couldn't you set it up, say in BIND, so that any requests in .mesos are forwarded to the mesos-dns service, with the answer sent back to the client through your standard DNS? Wouldn't this be preferable to setting the .mesos name server as the first DNS server and having THAT forward off to your standard enterprise DNS servers?
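[That forwarding setup is straightforward in BIND. A minimal sketch of the named.conf fragment, assuming mesos-dns is reachable at a placeholder address and answering on its default port 53:]

    // named.conf fragment: hand anything under .mesos to mesos-dns.
    // 10.0.0.10 is a placeholder for the host running mesos-dns.
    zone "mesos" {
        type forward;
        forward only;
        forwarders { 10.0.0.10 port 53; };
    };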
Another issue I see with DNS is that it works well for hostnames, but what about ports? Yes, I see there are SRV records that will return the ports, but how would they even be used? Consider the Hive thrift example above. We could assume Hive thrift runs on port 10000 on all nodes in the cluster and use that port, but then you run into the same issues as HAProxy. You can't really specify a port via DNS in a JDBC connection URL, can you? How do you get applications that want to connect to an integer port to do a DNS lookup to resolve the port? Or are we back to having one cluster and 65536 ports for all the services you could want on it, basically hard-coding the ports? That also loses flexibility from a docker port-bridging perspective: in my HAProxy example above, all the docker containers would have had to expose port 10000, which would have caused a conflict on node2.

Summary

So while I have a nice long email here, it seems I am either missing something critical about how service discovery could work with a Mesos cluster, or there are still some pretty big difficulties to overcome for an enterprise. HAProxy seems cool and works well, except for those long-running TCP connections like thrift; I am at a loss for how to handle that. Mesos-DNS is neat too, except for the port conflicts that would occur if you used native ports on nodes; and if you didn't use native ports (Mesos random ports), how do your applications know which port to connect to? (Yes, it's in the SRV record, but how do you make apps aware that they should look up a DNS record for a port?)

Am I missing something? How are others handling these issues?
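[One answer to that last question is a small lookup shim in the client: resolve the SRV record mesos-dns publishes and feed the host/port into the connection URL. A sketch with dnspython, where the record name follows mesos-dns's _app._protocol.framework.domain convention, but the specific app and framework names are assumptions:]

    # Sketch: resolve the SRV record mesos-dns publishes for a task and
    # build a connection URL from it. Requires dnspython; the app and
    # framework names here ("hivethrift", "marathon") are assumptions.
    import random

    import dns.resolver

    def lookup_service(srv_name):
        answers = dns.resolver.resolve(srv_name, "SRV")
        # an SRV answer carries a target host and a port; pick one at
        # random, ignoring priority/weight for brevity
        rr = random.choice(list(answers))
        return str(rr.target).rstrip("."), rr.port

    host, port = lookup_service("_hivethrift._tcp.marathon.mesos")
    print(f"jdbc:hive2://{host}:{port}/default")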