The wildcard certificate isn't a problem on its own; the problem is using it in a way that lets me keep all of the features I want. Let me lay this out.
In Marathon I have a task that runs a drillbit. Since that task is located at the node prod/drillprod (in my environment that's role/instanceid), the domain name is set up to be drillprod-prod.marathon.mesos. I can run any number of instances of that task, and I tell Marathon to make them "host unique" so no two drillbits end up on the same node. This gives me a few things:

1. If I choose to have 3 drillbits running, they go and run, and I don't have to worry about them. If I have to reboot the node one of them is on, Marathon says "oh look, I am only running 2, let's spin up another," and I get my required 3 bits running automatically.

2. They use a common config directory located in MapR-FS. This is really nice because I don't have to maintain separate configurations for each drillbit.

3. Running nslookup on the name above, drillprod-prod.marathon.mesos, returns

    Name:    drillprod-prod.marathon.mesos
    Address: 192.168.0.105
    Name:    drillprod-prod.marathon.mesos
    Address: 192.168.0.103
    Name:    drillprod-prod.marathon.mesos
    Address: 192.168.0.104

which is what I want. When a client connects, I can program a single name (drillprod-prod.marathon.mesos) into my script and never have to worry about where the bits run on the cluster. It looks the name up and works great. This has been my standard MO for scripts that do short-lived things, and I hadn't had an issue until this new use case came up. (The use case is long-running sessions for analytic notebooks, BTW; the details aren't super relevant here.)

Because of the DNS naming, my scripts get tossed around to different bits depending on which IP the DNS round robin hands out, which is what I want for most scripts. The issue comes into play when I establish a session and then, for some reason (maybe after a cache timeout or something), Python's requests object does a DNS lookup before the next request. The IP changes and the session is invalidated. Not awesome when working in a notebook.
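As a rough illustration of the failure mode described above, the sketch below resolves the round-robin name once and keeps using the same address for the life of a session, instead of letting each request re-resolve. The names `resolve_all` and `PinnedEndpoint` are hypothetical helpers, not part of Drill or requests, and the port is whatever the caller passes in. Note that connecting by IP is exactly what produces the certificate mismatch warnings mentioned elsewhere in this thread, so this only shows the pinning idea and does not solve the SSL side.

```python
import random
import socket


def resolve_all(hostname, port, resolver=socket.getaddrinfo):
    """Return the distinct IPv4 addresses the name currently resolves to."""
    infos = resolver(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


class PinnedEndpoint:
    """Hypothetical helper: pick one address up front and stick with it."""

    def __init__(self, hostname, port,
                 resolver=socket.getaddrinfo, chooser=random.choice):
        self.hostname = hostname
        self.port = port
        addresses = resolve_all(hostname, port, resolver)
        if not addresses:
            raise RuntimeError("no addresses found for " + hostname)
        # The choice is made once; later calls never consult DNS again.
        self.address = chooser(addresses)

    def base_url(self):
        """URL that always points at the same drillbit for this session."""
        return "https://{}:{}".format(self.address, self.port)
```

A notebook would create one `PinnedEndpoint` for drillprod-prod.marathon.mesos when it logs in and build every later request URL from `base_url()`. If the chosen node goes away, the session is lost anyway (sessions are local to a drillbit), so the client would have to create a new endpoint and log in again.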
The wildcard approach "could" work, but there are some gotchas. I could create an application folder in Marathon with the same name, prod/drillprod, and in there create a task with the hostname for each host. However, that would make me lose the HA of my setup. Say I am running 3 instances of bits on nodes node104, node103, and node105, and I need to reboot node105. In my current setup, node102 could get the new bit, the DNS name updates automatically, and I keep HA with no extra work. With a wildcard cert, I would still need to manually spin up a new instance to stay at three. In addition, I would have to get a list of the three nodes currently running and pick one to connect to. That is a lot of complex orchestration just to use wildcard certs and still maintain HA.

The reverse proxy will work for me: I can program nginx to pin connections, choosing the backend based on the JSESSIONID. That should work, but I don't like it because it requires another component running in my network. It's not bad for me, since I can easily run that on Zeta, but as a whole it's not ideal for Drill users.

Thus I am back to the idea of Drill somehow maintaining a global state. This is also important for Drill on YARN setups (unless there is some sort of application container proxy back to the bits). If you want to have security (SSL) with hostnames, session maintenance must be addressed.

So that's why I toss it out here. I would imagine this is a desirable feature. Even if people are not asking for it now, that may not be because they don't need it; in their testing of Drill, and in how they are using it now, it may simply not have come up. When they have multiple people and services hitting Drill endpoints, pointing them at individual nodes for SSL management etc. becomes a nightmare. So, as a thought exercise: could we securely maintain valid session IDs in ZooKeeper for nodes to check? What would an ideal setup for something like that look like?
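To make the "pin connections by JSESSIONID" idea concrete, here is a minimal sketch of the routing logic such a proxy would need. It is plain Python for illustration, not nginx configuration and not anything Drill provides; `StickyRouter` and its methods are hypothetical names. The assumption is that the proxy can see the session ID a backend hands out and remember which backend issued it.

```python
import itertools


class StickyRouter:
    """Hypothetical sketch: round-robin new clients, pin known sessions."""

    def __init__(self, backends):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._next = itertools.cycle(self._backends)
        self._owner = {}  # session id -> backend that issued it

    def route(self, session_id=None):
        """Return the backend for a request, honoring an existing pin."""
        if not self._backends:
            raise RuntimeError("no backends available")
        backend = self._owner.get(session_id) if session_id else None
        if backend is None:
            # No session yet, or an unknown one: plain round robin.
            backend = next(self._next)
        return backend

    def learn(self, session_id, backend):
        """Record which backend issued a session ID (seen in its response)."""
        self._owner[session_id] = backend

    def remove_backend(self, backend):
        """Drop a backend, e.g. on node reboot; its sessions are lost."""
        self._backends.remove(backend)
        self._next = itertools.cycle(self._backends)
        self._owner = {s: b for s, b in self._owner.items() if b != backend}
```

The table lookup, rather than a hash of the cookie, is deliberate: the first request carries no session ID, so the backend that issues the ID is only known after the response comes back. This still keeps session state outside Drill, which is the extra moving part the message above would rather avoid.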
On Fri, Jun 23, 2017 at 7:07 AM, Keys Botzum <kbot...@mapr.com> wrote:

> Why is a wildcard certificate a problem? They are quite common. One just
> needs all of the Drillbits to share a common domain for the wildcard to be
> easy and thus avoid having to list individual hosts.
>
> Are you saying that you can't use hostnames and must use IPs?
>
> In case I'm not clear, here's an example of what I'm saying.
>
> this is good with wildcards: drill1.mydrill.corp.com,
> drill2.mydrill.corp.com, drill3.mydrill.corp.com, drill4.mydrill.corp.com
> this is bad with wildcards: drill1, drill2, drill3, drill4
>
> Keys
> _______________________________
> Keys Botzum
> MapR Technologies
>
> On Jun 22, 2017, at 8:24 PM, John Omernik <j...@omernik.com> wrote:
>
> Would there be interest in finding a way to globalize this? This is
> challenging for me and others that may run Drill with multi-tenant
> orchestrators. In my particular setup, each node running Drill gets added
> to an A record automatically, giving me HA and distribution of REST API
> queries. It also allows me to have a single certificate for my cluster
> rather than managing certificates on an individual basis. I set things up
> to connect via IP but then I had certificate mismatch warnings. My goal is
> to find a way to connect to the REST API while maintaining a session to a
> single node, without sacrificing HA and balancing and without compromising
> SSL security. I know it's a tall order, but if there are ideas outside of
> global state management I am all ears.
>
> Note some ideas I've also considered:
>
> 1. Using a load balancer that would allow me to pin connections. Not
> ideal because it's another service to manage, but it would work.
>
> 2. There may be a way to hack things with a wildcard cert, but it seems
> complicated and fragile.
>
> On Jun 22, 2017 5:47 PM, "Sorabh Hamirwasia" <shamirwa...@mapr.com> wrote:
>
> Hi John,
> As Paul mentioned, session IDs are not global. Each session is part of the
> BitToUserConnection instance created for a connection between Drillbit and
> client. Hence it's local to that Drillbit only, and the lifetime of the
> session is tied to the lifetime of the connection. You can find the code here:
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java#L102
>
> Thanks,
> Sorabh
>
> ________________________________
> From: Paul Rogers <prog...@mapr.com>
> Sent: Thursday, June 22, 2017 2:19:50 PM
> To: user@drill.apache.org
> Subject: Re: Drill Session ID between Nodes
>
> Hi John,
>
> I do not believe that session IDs are global. Each Drillbit maintains its
> own concept of sessions. A global session would require some centralized
> registry of sessions, which Drill does not have.
>
> Would be great if someone can confirm…
>
> - Paul
>
> On Jun 22, 2017, at 12:14 PM, John Omernik <j...@omernik.com> wrote:
>
> When I log onto a Drill node and get a session ID, if I connect to another
> Drill node in the cluster, will the session ID be valid?
>
> I am guessing not, but want to validate.
>
> My conundrum: I have my Drill cluster running in such a way that the
> connections to the nodes are load balanced via DNS. However, if I get a
> DNS IP while in session it appears to invalidate, and thus forces me to
> log on...