That makes sense, ya, I would love to hear about the challenges of this in
general from the Drill folks.

Also, I wonder if Paul R at MapR has any thoughts on how something like
this would be handled in the Drill-on-YARN setup.


John

On Fri, Jun 23, 2017 at 10:58 AM, Keys Botzum <kbot...@mapr.com> wrote:

> I think we are on the same page regarding SSL.
>
> Regarding (1) it's best that I defer to the drill experts but I will
> mention that sharing session state can greatly complicate scalability.
> Since switching drillbits should be a rare event, it is probably more
> scalable to send back to the client a token which represents the
> authenticated identity (encrypted and signed of course). Then should that
> show up at another drillbit, the user authentication state can be
> reestablished. Other state, such as caches, would be lost. I don't
> know enough about Drill internals, though; there may be other state
> issues beyond just authentication.
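
A minimal sketch of the token idea described above, in Python. This is not how Drill works today; the function names, the shared `SECRET`, and the token layout are all assumptions made for illustration. The sketch only signs the identity (HMAC-SHA256) and does not encrypt it, so a real design would still need encryption and key distribution between drillbits.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical shared key; every drillbit would need to load the same value.
SECRET = b"cluster-wide-secret"


def issue_token(user, ttl_seconds=3600, now=None, secret=SECRET):
    """Return 'payload.signature' describing an authenticated user."""
    now = time.time() if now is None else now
    payload = json.dumps({"user": user, "exp": now + ttl_seconds}).encode()
    body = base64.urlsafe_b64encode(payload)
    sig = hmac.new(secret, body, hashlib.sha256).hexdigest().encode()
    return (body + b"." + sig).decode()


def verify_token(token, now=None, secret=SECRET):
    """Return the user name if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        body, sig = token.encode().split(b".")
    except ValueError:
        return None  # not in 'payload.signature' form
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch: forged or tampered token
    try:
        claims = json.loads(base64.urlsafe_b64decode(body))
    except ValueError:
        return None
    if claims["exp"] < now:
        return None  # expired
    return claims["user"]
```

Any drillbit holding the same key could call `verify_token` on a token issued by another drillbit and re-create the authentication state locally, with no shared session store.
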
>
> Keys
> _______________________________
> Keys Botzum
> Distinguished Engineer, Field Engineering
> kbot...@maprtech.com
> 443-718-0098
> MapR Technologies
> http://www.mapr.com
>
>
>
> On Jun 23, 2017, at 11:33 AM, John Omernik <j...@omernik.com> wrote:
>
> So a few things
>
> 1. The issue is that, as is, the SSL stuff works fine, but when the IP address
> that DNS returns for the hostname changes, the session is invalidated and I
> am forced to log on again... this is annoying and loses session context
> information.  If I try to lay out my cluster differently, i.e. using the
> wildcard certs and the different marathon layout, I then have different
> issues. I can connect by IP, but then I lose the SSL Validity.  That's
> where the context for SSL comes into play.  My main issue is with the IP
> returned for a DNS request changing during the course of a session,
> invalidating it.  I think what it comes down to for me is this statement:
>
> As a user I connect to a drill cluster
>
> A simple statement, but what that means is as a user or an admin, my
> users/code accessing the cluster shouldn't have to care which individual
> node they connect to; they are connecting to a cluster.  This is
> oversimplifying things, but session IDs managed by the cluster via ZooKeeper
> would solve this.
>
>
> 2. I am looking at doing the SSL handling overrides in my Python code.
> requests has some handlers for SSL and I was looking to address this;
> however, there is a bug in how it works because it drops my custom port
> value... I am working on this now with the Python requests folks.  (i.e.
> the custom handlers would work, but only if I was connecting on port 443)
>
>
>
>
>
> On Fri, Jun 23, 2017 at 9:52 AM, Keys Botzum <kbot...@mapr.com> wrote:
>
> There is something here I'm not understanding. In the below the hostname
> is always the same so there should be no problem as long as all drillbits
> share a common signer.
>
> I'm also just not following how certificate authentication issues are even
> linked to the Drill session issues. Whether or not there is a Drill
> session, the SSL handshake rules still apply. Or there is something here I
> just don't understand - quite possibly of course. I'm just focused on the
> SSL issue as this I understand very well.
>
> Incidentally, regarding hostname verification, I'm not familiar with what
> controls you have but many libraries (including Java) give you the ability
> to write your own SSL verifier which is called only when the default
> hostname verification fails. In that code you can implement different
> rules. Perhaps you can find a rule that meets your needs (such as a common
> signer for all Drillbits). Remember that certificate hostname validation is
> just a convention. There is nothing about SSL that makes this necessary.
> Here's the Java version:
> https://docs.oracle.com/javase/7/docs/api/javax/net/ssl/HostnameVerifier.html
> In case you are curious, this is how MapR's maprlogin works with HTTPS even
> though we use IP address by default.
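
For the Python side, a rough equivalent of the "common signer" rule can be sketched with the standard `ssl` module: keep certificate-chain verification on, trust only the cluster's own CA, and turn off the hostname comparison. The function name and the `ca_file` parameter are made up for this example, and this is a sketch of the idea rather than what maprlogin actually does.

```python
import ssl


def common_signer_context(ca_file=None):
    """Build a TLS client context that requires a certificate chain signed
    by the given CA but skips the hostname-vs-certificate comparison."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False           # do not compare the name in the cert
    ctx.verify_mode = ssl.CERT_REQUIRED  # still require a valid signed chain
    if ca_file is not None:
        # Hypothetical path to the CA that signed every drillbit certificate.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

The context can be handed to `http.client.HTTPSConnection(host, port, context=ctx)`. With no `ca_file` loaded, every handshake would fail because nothing is trusted. Note the trade-off: any certificate signed by that CA is accepted for any host, so this only makes sense when the CA is private to the cluster.
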
>
> Keys
>
>
>
> On Jun 23, 2017, at 10:22 AM, John Omernik <j...@omernik.com> wrote:
>
> The wildcard certificate isn't a problem on its own; the problem is using it
> in a manner that allows me to maintain all of the various features I want.
> Let me lay this out.
>
> In Marathon I have a task that runs a drillbit.  Since that task is located
> at the node prod/drillprod (for my env it's role/instanceid), the domain
> name is set up to be
>
> drillprod-prod.marathon.mesos.
>
> I can run X number of instances of that task. I tell Marathon to make them
> "host unique" so no two drill bits end up on the same node.  This gives me
> a few things
>
> 1. If I choose to have 3 drillbits running, they go and run, and I don't
> have to worry about them.  If I have to reboot the node one of them is on,
> Marathon says "oh look, I am only running 2, let's spin up another," and then
> I get my required 3 bits running automatically.
>
> 2. They use a common config directory located in MapR-FS. This is really
> nice because I don't have to maintain separate configurations for each
> drillbit.
>
> 3. The name above, drillprod-prod.marathon.mesos, using nslookup returns
>
> Name: drillprod-prod.marathon.mesos
> Address: 192.168.0.105
>
> Name: drillprod-prod.marathon.mesos
> Address: 192.168.0.103
>
> Name: drillprod-prod.marathon.mesos
> Address: 192.168.0.104
>
>
> Which is desired. When I have a client connect, I can program a single
> name (drillprod-prod.marathon.mesos) into my script and never have to
> worry about where the bits run on the cluster.  It looks it up and works
> great. This has been my standard MO for scripts that do short-lived
> things... I haven't had an issue until this new use case came up. (Long-running
> sessions for use in analytic notebooks is the use case, BTW, just not super
> relevant to go into details on that here.)
>
>
> Because of the DNS naming, my scripts get tossed around to different bits
> depending on how the DNS round robin provides the IP, which is desired for
> various scripts.  The issue comes into play when I make a session
> connection, and for some reason (maybe after a cache timeout or
> something) Python's requests object makes the next request but does a DNS
> lookup first, causing the IP to change and the session to be invalidated. Not
> awesome when working in a notebook.
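
One client-side workaround for the re-resolution problem is to resolve the round-robin name once and keep using that address for the life of the session. A minimal sketch, using only the standard library; `PinnedEndpoint` and its `resolve` parameter are hypothetical names for this example, and the injectable resolver is there only so the behaviour can be checked without real DNS.

```python
import socket


class PinnedEndpoint:
    """Resolve a round-robin DNS name once and keep returning the same IP."""

    def __init__(self, hostname, port, resolve=socket.getaddrinfo):
        self.hostname = hostname
        self.port = port
        self._resolve = resolve
        self._address = None

    def address(self):
        """Return the pinned IP, resolving only on first use."""
        if self._address is None:
            infos = self._resolve(self.hostname, self.port,
                                  type=socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address string.
            self._address = infos[0][4][0]
        return self._address

    def reset(self):
        """Forget the pinned IP, e.g. after that drillbit goes away."""
        self._address = None
```

The catch is that connecting to the pinned IP instead of the name brings back the certificate hostname mismatch, so on its own this only moves the problem to the SSL side.
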
>
> The wildcard DNS "could" work, but there are some gotchas... I could create
> an application folder in marathon with the same name, prod/drillprod, and
> then in there I could create a task with the hostname for each host.
>
> However, this would then make me lose the HA-ness of my setup. If I am
> trying to run 3 instances of bits, on nodes node104, node103, and node105,
> and I need to reboot node105, in my setup node102 could get the new bit,
> the DNS name auto-updates, and I maintain HA with simplicity. With
> a wildcard cert, however, I would still need to manually spin up a new
> instance to maintain three instances.
>
> In addition, I would have to get a list of the three nodes running to pick
> one to connect to.  Lots of complex orchestration to use wild card certs to
> maintain HA.
>
> The reverse proxy will work for me: I can program nginx to pin
> connections, choosing which backend a request goes to based on
> the JSESSIONID. That should work, but I don't like it because it requires
> another component running in my network. That's not bad for me, since I can
> easily run it on Zeta, but as a whole it's not ideal
> for Drill users.
>
> Thus I am back to the idea of Drill somehow maintaining a global state.
> This is also important for Drill on Yarn setups (unless there is some sort
> of application container proxy back to the bits). If you want to have
> security (SSL) with hostnames, the session maintenance must be addressed.
>
> So that's why I toss it out here... this is a desirable feature, I would
> imagine. Even if people are not asking for it now, that may not be because
> they don't need it; in their testing of Drill, and in how they are using it
> now, it may just not come up. When they have multiple people and services
> hitting Drill endpoints, pointing them at individual nodes for SSL management
> etc. becomes a nightmare... thus, as a thought exercise, could we securely
> maintain valid session IDs in ZooKeeper for nodes to check on? What would
> an ideal setup for something like that be?
>
>
>
>
> On Fri, Jun 23, 2017 at 7:07 AM, Keys Botzum <kbot...@mapr.com> wrote:
>
> Why is a wildcard certificate a problem? They are quite common. One just
> needs all of the Drillbits to share a common domain for the wildcard to be
> easy and thus avoid having to list individual hosts.
>
> Are you saying that you can't use hostnames and must use IPs?
>
> In case I'm not clear, here's an example of what I'm saying.
>
> this is good with wildcards: drill1.mydrill.corp.com,
> drill2.mydrill.corp.com, drill3.mydrill.corp.com, drill4.mydrill.corp.com
> this is bad with wildcards: drill1, drill2, drill3, drill4
>
>
> Keys
> _______________________________
> Keys Botzum
> MapR Technologies
>
>
>
> On Jun 22, 2017, at 8:24 PM, John Omernik <j...@omernik.com> wrote:
>
> Would there be interest in finding a way to globalize this? This is
> challenging for me and others that may run Drill with multi-tenant
> orchestrators.  In my particular setup, each node running Drill gets added
> to an A record automatically, giving me HA and distribution of REST API
> queries.  It also allows me to have a single certificate for my cluster
> rather than managing certificates on an individual basis.   I set things up
> to connect via IP but then I had certificate mismatch warnings. My goal is
> to find a way to connect to the REST API while maintaining a session to a
> single node, without sacrificing HA and balancing and without compromising
> SSL security.   I know it's a tall order, but if there are ideas outside of
> global state management I am all ears.
>
> Note some ideas I've also considered:
>
> 1.  using a load balancer that would allow me to pin connections.  Not
> ideal because it's another service to manage but it would work.
>
> 2. There may be a way to hack things with a wildcard cert but it seems
> complicated and fragile.
>
> On Jun 22, 2017 5:47 PM, "Sorabh Hamirwasia" <shamirwa...@mapr.com> wrote:
>
> Hi John,
> As Paul mentioned, session IDs are not global. Each session is part of the
> BitToUserConnection instance created for a connection between Drillbit and
> client. Hence it's local to that Drillbit only, and the lifetime of the
> session is tied to the lifetime of the connection. You can find the code here:
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java#L102
>
> Thanks,
> Sorabh
>
> ________________________________
> From: Paul Rogers <prog...@mapr.com>
> Sent: Thursday, June 22, 2017 2:19:50 PM
> To: user@drill.apache.org
> Subject: Re: Drill Session ID between Nodes
>
> Hi John,
>
> I do not believe that session IDs are global. Each Drillbit maintains its
> own concept of sessions. A global session would require some centralized
> registry of sessions, which Drill does not have.
>
> Would be great if someone can confirm…
>
> - Paul
>
> On Jun 22, 2017, at 12:14 PM, John Omernik <j...@omernik.com> wrote:
>
> When I log on to a Drill node and get a session ID, if I connect to another
> Drill node in the cluster, will the session ID be valid?
>
> I am guessing not, but want to validate.
>
> My conundrum: I have my Drill cluster running in such a way that the
> connections to the nodes are load balanced via DNS. However, if DNS hands me
> a different IP while I am in a session, the session appears to be
> invalidated, which forces me to log on again...
>
>
>
>
>
>
>
>
