Hi Jeff - Thanks for reaching out!

Rather than trying to unpack all of that, I'd like to step back to a
description of what you are trying to accomplish with your deployment and
the addition of Knox within it.

As you have described it, it seems like a very unsecured environment.
Whether you are running your process as a root user or not, executing your
queries and operations as the hdfs user is also very insecure. The hdfs user
is a superuser in a Hadoop deployment.

Authenticating to Knox as root and asserting the effective user as hdfs is
certainly something we can do, but I don't see what the value is of doing
that.

So, let's step back and get a clear picture of what you would like to
accomplish and we can direct you to appropriate authentication/federation
providers and possibly identity-assertion providers to meet your needs.

thanks,

--larry

On Mon, Nov 18, 2019 at 2:47 PM Kevin Risden <[email protected]> wrote:

>> If i am to do an hdfs query, all i need to do is to set HADOOP_USER_NAME
>> to 'hdfs' then everything works nicely.
>
> This means that you aren't using Kerberos just regular simple auth for
> your cluster.
>
>> This is true until we get to knox. We still communicate with Knox using a
>> root and an admin password. I believe by default, this user's identity is
>> used to call webhdfs?
>
> The user identity is asserted by Knox against the backend service. So Knox
> is configured for authentication that username is asserted to the backend.
> So however you are doing authentication in Knox needs to be configured.
> This is usually LDAP out of the box but can be configured with different
> authentication providers like PAM.
>
> Kevin Risden
>
> On Mon, Nov 18, 2019 at 2:37 PM jeff saremi <[email protected]> wrote:
>
>> I'm not sure how to phrase this question and also I don't have any
>> experience in these two technologies
>>
>> Here's the deal: We are switching from running hadoop and related
>> technologies from under root to a non-root user
>>
>> So far we have managed to successfully change our namenodes and datanodes
>> such that the process is running under a user named 'hdfs'.
>>
>> If i am to do an hdfs query, all i need to do is to set HADOOP_USER_NAME
>> to 'hdfs' then everything works nicely.
>>
>> This is true until we get to knox. We still communicate with Knox using a
>> root and an admin password. I believe by default, this user's identity is
>> used to call webhdfs?
>>
>> We need to change this behavior. Looking for some pointers on what the
>> changes would be.
>>
>> thanks
>> Jeff
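
For anyone following the thread, here is a rough sketch of the difference being discussed. It is not taken from anyone's actual setup. With simple (non-Kerberos) auth, a direct WebHDFS call just names the user in a `user.name` query parameter, much as HADOOP_USER_NAME does for the CLI. Through Knox, the caller authenticates to the gateway (for example with HTTP Basic credentials checked against LDAP) and Knox asserts that authenticated identity to the backend. The hostnames, the `sandbox` topology name, and the helper function names are hypothetical; the ports are the usual defaults (9870 for the Hadoop 3 NameNode, 8443 for Knox) and may differ in your cluster.

```python
from urllib.parse import urlencode, quote


def direct_webhdfs_url(namenode, path, op, user, port=9870):
    """Direct WebHDFS call under simple auth.

    The caller simply claims an identity via the user.name parameter;
    nothing verifies it, which is why this mode is considered insecure.
    """
    query = urlencode({"op": op, "user.name": user})
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{query}"


def knox_webhdfs_url(gateway, topology, path, op, port=8443):
    """WebHDFS call routed through a Knox topology.

    No user.name is supplied here: the caller authenticates to Knox
    separately, and Knox asserts that identity to the backend service.
    """
    query = urlencode({"op": op})
    return (
        f"https://{gateway}:{port}/gateway/{topology}"
        f"/webhdfs/v1{quote(path)}?{query}"
    )


# Hypothetical hosts and topology, for illustration only.
print(direct_webhdfs_url("namenode.example.com", "/tmp", "LISTSTATUS", "hdfs"))
print(knox_webhdfs_url("knox.example.com", "sandbox", "/tmp", "LISTSTATUS"))
```

The point of the sketch is only that the effective user is chosen by the client in the first case and by Knox's authentication and identity-assertion configuration in the second, so changing who Knox calls webhdfs as is a Knox topology change rather than a client-side one.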
