@larry Thanks a lot. That was exactly what we were 
looking for.
________________________________
From: jeff saremi <jeffsar...@hotmail.com>
Sent: Monday, November 18, 2019 2:05 PM
To: larry mccay <lmc...@apache.org>; user@knox.apache.org <user@knox.apache.org>
Subject: Re: Switching user going from KNOX to WebHDFS

@Larry
thanks for the explanation. it explains the behavior i was seeing.
I'll look into the link you sent.

________________________________
From: larry mccay <lmc...@apache.org>
Sent: Monday, November 18, 2019 1:52 PM
To: user@knox.apache.org <user@knox.apache.org>
Subject: Re: Switching user going from KNOX to WebHDFS

Knox will scrub any incoming query params that could be used to spoof 
other users.
Both user.name and doas/doAs will be scrubbed from the 
incoming request, and the appropriate query param will be used for the dispatch 
to the backend service.
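
As a rough illustration of the scrubbing described above (a sketch only, not Knox's actual code), the following Python drops any caller-supplied user.name or doas/doAs parameter and appends the gateway's own identity parameter. The function name, the example host, and the choice of user.name as the asserted parameter are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameter names the thread says are scrubbed. Compared in lower case so
# that both "doas" and "doAs" match.
IMPERSONATION_PARAMS = {"user.name", "doas"}


def scrub_and_assert(url, authenticated_user, assert_param="user.name"):
    """Hypothetical helper: remove caller-supplied identity params, then
    append the identity the gateway itself authenticated."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IMPERSONATION_PARAMS
    ]
    # assert_param is an assumption; the real param depends on the backend.
    kept.append((assert_param, authenticated_user))
    return urlunsplit(parts._replace(query=urlencode(kept)))


# The caller authenticated as "root" but tried to act as "hdfs".
print(scrub_and_assert(
    "http://namenode.example:50070/webhdfs/v1/tmp?op=LISTSTATUS&doas=hdfs",
    "root",
))
# -> http://namenode.example:50070/webhdfs/v1/tmp?op=LISTSTATUS&user.name=root
```

The point of the sketch is that the identity sent to the backend comes from authentication, never from the incoming query string.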

You can look at the default identity-assertion [1] provider principal mapping 
capabilities to map an authenticated principal/username to another user known 
within the cluster. This will allow you to authenticate as one user and the 
effective user within the cluster will be the mapped user.

1. 
http://knox.apache.org/books/knox-1-3-0/user-guide.html#Default+Identity+Assertion+Provider
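
To make the principal mapping idea concrete, here is a minimal Python sketch of how a mapping rule could turn an authenticated user into a different effective user. It assumes a rule string of the form "root=hdfs;" (semicolon-separated name=mapped pairs); check the user guide linked above for the exact syntax the provider accepts. The function names are hypothetical and this is not Knox's implementation.

```python
def parse_principal_mapping(value):
    """Parse an assumed 'name=mapped;' rule string into a dict."""
    mapping = {}
    for rule in value.split(";"):
        rule = rule.strip()
        if not rule:
            continue  # skip the empty piece after a trailing semicolon
        source, _, target = rule.partition("=")
        mapping[source.strip()] = target.strip()
    return mapping


def effective_user(authenticated, mapping):
    """Return the mapped user, or the authenticated user if no rule applies."""
    return mapping.get(authenticated, authenticated)


rules = parse_principal_mapping("root=hdfs;")
print(effective_user("root", rules))   # -> hdfs
print(effective_user("alice", rules))  # -> alice
```

With a rule like this, a client authenticates as root while the backend sees hdfs as the effective user; users without a rule pass through unchanged.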

On Mon, Nov 18, 2019 at 4:15 PM jeff saremi <jeffsar...@hotmail.com> wrote:
@Larry, yes, your description is pretty accurate.
We have different "levels" of security. This is the lowest and least 
secure one; however, since this is an on-prem product, some customers wouldn't 
mind accepting the risk.

We also have Kerberos and AD on top of these for those who want a more secure 
environment.

Currently, we need Knox to allow us to pass the superuser identity if possible.
thanks
________________________________
From: jeff saremi <jeffsar...@hotmail.com>
Sent: Monday, November 18, 2019 1:12 PM
To: larry mccay <lmc...@apache.org>; user@knox.apache.org <user@knox.apache.org>
Subject: Re: Switching user going from KNOX to WebHDFS

@kevin, yes, we're not using Kerberos or any AD.

So you're saying that whichever user I authenticate as against Knox is the one 
that will be passed to webhdfs?

If I were to pass ?user.name=hdfs in the query string targeted for hdfs, or 
?doas=hdfs in that request, would Knox honor those and pass them along? Or 
will they get overwritten by Knox?
________________________________
From: larry mccay <lmc...@apache.org>
Sent: Monday, November 18, 2019 1:09 PM
To: user@knox.apache.org <user@knox.apache.org>
Subject: Re: Switching user going from KNOX to WebHDFS

Hi Jeff -

Thanks for reaching out!

Rather than try to unpack all of that, I'd like to step back to a 
description of what you are trying to accomplish with your deployment and the 
addition of Knox within it.

As you have described it, it seems like a very unsecured environment.
Whether you are running your process as a root user or not, executing your 
queries and operations as the HDFS user is also very insecure.
HDFS is a superuser in a Hadoop deployment.

Authenticating to Knox as root and asserting the effective user as hdfs is 
certainly something we can do, but I don't see what the value of doing that is.

So, let's step back and get a clear picture of what you would like to 
accomplish and we can direct you to appropriate authentication/federation 
providers and possibly identity-assertion providers to meet your needs.

thanks,

--larry

On Mon, Nov 18, 2019 at 2:47 PM Kevin Risden <kris...@apache.org> wrote:
> If i am to do an hdfs query, all i need to do is to set HADOOP_USER_NAME to 
> 'hdfs' then everything works nicely.

This means that you aren't using Kerberos, just regular simple auth, for your 
cluster.

> This is true until we get to knox. We still communicate with Knox using a root 
> and an admin password. I believe by default, this user's identity is used to 
> call webhdfs?

The user identity is asserted by Knox against the backend service. Whichever 
username Knox authenticates is the one asserted to the backend, so 
authentication in Knox needs to be configured accordingly. This is 
usually LDAP out of the box, but Knox can be configured with different 
authentication providers such as PAM.

Kevin Risden


On Mon, Nov 18, 2019 at 2:37 PM jeff saremi <jeffsar...@hotmail.com> wrote:
I'm not sure how to phrase this question and also I don't have any experience 
in these two technologies

Here's the deal: We are switching from running hadoop and related technologies 
under root to running them under a non-root user.

So far we have managed to successfully change our namenodes and datanodes such 
that the process is running under a user named 'hdfs'.

If i am to do an hdfs query, all i need to do is to set HADOOP_USER_NAME to 
'hdfs' then everything works nicely.

This is true until we get to knox. We still communicate with Knox using a root 
and an admin password. I believe by default, this user's identity is used to 
call webhdfs?

We need to change this behavior. Looking for some pointers on what the changes 
would be.

thanks
Jeff
