Thanks again, Larry. One comment inline.
>We should be able to provide access to the same resources via proxied calls to
>the backend services that you require without leaking a credential that can be
>captured and replayed by anyone to spoof the original user.
This is exactly what we are looking for. Is there a way to support the
proxying of RPC communications through Knox? For example, how can we run a
traditional Spark job in a Mesos cluster through Knox for HDFS access only?
How can we use the Hive Java (Thrift) API to go through Knox? I know Knox
supports HTTP/JDBC; however, I'm not sure how to support RPC/TCP communication.
On Thursday, October 19, 2017, 6:20:34 AM PDT, larry mccay
<[email protected]> wrote:
Thank you for the clarification. I did understand correctly.
I think my characterization of this as making Knox a delegation token factory
was slightly off - we would be more of a delegation token broker. That is just
as inappropriate, considering what I consider the charter of the Knox Gateway.
If we are actually returning the delegation token to the client with WebHDFS
calls through Knox, then something is broken and we need to address that.
While your assertion that the token is returned by the services through REST
and Thrift calls is certainly true, this is generally done within the cluster
firewalls, or with controlled access through the firewall for trusted
users/clients that could otherwise authenticate via Kerberos anyway. It is a
means to offload traffic from the KDC when there are thousands of datanodes
and/or users.
In deployments of Knox where users are authenticating without the need for
Kerberos or, more importantly, without the restrictions of Kerberos
authentication from where they are, there is no reason to leak the delegation
token to the end users.
We should be able to provide access to the same resources via proxied calls to
the backend services that you require without leaking a credential that can be
captured and replayed by anyone to spoof the original user.
On Thu, Oct 19, 2017 at 3:26 AM, Mohammad Islam <[email protected]> wrote:
Hi Larry, thanks for your reply.
I believe I didn't explain our use case properly. Let me give some context
and address some concerns.
Be warned - a long email :)
We restrict Kerberos access to within the Hadoop cluster. Access to the
Kerberos service from outside Hadoop is not recommended, for various reasons.
However, our users want to access HDFS/Hive/YARN etc.
Background (that you already know):
As far as I know, Hadoop provides two types of security for user applications.
A) Kerberos ticket based: In this case, during job submission, the client
implicitly gets the HDFS & RM tokens by presenting her Kerberos ticket.
B) Delegation token based: In this case, the user "somehow" needs to get the
delegation tokens from the HDFS/YARN/Hive services, put the tokens into a
local file, and then expose the file's path with an environment variable
called "HADOOP_TOKEN_FILE_LOCATION". After that, if the user submits any
application w/o a Kerberos ticket, it will be able to connect to those
services using the delegation tokens. Oozie/Azkaban do something similar.
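A minimal sketch of that handoff (the path is hypothetical, and note that the
token file must be in Hadoop's binary token-storage format, e.g. as written by
`hdfs fetchdt`, not a raw token string):

```python
import os

# Hypothetical path to a token file fetched earlier, e.g. via:
#   hdfs fetchdt /tmp/hdfs.dt
# The file must be in Hadoop's binary token-storage format.
token_file = "/tmp/hdfs.dt"

# Build an environment in which Hadoop clients pick up the delegation
# tokens instead of looking for a Kerberos ticket.
env = dict(os.environ, HADOOP_TOKEN_FILE_LOCATION=token_file)

# Any Hadoop client launched with this environment authenticates using
# the tokens, for example:
#   subprocess.run(["hdfs", "dfs", "-ls", "/"], env=env, check=True)
```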
In our environment, we want to use the second option. The user doesn't need a
Kerberos ticket; she only needs the delegation token. The question is how she
can get it. The most common approach is to use a Kerberos ticket and make the
appropriate REST or Thrift call to get the token.
However, our proposal is this: Knox can provide a service where an external
user first authenticates to Knox through LDAP or some other means and gets the
delegation tokens collected from the actual services. In other words, we want
to get the delegation tokens w/o "directly" using a Kerberos ticket. Knox can
be an intermediary that authenticates the user in a non-Kerberos way, then
uses its own Kerberos credential to call the appropriate services (i.e.
HDFS/YARN/Hive), gets the delegation tokens from them, and finally returns
the tokens to the user.
Addressing concerns:
Concern 1: Regarding security compromise: all other services (WebHDFS, YARN,
Hive) already expose their delegation tokens to users through REST/Thrift/Java
APIs. The only change is getting them in a non-Kerberos way via Knox. Knox is
just proxying.
Concern 2: Knox as a factory of delegation tokens: I'm not asking Knox to
manage/create delegation tokens for the services. Rather, Knox will gather
them from the appropriate services and return them to the caller. Btw, we can
currently get the HDFS delegation token using a WebHDFS curl command through
Knox w/o a Kerberos ticket. I'm asking to extend this to other Hadoop services
through the Knox framework.
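For reference, a small sketch of that existing pattern (the gateway host,
port, topology, and credentials are placeholders for an actual deployment;
the JSON shape is the standard WebHDFS GETDELEGATIONTOKEN response):

```python
import json
from urllib.parse import urlencode

def knox_delegation_token_url(gateway_host, topology, renewer):
    # Placeholder host/port/topology; adjust to the actual Knox deployment.
    base = f"https://{gateway_host}:8443/gateway/{topology}/webhdfs/v1/"
    return base + "?" + urlencode({"op": "GETDELEGATIONTOKEN",
                                   "renewer": renewer})

url = knox_delegation_token_url("knox.example.com", "default", "yarn")
# The request itself is made with the user's non-Kerberos credentials, e.g.:
#   curl -ku username:password "<url>"

# WebHDFS returns the token in this JSON shape:
sample_response = '{"Token": {"urlString": "HAAFbW9oYW1tYWQA..."}}'
token = json.loads(sample_response)["Token"]["urlString"]
```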
Concern 3: Regarding "return a delegation token without proxying a call to a
backend service": we do want proxying. We don't want Knox to manage/create any
delegation token. Rather, it should forward the request to the appropriate
service with Knox's Kerberos ticket, collect the token, and then return it.
Please let me know if you need more clarifications.
Regards,
Mohammad
On Wednesday, October 18, 2017, 7:24:23 AM PDT, larry mccay
<[email protected]> wrote:
Hi Jérôme -
Thanks for that heads up.
We do actually have Kerberos support through the Hadoop Auth Provider
already, which incorporates support for accepting the Hadoop-specific
delegation tokens.
If I understand the ask properly here, it is for Knox to hand out the Hadoop-
specific delegation token so that it can be used directly, without having to
use Kerberos to get it.
My feeling is that acting as such a factory for sensitive credentials would
not be in the interest of the Knox project.
But as I said, I may be able to be convinced otherwise.
thanks again!
--larry
On Wed, Oct 18, 2017 at 2:46 AM, Jérôme LELEU <[email protected]> wrote:
> Hi,
>
> I just saw "Kerberos" somewhere in the discussion. I just wanted to quickly
> let you know that pac4j 2.1 supports Kerberos, so things may be
> straightforward after the pac4j upgrade.
> Thanks.
> Best regards,
> Jérôme
>
>
> On Tue, Oct 17, 2017 at 1:37 PM, larry mccay <[email protected]> wrote:
>
> > Hi Mohammad -
> >
> > I need to better understand your use case.
> >
> > It seems that you would like Knox to provide a delegation token factory
> > type role, where a service/user can authenticate to Knox against LDAP or
> > some other provider and get back a delegation token without proxying a
> > call to a backend service. So, something like a DelegationToken service
> > instead of a KnoxToken service.
> >
> > Considering that we go to some trouble to not leak the delegation token
> > to clients outside of the gateway, this seems like it is at odds with
> > some of the goals of what the gateway is trying to do. It will also
> > likely require RPC calls from the Jersey service to the backend services
> > to request the delegation token. This will add a compile-time dependency
> > on those libraries, which we currently don't have.
> >
> > I do assume that the trusted proxy model in Hadoop does allow us to play
> > that role; however, given the above, I don't know that it is a great use
> > case for Knox. I could possibly be persuaded otherwise.
> >
> > I'd like to hear the full use case and how the use of the delegation
> > tokens and managing their expiration without Kerberos will ultimately be
> > better than just leveraging the Hadoop common libraries and Kerberos
> > directly, or consuming the same cluster resources with proxied calls
> > through Knox.
> >
> > thanks,
> >
> > --larry
> >
> >
> > On Tue, Oct 17, 2017 at 3:28 AM, Mohammad Islam <[email protected]> wrote:
> >
> > > Hi,
> > > We have a use case where non-Hadoop services want to utilize delegation
> > > tokens instead of direct Kerberos tickets. Therefore, I'm wondering if
> > > Knox can support this service, where Knox gets delegation tokens from
> > > Hadoop services such as HDFS, YARN, Hive, HBase etc.
> > > This will allow the non-Hadoop services to connect to Hadoop services
> > > w/o a Kerberos ticket.
> > >
> > > Does Knox support this in any way/form? Otherwise, would it be a good
> > > idea to support this?
> > >
> > > Regards,
> > > Mohammad
> > >
> > >
> >
>