[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736764#comment-15736764 ]

Clay B. commented on YARN-5910:
-------------------------------

This conversation has been very educational for me; thank you! I am still 
concerned that if we do not use Kerberos, the requesting user will have no way 
to renew tokens as themselves. If we cannot authenticate as the user, won't we 
be unable to work when the administrators of the two clusters are different 
(and thus do not have the same {{yarn}} user setup -- e.g. two different 
principals in Kerberos)? Can we find a solution to that issue here as well (or 
at least ensure that this issue doesn't preclude solving it)?
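
To make "renew tokens as themselves" concrete, here is a minimal sketch of 
what a client can do today while it still holds the user's Kerberos TGT: fetch 
a token from the remote HDFS and name that cluster's renewer principal, which 
need not match the local one. The remote URI and renewer principal below are 
hypothetical placeholders.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

public class RemoteTokenFetch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Credentials creds = new Credentials();
    // Authenticates to the remote cluster with the user's own TGT.
    FileSystem remoteFs =
        FileSystem.get(new Path("hdfs://REMOTECLUSTER/").toUri(), conf);
    // The renewer is the *remote* cluster's principal (hypothetical name);
    // without Kerberos at renewal time there is no way to act as the user.
    remoteFs.addDelegationTokens("yarn/_HOST@REMOTE.REALM", creds);
    // creds would then be attached to the job's submission context.
  }
}
{code}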

I really like the idea that the (human) client is responsible for specifying 
the resources needed: in a highly federated Hadoop environment, one 
administration group may not even know of all the clusters, and this allows 
for more agile cross-cluster usage.

I see there are two issues here I was hoping to solve:
1. A remote cluster's services are needed (e.g. as a data source to this job)
2. A remote cluster does not trust this cluster's YARN principal

[~jlowe] brings up some good questions and points which hit this well:
{quote}I'm not sure distributing the keytab is going to be considered a 
reasonable thing to do in some setups. Part of the point of getting a token is 
to avoid needing to ship a keytab everywhere. Once we have a keytab, is there a 
need to have a token?{quote}

If the YARN principals of the two clusters are different but the user is 
entitled to services on both, is there another way around this issue? Further, 
while I think many shops have the Kerberos tooling to avoid shipping keytabs, 
some shops are heavily HBase-dependent (e.g. long-running query services) or 
streaming-centric (jobs last longer than the maximal token refresh period) and 
thus have to use keytabs today.
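
For illustration, this is the keytab pattern such shops rely on today (a 
minimal sketch; the principal and keytab path are hypothetical):
{code}
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLogin {
  public static void main(String[] args) throws Exception {
    // Log in once from the shipped keytab...
    UserGroupInformation.loginUserFromKeytab(
        "user292@EXAMPLE.COM", "/etc/security/keytabs/user292.keytab");
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    // ...then re-login periodically so long-running work never outlives
    // the TGT; a bare delegation token has no equivalent escape hatch
    // once its maximal lifetime passes.
    ugi.checkTGTAndReloginFromKeytab();
  }
}
{code}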

{quote}There's also the problem of needing to renew the token while the AM is 
waiting to get scheduled if the cluster is really busy. If the AM isn't running 
it can't renew the token.{quote}

I would expect the remote-cluster resources not to be central to operating the 
job: e.g., we would use the local cluster for HDFS and YARN but might want to 
access a remote cluster's YARN. If the AM can request tokens (i.e. with a 
keytab or a proxy Kerberos credential refreshed by the RM), then we can 
request new tokens when the job is finally scheduled, even if it was held up 
longer than the renewal time; further, any custom configuration would then be 
handled by something running as the user rather than as a privileged process, 
so we need not worry about it being exploited.
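
A hedged sketch of that flow, assuming the RM has some way to hand the AM a 
refreshed proxy credential for the user (how that credential is materialized 
is exactly the open question, so the {{proxyUgi}} argument below is only a 
stand-in):
{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class AmTokenRefetch {
  // Called by the AM once it finally starts, if its original tokens may
  // have aged past the renewal interval while the app sat in the queue.
  static Credentials refetch(UserGroupInformation proxyUgi) throws Exception {
    return proxyUgi.doAs((PrivilegedExceptionAction<Credentials>) () -> {
      Credentials creds = new Credentials();
      Configuration conf = new Configuration();
      // Hypothetical remote URI and renewer principal, as elsewhere.
      FileSystem.get(new Path("hdfs://REMOTECLUSTER/").toUri(), conf)
          .addDelegationTokens("yarn/_HOST@REMOTE.REALM", creds);
      return creds;
    });
  }
}
{code}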

Regardless, are there many clusters today where the scheduling delay is longer 
than the renewal interval of a delegation token? (By default that interval is 
one seventh of the token's maximal lifetime -- i.e., the app would have to sit 
in the queue for more than a day; see the defaults below.)
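
For reference, the "one seventh" arithmetic comes from the stock HDFS 
defaults: tokens must be renewed every 24 hours but live for at most 7 days in 
total.
{code}
<!-- Default values, shown here explicitly (times in milliseconds). -->
<property>
  <name>dfs.namenode.delegation.token.renew-interval</name>
  <value>86400000</value>   <!-- 1 day: renew at least this often -->
</property>
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>604800000</value>  <!-- 7 days: hard cap regardless of renewals -->
</property>
{code}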

{quote}My preference is to have the token be as self-descriptive as we can 
possibly get. Doing the ApplicationSubmissionContext thing could work for the 
HA case, but I could see this being a potentially non-trivial payload the RM 
has to bear for each app (configs can get quite large). I'd rather avoid 
adding that to the context for this purpose if we can do so, but if the token 
cannot be self-descriptive in all cases then we may not have much other choice 
that I can see.{quote}

I agree this seems to be the sanest idea for how to get the configuration in; 
we could also perhaps extend the various delegation token types to include 
this payload only optionally. Then the RM would only pay the price when it is 
needed for an off-cluster request.
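
To sketch what "optionally include this payload" might look like (a 
hypothetical identifier, not an existing Hadoop class): a token kind whose 
identifier serializes renewal configuration only when one was supplied, so 
ordinary on-cluster tokens stay exactly as small as today.
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

public class SelfDescribingTokenIdentifier
    extends AbstractDelegationTokenIdentifier {
  public static final Text KIND = new Text("SELF_DESCRIBING_TOKEN");
  // Serialized renewer configuration; empty for ordinary on-cluster tokens.
  private byte[] renewerConf = new byte[0];

  @Override
  public Text getKind() { return KIND; }

  @Override
  public void write(DataOutput out) throws IOException {
    super.write(out);
    out.writeInt(renewerConf.length); // zero-length means "no payload"
    out.write(renewerConf);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    super.readFields(in);
    renewerConf = new byte[in.readInt()];
    in.readFully(renewerConf);
  }
}
{code}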

> Support for multi-cluster delegation tokens
> -------------------------------------------
>
>                 Key: YARN-5910
>                 URL: https://issues.apache.org/jira/browse/YARN-5910
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: security
>            Reporter: Clay B.
>            Priority: Minor
>
> As an administrator running many secure (kerberized) clusters, some of which 
> have peer clusters managed by other teams, I am looking for a way to run jobs 
> which may require services running on other clusters. A particular case where 
> this rears its head is running something as core as a distcp between two 
> kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp 
> hdfs://LOCALCLUSTER/user/user292/test.out 
> hdfs://REMOTECLUSTER/user/user292/test.out.result}}).
> Thanks to YARN-3021, one can run for a while, but if the delegation token for 
> the remote cluster needs renewal, the job will fail[1]. One can pre-configure 
> the {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes 
> available, but that requires coordination that is not always feasible, 
> especially as a cluster's peers grow into the tens of clusters or across 
> management teams. Ideally, one could have core systems configured this way, 
> but jobs could also specify their own token handling and renewal management 
> when needed. (An example of the pre-configuration in question follows the 
> stack trace below.)
> [1]: Example stack trace when the RM is unaware of a remote service:
> ----------------
> {code}
> 2016-03-23 14:59:50,528 INFO org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: application_1458441356031_3317 found existing hdfs token Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for user292)
> 2016-03-23 14:59:50,557 WARN org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for user292)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Unable to map logical nameservice URI 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a failover proxy provider configured.
>     at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164)
>     at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128)
>     at org.apache.hadoop.security.token.Token.renew(Token.java:377)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511)
>     at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425)
>     ... 6 more
> {code}
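
For reference, the pre-configuration the description alludes to would look 
roughly like the following local {{hdfs-site.xml}} entries, which give the RM 
a failover proxy provider for the logical {{REMOTECLUSTER}} nameservice (host 
names are placeholders):
{code}
<property>
  <name>dfs.nameservices</name>
  <value>LOCALCLUSTER,REMOTECLUSTER</value>
</property>
<property>
  <name>dfs.ha.namenodes.REMOTECLUSTER</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.REMOTECLUSTER.nn1</name>
  <value>remote-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.REMOTECLUSTER.nn2</name>
  <value>remote-nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.REMOTECLUSTER</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
{code}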


