[ https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15736764#comment-15736764 ]
Clay B. commented on YARN-5910:
-------------------------------

This conversation has been very educational for me; thank you! I am still concerned that if we do not use Kerberos, the requesting user will have no way to renew tokens as themselves. If we cannot authenticate as the user, won't we be unable to operate when the two clusters have different administrators (and thus different {{yarn}} principals in Kerberos)? Can we find a solution to that issue here as well, or at least ensure that this issue doesn't preclude one?

I really like the idea that the (human) client is responsible for specifying the resources needed: in a highly federated Hadoop environment, one administration group may not even know of all clusters, and this allows for more agile cross-cluster usage.

I see two issues here I was hoping to solve:
1. A remote cluster's services are needed (e.g. as a data source for this job).
2. A remote cluster does not trust this cluster's YARN principal.

[~jlowe] brings up some good questions and points which hit this well:

{quote}I'm not sure distributing the keytab is going to be considered a reasonable thing to do in some setups. Part of the point of getting a token is to avoid needing to ship a keytab everywhere. Once we have a keytab, is there a need to have a token?{quote}

If the YARN principals of each cluster are different but the user is entitled to services on both clusters, is there another way around this issue? Further, while I think many shops have the Kerberos tooling to avoid shipping keytabs, some shops are heavily HBase-dependent (e.g. long-running query services) or streaming-centric (jobs outlast the maximal token refresh period) and thus have to use keytabs today.

{quote}There's also the problem of needing to renew the token while the AM is waiting to get scheduled if the cluster is really busy. If the AM isn't running it can't renew the token.{quote}
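As a rough sanity check on the renewal-window concern in the quote above, consider HDFS's stock delegation-token defaults (a one-day {{dfs.namenode.delegation.token.renew-interval}} against a seven-day {{dfs.namenode.delegation.token.max-lifetime}}; these are the shipped defaults, not values taken from this thread, and any site may override them):

```java
import java.util.concurrent.TimeUnit;

// Sanity check of HDFS delegation-token timing defaults (assumed stock
// values, not site-specific): a token must be renewed within the renew
// interval, and can live at most the max lifetime.
public class TokenWindow {
    // Default dfs.namenode.delegation.token.renew-interval: 1 day, in ms
    static final long RENEW_INTERVAL_MS = TimeUnit.DAYS.toMillis(1);
    // Default dfs.namenode.delegation.token.max-lifetime: 7 days, in ms
    static final long MAX_LIFETIME_MS = TimeUnit.DAYS.toMillis(7);

    /** True if an AM stuck in the scheduler queue for queueDelayMs
     *  would miss the renewal window. */
    static boolean missesRenewal(long queueDelayMs) {
        return queueDelayMs > RENEW_INTERVAL_MS;
    }

    public static void main(String[] args) {
        // The renew interval is 1/7 of the token's maximum lifetime.
        System.out.println("renew interval is 1/"
                + (MAX_LIFETIME_MS / RENEW_INTERVAL_MS) + " of max lifetime");
        System.out.println("6h queue delay misses renewal? "
                + missesRenewal(TimeUnit.HOURS.toMillis(6)));   // false
        System.out.println("30h queue delay misses renewal? "
                + missesRenewal(TimeUnit.HOURS.toMillis(30)));  // true
    }
}
```

So with defaults an AM would have to sit unscheduled for over a day before renewal becomes impossible, which is the scenario being questioned below.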
I would expect the remote-cluster resources not to be central to operating the job; e.g. we would use the local cluster for HDFS and YARN but might want to access a remote cluster's YARN. If the AM can request tokens (i.e. with a keytab or a proxy Kerberos credential refreshed by the RM), then we can request new tokens when the job is scheduled, should it have been held up longer than the renewal interval; further, we need not worry about exploits of custom configuration running in a privileged process, only in something running as the user. Regardless, are there many clusters today where scheduling time exceeds the renewal interval of a delegation token? (By default that would be one seventh of the job's maximal runtime -- longer than a day.)

{quote}My preference is to have the token be as self-descriptive as we can possibly get. Doing the ApplicationSubmissionContext thing could work for the HA case, but I could see this being a potentially non-trivial payload the RM has to bear for each app (configs can get quite large). It'd rather avoid adding that to the context for this purpose if we can do so, but if the token cannot be self-descriptive in all cases then we may not have much other choice that I can see.{quote}

I agree this seems to be the sanest idea for how to get the configuration in. We could also perhaps extend the various delegation token types to carry this payload only optionally? Then the RM would pay the price only when needed for an off-cluster request.

> Support for multi-cluster delegation tokens
> -------------------------------------------
>
> Key: YARN-5910
> URL: https://issues.apache.org/jira/browse/YARN-5910
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: security
> Reporter: Clay B.
> Priority: Minor
>
> As an administrator running many secure (kerberized) clusters, some of which
> have peer clusters managed by other teams, I am looking for a way to run jobs
> which may require services running on other clusters. A particular case where
> this rears itself is running something as core as a distcp between two
> kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp hdfs://LOCALCLUSTER/user/user292/test.out hdfs://REMOTECLUSTER/user/user292/test.out.result}}).
> Thanks to YARN-3021, one can run for a while, but if the delegation token for
> the remote cluster needs renewal the job will fail[1]. One can pre-configure
> the {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes
> available, but that requires coordination that is not always feasible,
> especially as a cluster's peers grow into the tens of clusters or across
> management teams. Ideally, one could have core systems configured this way,
> but jobs could also specify their own token handling and management when
> needed.
> [1]: Example stack trace when the RM is unaware of a remote service:
> ----------------
> {code}
> 2016-03-23 14:59:50,528 INFO org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: application_1458441356031_3317 found existing hdfs token Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for user292)
> 2016-03-23 14:59:50,557 WARN org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for user292)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Unable to map logical nameservice URI 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a failover proxy provider configured.
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164)
> at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511)
> at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425)
> ... 6 more
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
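For reference, the "failover proxy provider" failure in the trace above is what the RM reports when the {{hdfs-site.xml}} it loads has no client-side entries for the remote HA nameservice. A sketch of the kind of entries involved (the property names follow the standard HDFS HA client settings; the REMOTECLUSTER NameNode hostnames are hypothetical placeholders, not from this issue):

```xml
<!-- Client-side HA settings for the remote nameservice, as loaded by the RM.
     Hostnames below are illustrative placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>LOCALCLUSTER,REMOTECLUSTER</value>
</property>
<property>
  <name>dfs.ha.namenodes.REMOTECLUSTER</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.REMOTECLUSTER.nn1</name>
  <value>remote-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.REMOTECLUSTER.nn2</name>
  <value>remote-nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.REMOTECLUSTER</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

Maintaining such entries for every peer cluster is exactly the coordination the issue description calls impractical at scale, which is what motivates letting jobs carry their own token-handling configuration.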