[ 
https://issues.apache.org/jira/browse/YARN-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828654#comment-15828654
 ] 

Jian He edited comment on YARN-5910 at 1/18/17 7:43 PM:
--------------------------------------------------------

Hi Jason, thank you very much for the review!
bq. It's confusing to see a MR_JOB_SEND_TOKEN_CONF_DEFAULT in MRJobConfig yet 
it clearly is not the default value.
Removed it.

bq. Should this feature be tied to UserGroupInformation.isSecurityEnabled? I'm 
wondering if this can cause issues where the current cluster isn't secure but 
the RM needs to renew the job's tokens for a remote secure cluster or some 
other secure service. Seems like if this conf is set then that's all we need to 
know.
Currently, the RM DelegationTokenRenewer will only add the tokens if security 
is enabled (see RMAppManager#submitApplication), so I think with this existing 
implementation we can assume this feature is for security-enabled clusters only?

bq. Similarly the code explicitly fails in ClientRMService if the conf is there 
when security is disabled which seems like we're taking a case that isn't 
optimal but should work benignly and explicitly making sure it fails. Not sure 
that's user friendly behavior.
My intention was to prevent users from sending the conf in non-secure mode 
(which is not needed anyway if my reply above is correct), in case the conf 
size is huge, which may increase load on the RM. On the other hand, Varun 
suggested offline that we could add a config in the RM to limit the size of 
the configs - what's your opinion?
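Varun's suggestion could look roughly like the sketch below. The method name, exception type, and limit value are all hypothetical, not part of the patch; this just illustrates rejecting an oversized serialized tokens conf at submission time.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

public class TokensConfLimit {
    // Hypothetical guard: cap the serialized tokens-conf size the RM accepts.
    static void checkTokensConfSize(ByteBuffer tokensConf, int maxBytes)
            throws IOException {
        if (tokensConf != null && tokensConf.remaining() > maxBytes) {
            throw new IOException("tokens conf size " + tokensConf.remaining()
                + " exceeds limit " + maxBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        // A small payload passes the check.
        checkTokensConfSize(ByteBuffer.wrap(new byte[128]), 1024);
        // An oversized payload is rejected before it reaches the renewer.
        try {
            checkTokensConfSize(ByteBuffer.wrap(new byte[2048]), 1024);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```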

bq. Nit: For the ByteBuffer usage in parseCredentials and parseTokensConf, the 
rewind method calls seem unnecessary since we're throwing the buffers away 
immediately afterwards.
Actually, the ByteBuffer is a direct reference to the one in the 
ContainerLaunchContext, not a copy. I think the rewind is also required 
because it was added specifically to solve the issues in YARN-2893.
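The concern is easy to reproduce with plain java.nio: if the buffer is a shared reference rather than a copy, consuming it without a rewind leaves nothing for the next reader. A minimal standalone demo:

```java
import java.nio.ByteBuffer;

public class RewindDemo {
    // Consume every remaining byte of the buffer, advancing its position.
    static int readAll(ByteBuffer buf) {
        int sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Shared reference, as with the ContainerLaunchContext buffer.
        ByteBuffer shared = ByteBuffer.wrap(new byte[] {1, 2, 3});
        readAll(shared);
        // Without a rewind, a later reader of the same reference sees nothing.
        System.out.println(shared.remaining()); // 0
        shared.rewind(); // restore position so the next consumer can re-read
        System.out.println(readAll(shared));    // 6
    }
}
```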

bq. Should the Configuration constructor call in parseTokensConf be using the 
version that does not load defaults? If not then I recommend we at least allow 
a conf to be passed in to use as a copy constructor. Loading a new Configuration 
from scratch is really expensive and we should avoid it if possible. See the 
discussion on HADOOP-11223 for details.
Good point. I actually did the same in YarnRunner#setAppConf, but missed this 
place.
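For reference, the two cheaper constructions under discussion might look like this (a sketch against the Hadoop Configuration API; {{dis}} stands in for whatever DataInput the parse method reads from):

```java
// Skip loading *-default.xml / *-site.xml when deserializing the per-app
// tokens conf, per the HADOOP-11223 discussion.
Configuration tokensConf = new Configuration(false); // loadDefaults = false
tokensConf.readFields(dis); // populate only from the serialized payload

// Alternatively, when defaults are needed, copy an existing conf instead of
// re-parsing the XML resources from scratch.
Configuration copy = new Configuration(rmConf);
```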

bq. In DelegationTokenRenewer, why aren't we using the appConf as-is when 
renewing the tokens? 
I wasn't sure whether the appConf alone is enough for the connection - (are 
any Kerberos-related configs for the RM itself required for authentication?). 
Let me do some experiments; if this works, I'll just use appConf.

bq. Also it looks like we're polluting subsequent app-conf renewals with prior 
app configurations, as well as simply leaking appConf objects as renewerConf 
resources infinitum. I don't see where renewerConf gets reset in-between.
My previous patch made a copy of each appConf, merged it with the RM's conf 
(because I wasn't sure whether the RM's own conf is required), and used that 
for the renewer. But I think this may be bad because every app will have its 
own copy of the configs, which may greatly increase memory usage if the 
number of apps is very big. So, in the latest patch I changed it to let all 
apps share the same renewerConf - this is based on the assumption that 
"dfs.nameservices" must have distinct keys for each distinct cluster, so we 
won't have a situation where two apps use different configs for the same 
cluster - though it is true that unnecessary configs used by the first app 
will be shared by subsequent apps.
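The trade-off can be sketched with a plain Map standing in for Hadoop's Configuration (the keys are illustrative): merging every app's conf into one shared map means later apps see earlier apps' keys, which is harmless only as long as keys for distinct clusters never collide.

```java
import java.util.HashMap;
import java.util.Map;

public class RenewerConfLeak {
    // One conf shared by every app's renewals, as in the latest patch.
    static final Map<String, String> renewerConf = new HashMap<>();

    // Each submitted app's token conf is merged in and never removed.
    static Map<String, String> registerApp(Map<String, String> appTokensConf) {
        renewerConf.putAll(appTokensConf);
        return renewerConf; // what the renewer sees for this app
    }

    public static void main(String[] args) {
        registerApp(Map.of("dfs.namenode.rpc-address.clusterA", "nnA:8020"));
        Map<String, String> seenByAppB =
            registerApp(Map.of("dfs.namenode.rpc-address.clusterB", "nnB:8020"));
        // App B's renewals also see app A's leftover key.
        System.out.println(
            seenByAppB.containsKey("dfs.namenode.rpc-address.clusterA")); // true
    }
}
```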

bq. Arguably there should be a unit test that verifies a first app with token 
conf key A and a second app with token conf key B doesn't leave a situation 
where the renewals of the second app are polluted with conf key A. 
If using the appConf alone works, we should be fine.

bq. Speaking of unit tests, I see where we fixed up the YARN unit tests to pass 
the new conf but not a new test that verifies the specified conf is used 
appropriately when renewing for that app and not for other apps that didn't 
specify a conf.
Yep, I'll add the UT.



> Support for multi-cluster delegation tokens
> -------------------------------------------
>
>                 Key: YARN-5910
>                 URL: https://issues.apache.org/jira/browse/YARN-5910
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: security
>            Reporter: Clay B.
>            Assignee: Jian He
>            Priority: Minor
>         Attachments: YARN-5910.01.patch, YARN-5910.2.patch, 
> YARN-5910.3.patch, YARN-5910.4.patch
>
>
> As an administrator running many secure (kerberized) clusters, some which 
> have peer clusters managed by other teams, I am looking for a way to run jobs 
> which may require services running on other clusters. Particular cases where 
> this rears itself are running something as core as a distcp between two 
> kerberized clusters (e.g. {{hadoop --config /home/user292/conf/ distcp 
> hdfs://LOCALCLUSTER/user/user292/test.out 
> hdfs://REMOTECLUSTER/user/user292/test.out.result}}).
> Thanks to YARN-3021, one can run for a while but if the delegation token for 
> the remote cluster needs renewal the job will fail[1]. One can pre-configure 
> their {{hdfs-site.xml}} loaded by the YARN RM to know of all possible HDFSes 
> available but that requires coordination that is not always feasible, 
> especially as a cluster's peers grow into the tens of clusters or across 
> management teams. Ideally, one could have core systems configured this way, 
> but jobs could also specify their own token handling and management when 
> needed?
> [1]: Example stack trace when the RM is unaware of a remote service:
> ----------------
> {code}
> 2016-03-23 14:59:50,528 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  application_1458441356031_3317 found existing hdfs token Kind: 
> HDFS_DELEGATION_TOKEN, Service: ha-hdfs:REMOTECLUSTER, Ident: 
> (HDFS_DELEGATION_TOKEN token
>  10927 for user292)
> 2016-03-23 14:59:50,557 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, 
> Service: ha-hdfs:REMOTECLUSTER, Ident: (HDFS_DELEGATION_TOKEN token 10927 for 
> user292)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:427)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:781)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:762)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Unable to map logical nameservice URI 
> 'hdfs://REMOTECLUSTER' to a NameNode. Local configuration does not have a 
> failover proxy provider configured.
> at org.apache.hadoop.hdfs.DFSClient$Renewer.getNNProxy(DFSClient.java:1164)
> at org.apache.hadoop.hdfs.DFSClient$Renewer.renew(DFSClient.java:1128)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:516)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:513)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:511)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:425)
> ... 6 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
