[ 
https://issues.apache.org/jira/browse/YARN-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261412#comment-16261412
 ] 

Istvan Vajnorak commented on YARN-7550:
---------------------------------------

Please note that a similar DNS / IP resolution issue was fixed for ZooKeeper 
under ZOOKEEPER-1576, so that clients can still be created when a name 
resolution hiccup occurs and persists for a short time.

> Allow YARN HA to be fault tolerant on missing DNS entries
> ---------------------------------------------------------
>
>                 Key: YARN-7550
>                 URL: https://issues.apache.org/jira/browse/YARN-7550
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 2.6.5
>            Reporter: Istvan Vajnorak
>
> If one of the ResourceManager hosts is missing from DNS for some reason, the 
> HA configuration of the ClientProxy is not fault tolerant enough to survive 
> it.
> Even in the face of DNS resolution issues, the tokenService call should be 
> allowed to succeed as long as at least one of the RMs can be resolved. The 
> call in question can be seen at: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L153
> We can safely assume that an RM missing from DNS cannot be the active one 
> anyway, so client jobs can still be submitted while the DNS issue is being 
> fixed.
> A sample exception when one of the entries is missing:
> {code}
> 17/11/02 18:20:35 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl failed in state STARTED; cause: java.lang.IllegalArgumentException: java.net.UnknownHostException: some.dns.entry
>     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.getTokenService(ClientRMProxy.java:153)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.getAMRMTokenService(ClientRMProxy.java:138)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.setAMRMTokenService(ClientRMProxy.java:80)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.getRMAddress(ClientRMProxy.java:99)
>     at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxyInternal(ConfiguredRMFailoverProxyProvider.java:76)
>     at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxy(ConfiguredRMFailoverProxyProvider.java:90)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:75)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:66)
>     at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
>     at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:95)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.spark.deploy.yarn.YarnRMClient.register(YarnRMClient.scala:65)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:359)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:435)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:256)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:774)
>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>     at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:772)
>     at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:795)
>     at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> Caused by: java.net.UnknownHostException: some.dns.entry
>     ... 28 more
> {code}
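
The tolerant behaviour requested in the issue could be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the 
actual Hadoop code or a proposed patch: the class TolerantTokenService and 
its buildTokenService method are hypothetical names, only java.net classes 
are used, and the comma-joined "ip:port" output is an assumed format for the 
combined token service string. Unresolved RM addresses are skipped, and the 
call fails only when none of the configured RMs can be resolved.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class TolerantTokenService {

    /**
     * Builds a comma-separated "ip:port" string from the RM addresses that
     * resolved, skipping the ones that did not. Throws only if no address
     * at all could be resolved.
     */
    static String buildTokenService(List<InetSocketAddress> rmAddresses) {
        List<String> services = new ArrayList<>();
        List<String> skipped = new ArrayList<>();
        for (InetSocketAddress addr : rmAddresses) {
            if (addr.isUnresolved()) {
                // Assumption from the issue: an RM missing from DNS cannot
                // be the active one, so it is safe to leave it out.
                skipped.add(addr.getHostString());
                continue;
            }
            services.add(addr.getAddress().getHostAddress() + ":" + addr.getPort());
        }
        if (services.isEmpty()) {
            throw new IllegalArgumentException(
                "No ResourceManager address could be resolved: " + skipped);
        }
        return String.join(",", services);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Built from a literal IP, so no DNS lookup is involved.
        InetSocketAddress rm1 = new InetSocketAddress(
            InetAddress.getByAddress(new byte[] {10, 0, 0, 1}), 8030);
        // Stands in for the RM whose DNS entry is missing.
        InetSocketAddress rm2 =
            InetSocketAddress.createUnresolved("some.dns.entry", 8030);
        System.out.println(buildTokenService(Arrays.asList(rm1, rm2)));
    }
}
```

In a real client the addresses would come from the HA configuration rather 
than being constructed by hand, and a skipped host should probably be logged 
at WARN level so the missing DNS entry does not go unnoticed.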



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

