[ https://issues.apache.org/jira/browse/YARN-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261412#comment-16261412 ]
Istvan Vajnorak commented on YARN-7550:
---------------------------------------

Please note that a similar DNS / IP resolution issue was fixed for ZooKeeper under ZOOKEEPER-1576, so that clients can still be created if a name resolution hiccup occurs and persists for a short time.

> Allow YARN HA to be fault tolerant on missing DNS entries
> ---------------------------------------------------------
>
>                 Key: YARN-7550
>                 URL: https://issues.apache.org/jira/browse/YARN-7550
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 2.6.5
>            Reporter: Istvan Vajnorak
>
> If, for some reason, one of the ResourceManager hosts is missing from DNS,
> the HA configuration of the ClientRMProxy is not fault tolerant enough to
> survive this.
> Even in the face of DNS resolution issues, the tokenService call should be
> allowed to succeed as long as at least one of the RMs can be resolved. The
> relevant code can be seen at:
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L153
> We can safely assume that an RM that is missing from DNS cannot be the
> active one anyway, so client jobs can still be submitted while the DNS
> issues are being fixed.
> A sample exception when one of the entries is missing:
> {code}
> 17/11/02 18:20:35 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl failed in state STARTED; cause: java.lang.IllegalArgumentException: java.net.UnknownHostException: some.dns.entry
>     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:374)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.getTokenService(ClientRMProxy.java:153)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.getAMRMTokenService(ClientRMProxy.java:138)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.setAMRMTokenService(ClientRMProxy.java:80)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.getRMAddress(ClientRMProxy.java:99)
>     at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxyInternal(ConfiguredRMFailoverProxyProvider.java:76)
>     at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.getProxy(ConfiguredRMFailoverProxyProvider.java:90)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:75)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:66)
>     at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
>     at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:95)
>     at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>     at org.apache.spark.deploy.yarn.YarnRMClient.register(YarnRMClient.scala:65)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:359)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:435)
>     at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:256)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:774)
>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
>     at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>     at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:772)
>     at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:795)
>     at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> Caused by: java.net.UnknownHostException: some.dns.entry
>     ... 28 more
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
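
The behaviour proposed in the issue description (let the tokenService call succeed as long as at least one RM can be resolved) could be sketched roughly as follows. This is only an illustration, not the actual Hadoop code or a proposed patch: the class `TolerantTokenService`, the `Resolver` hook, and the comma-joined `ip:port` output format are all assumptions made for this example, and the lookup is injected so the logic can be exercised without real DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
final class TolerantTokenService {

    /** Hypothetical lookup hook so that name resolution can be stubbed. */
    @FunctionalInterface
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Builds a comma-separated list of "ip:port" entries from "host:port"
     * ResourceManager addresses. Entries whose host cannot be resolved are
     * skipped; the call only fails if no entry at all can be resolved.
     */
    static String buildTokenService(List<String> rmAddresses, Resolver resolver) {
        List<String> services = new ArrayList<>();
        List<String> unresolved = new ArrayList<>();
        for (String address : rmAddresses) {
            int colon = address.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected host:port but got: " + address);
            }
            String host = address.substring(0, colon);
            String port = address.substring(colon + 1);
            try {
                services.add(resolver.resolve(host).getHostAddress() + ":" + port);
            } catch (UnknownHostException e) {
                // Assumption from the issue: an RM missing from DNS cannot be
                // the active one, so it is safe to leave it out.
                unresolved.add(host);
            }
        }
        if (services.isEmpty()) {
            throw new IllegalArgumentException(
                "No ResourceManager address could be resolved: " + unresolved);
        }
        return String.join(",", services);
    }
}
```

In real use the resolver would simply be `InetAddress::getByName`. A production change would probably also log the skipped hosts so that the DNS problem does not go unnoticed.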