[ https://issues.apache.org/jira/browse/YARN-9399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Íñigo Goiri reassigned YARN-9399:
---------------------------------

    Assignee: Íñigo Goiri

> Yarn Client may use stale DNS to connect to RM
> ----------------------------------------------
>
>                 Key: YARN-9399
>                 URL: https://issues.apache.org/jira/browse/YARN-9399
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.1
>            Reporter: Leon zhang
>            Assignee: Íñigo Goiri
>            Priority: Major
>              Labels: patch
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This happens more frequently when running YARN in Kubernetes. When the YARN
> client tries to connect to the RM and the RM's DNS name is not resolvable
> (because kube-dns has failed or is not ready yet), the YARN client initializes
> itself with an unresolved InetSocketAddress in RMProxy#newProxyInstance().
> The connection to the RM then fails with an UnknownHostException. The YARN
> client retries the connection through RetryProxy, but it always uses the
> cached unresolved InetSocketAddress, so the retry never succeeds. The same
> bug occurs when the RM is rescheduled to another Kubernetes node, which
> changes the RM's IP address. Currently, the workaround is to restart the
> YARN client.
> This issue happens with the RM in both HA and non-HA mode. HDFS has similar
> issues:
> [https://github.com/apache-spark-on-k8s/kubernetes-HDFS/issues/48]
> I propose adding a new RMFailoverProxyProvider called
> AutoRefreshRMFailoverProxyProvider, which resolves the DNS name in its
> overridden getProxy() method. This way, RetryProxy resolves the DNS name
> again on each retry.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
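The proposal in YARN-9399 can be illustrated with a minimal Java sketch, under stated assumptions. `ReResolvingProxyProvider`, `reResolve`, and the `proxyFactory` parameter are hypothetical names chosen for illustration. They are not Hadoop's `RMFailoverProxyProvider` interface or the eventual patch. The sketch relies only on standard `java.net.InetSocketAddress` behavior. The `(String, int)` constructor attempts a lookup at the moment it is called. The resulting object is immutable, so an instance that failed to resolve keeps returning `true` from `isUnresolved()`. That immutability is why a cached address can never recover.

```java
import java.net.InetSocketAddress;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Hadoop's API) of the idea behind the proposed
 * AutoRefreshRMFailoverProxyProvider: build a new address, and therefore
 * trigger a new lookup, every time a proxy is requested, instead of reusing
 * an InetSocketAddress that was resolved (or failed to resolve) only once.
 */
public class ReResolvingProxyProvider<T> {

    private final String host;
    private final int port;
    // Hypothetical factory that turns a concrete address into a client proxy;
    // it stands in for whatever creates the real RM protocol proxy.
    private final Function<InetSocketAddress, T> proxyFactory;

    public ReResolvingProxyProvider(String host, int port,
                                    Function<InetSocketAddress, T> proxyFactory) {
        this.host = host;
        this.port = port;
        this.proxyFactory = proxyFactory;
    }

    /**
     * Returns a new address for the same host and port. The
     * InetSocketAddress(String, int) constructor attempts resolution again,
     * whereas the cached instance never changes its resolved/unresolved state.
     */
    public static InetSocketAddress reResolve(InetSocketAddress cached) {
        return new InetSocketAddress(cached.getHostString(), cached.getPort());
    }

    /**
     * Intended to be called on every retry attempt. Because the address is
     * rebuilt here, a DNS entry that was missing earlier (for example, kube-dns
     * not ready) or that now points at a new IP is picked up on the next call.
     */
    public T getProxy() {
        InetSocketAddress addr = new InetSocketAddress(host, port);
        if (addr.isUnresolved()) {
            // Unchecked so a caller's retry loop can catch it and try again later.
            throw new IllegalStateException("Cannot resolve " + host + ":" + port);
        }
        return proxyFactory.apply(addr);
    }
}
```

A retry loop would call `getProxy()` on each attempt rather than caching its result. Note that the JVM's own `InetAddress` cache, controlled by the `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties, can still delay how quickly a DNS change becomes visible, even when the address object is rebuilt.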