[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042078#comment-16042078 ]

maobaolong commented on YARN-4024:

[~zhiguohong] Thank you for the quick reply. In our situation, the nodes' IP addresses never actually change, so I will set this value to 600; I think that is big enough.

> YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
>
> Key: YARN-4024
> URL: https://issues.apache.org/jira/browse/YARN-4024
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Wangda Tan
> Assignee: Hong Zhiguo
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: YARN-4024-draft.patch, YARN-4024-draft-v2.patch,
> YARN-4024-draft-v3.patch, YARN-4024-v4.patch, YARN-4024-v5.patch,
> YARN-4024-v6.patch, YARN-4024-v7.patch
>
> Currently, YARN RM NodesListManager will resolve IP address every time when
> node doing heartbeat. When DNS server becomes slow, NM heartbeat will be
> blocked and cannot make progress.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042057#comment-16042057 ]

Hong Zhiguo commented on YARN-4024:

[~maobaolong], this depends on the probability that a node gets a new IP address without shutting down or restarting its NM. If you are sure that probability is zero, you can assign a very big value. That is in fact the situation in our clusters.
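To put rough numbers on the trade-off discussed here, the following is a back-of-the-envelope sketch, not a measurement. It assumes a 4000-node cluster, a hypothetical 1-second NM heartbeat interval, and that each node triggers exactly one DNS lookup per interval (per heartbeat without caching, per expiry with caching). The class and method names are invented for illustration.

```java
// Hypothetical helper for estimating steady-state DNS lookup rates.
final class LookupRateSketch {

  /** Lookups per second if every node triggers one lookup per interval. */
  static double lookupsPerSecond(int nodes, double intervalSecs) {
    if (intervalSecs <= 0) {
      throw new IllegalArgumentException("intervalSecs must be positive");
    }
    return nodes / intervalSecs;
  }

  public static void main(String[] args) {
    int nodes = 4000;
    // Assumption: no caching and a 1-second heartbeat, so one lookup per heartbeat.
    System.out.println(lookupsPerSecond(nodes, 1.0));
    // Assumption: caching with a 600-second expiry, so one lookup per expiry.
    System.out.println(lookupsPerSecond(nodes, 600.0));
  }
}
```

Under these assumptions the lookup rate drops from thousands per second to single digits, and a larger expiry only lowers it further; the cost is that an IP change on a running node may go unnoticed for up to one expiry interval.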
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042049#comment-16042049 ]

maobaolong commented on YARN-4024:

[~zhiguohong] Sorry, I copied the wrong property name in the question above. What I really want to ask about is node-ip-cache.expiry-interval-secs, so the question becomes: "What value of node-ip-cache.expiry-interval-secs would you suggest for a cluster with 4000 nodes?"
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040075#comment-16040075 ]

Hong Zhiguo commented on YARN-4024:

[~maobaolong], we don't turn on log aggregation, to avoid putting pressure on the network and HDFS.
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038459#comment-16038459 ]

maobaolong commented on YARN-4024:

[~zhiguohong] An additional question: what value of yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds would you suggest for a cluster with 4000 nodes?
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16038399#comment-16038399 ]

maobaolong commented on YARN-4024:

[~zhiguohong] Thanks for this great improvement. I have a minor question.

{code:java}
public void handle(NodesListManagerEvent event) {
  RMNode eventNode = event.getNode();
  switch (event.getType()) {
  case NODE_UNUSABLE:
    LOG.debug(eventNode + " reported unusable");
    unusableRMNodesConcurrentSet.add(eventNode);
    // notify every app whose final state is not yet stored
    for (RMApp app : rmContext.getRMApps().values()) {
      if (!app.isAppFinalStateStored()) {
        this.rmContext
            .getDispatcher()
            .getEventHandler()
            .handle(
                new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                    RMAppNodeUpdateType.NODE_UNUSABLE));
      }
    }
    break;
  case NODE_USABLE:
    if (unusableRMNodesConcurrentSet.contains(eventNode)) {
      LOG.debug(eventNode + " reported usable");
      unusableRMNodesConcurrentSet.remove(eventNode);
    }
    for (RMApp app : rmContext.getRMApps().values()) {
      if (!app.isAppFinalStateStored()) {
        this.rmContext
            .getDispatcher()
            .getEventHandler()
            .handle(
                new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                    RMAppNodeUpdateType.NODE_USABLE));
      }
    }
    break;
  default:
    LOG.error("Ignoring invalid eventtype " + event.getType());
  }

  // remove the cache of normalized hostname if enabled
  if (resolver instanceof CachedResolver) {
    ((CachedResolver) resolver).removeFromCache(
        eventNode.getNodeID().getHost());
  }
}
{code}

In the handle method above, removeFromCache is invoked on every call to handle, regardless of whether the event type is valid. Do you think adding a return at the end of the default block would be correct? Alternatively, removeFromCache could be called only when the event type is NODE_UNUSABLE or NODE_USABLE.
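The "return in the default block" variant asked about above could look roughly like the following. This is a minimal, self-contained sketch: the enum, the map standing in for the resolver cache, and the class name are all hypothetical simplifications, not the actual NodesListManager or CachedResolver code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-ins for the YARN types quoted above.
final class EvictOnValidEventSketch {
  enum EventType { NODE_USABLE, NODE_UNUSABLE, UNKNOWN }

  // Stand-in for the resolver's hostname -> IP cache.
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  void put(String host, String ip) {
    cache.put(host, ip);
  }

  boolean isCached(String host) {
    return cache.containsKey(host);
  }

  /** Returns true if the event type was recognized and the entry evicted. */
  boolean handle(EventType type, String host) {
    switch (type) {
      case NODE_UNUSABLE:
      case NODE_USABLE:
        // Per-type handling (app notifications etc.) would go here.
        break;
      default:
        // Unrecognized type: report it and return before touching the cache.
        System.err.println("Ignoring invalid event type " + type);
        return false;
    }
    cache.remove(host);
    return true;
  }

  public static void main(String[] args) {
    EvictOnValidEventSketch sketch = new EvictOnValidEventSketch();
    sketch.put("nm1.example.com", "10.0.0.1");
    System.out.println(sketch.handle(EventType.UNKNOWN, "nm1.example.com"));     // false
    System.out.println(sketch.isCached("nm1.example.com"));                      // true
    System.out.println(sketch.handle(EventType.NODE_USABLE, "nm1.example.com")); // true
    System.out.println(sketch.isCached("nm1.example.com"));                      // false
  }
}
```

Whether the early return matters in practice depends on whether an unrecognized event type can actually reach this handler, which the thread does not establish.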
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075797#comment-15075797 ]

Varun Saxena commented on YARN-4024:

Sorry, I assigned this JIRA to myself by mistake. I have assigned it back to [~zhiguohong].
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075619#comment-15075619 ]

Ming Ma commented on YARN-4024:

Thanks for the good improvement [~leftnoteasy], [~zhiguohong], [~sunilg], [~adhoot]! For the cache timeout interval, should we change the semantics so that -1 means "cache forever" and 0 means "no cache", to be more consistent with the JVM setting "networkaddress.cache.ttl"?
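The semantics proposed here can be illustrated with a small sketch. It is not the resolver from the patch: the class, its constructor parameters, and the injected clock are hypothetical, and it assumes that any negative TTL means "cache forever", 0 means "no cache", and a positive value is an expiry in seconds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of the proposed TTL semantics; not thread-safe.
final class TtlCacheSketch {
  private static final class Entry {
    final String ip;
    final long resolvedAtMs;

    Entry(String ip, long resolvedAtMs) {
      this.ip = ip;
      this.resolvedAtMs = resolvedAtMs;
    }
  }

  private final long ttlSecs;
  private final Function<String, String> lookup; // stand-in for a DNS lookup
  private final LongSupplier clockMs;            // injectable clock, in milliseconds
  private final Map<String, Entry> cache = new HashMap<>();

  TtlCacheSketch(long ttlSecs, Function<String, String> lookup, LongSupplier clockMs) {
    this.ttlSecs = ttlSecs;
    this.lookup = lookup;
    this.clockMs = clockMs;
  }

  String resolve(String host) {
    if (ttlSecs == 0) {
      return lookup.apply(host); // 0: never cache
    }
    long now = clockMs.getAsLong();
    Entry entry = cache.get(host);
    // Negative TTL: entries never expire. Positive TTL: fresh while younger than the TTL.
    boolean fresh = entry != null
        && (ttlSecs < 0 || now - entry.resolvedAtMs < ttlSecs * 1000L);
    if (!fresh) {
      entry = new Entry(lookup.apply(host), now);
      cache.put(host, entry);
    }
    return entry.ip;
  }

  public static void main(String[] args) {
    int[] lookups = {0};
    long[] nowMs = {0};
    Function<String, String> dns = host -> {
      lookups[0]++;
      return "10.0.0.1";
    };
    TtlCacheSketch tenSecs = new TtlCacheSketch(10, dns, () -> nowMs[0]);
    tenSecs.resolve("nm1");  // miss: first lookup
    nowMs[0] = 5_000;
    tenSecs.resolve("nm1");  // hit: entry is 5 s old
    nowMs[0] = 10_000;
    tenSecs.resolve("nm1");  // miss: entry has reached the 10 s TTL
    System.out.println(lookups[0]); // prints 2
  }
}
```

Injecting the clock keeps the sketch testable without sleeping, which is a design choice of this example rather than something taken from the thread.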
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731759#comment-14731759 ]

Hudson commented on YARN-4024:

FAILURE: Integrated in Hadoop-Hdfs-trunk #2276 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2276/])
YARN-4024. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat. (Hong Zhiguo via wangda) (wangda: rev bcc85e3bab78bcacd430eac23141774465b96ef9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestNodesListManager.java
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731604#comment-14731604 ]

Hudson commented on YARN-4024:

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #349 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/349/])
YARN-4024. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat. (Hong Zhiguo via wangda) (wangda: rev bcc85e3bab78bcacd430eac23141774465b96ef9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestNodesListManager.java
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731660#comment-14731660 ]

Hudson commented on YARN-4024:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #355 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/355/])
YARN-4024. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat. (Hong Zhiguo via wangda) (wangda: rev bcc85e3bab78bcacd430eac23141774465b96ef9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestNodesListManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731725#comment-14731725 ]

Hudson commented on YARN-4024:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #337 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/337/])
YARN-4024. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat. (Hong Zhiguo via wangda) (wangda: rev bcc85e3bab78bcacd430eac23141774465b96ef9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestNodesListManager.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731754#comment-14731754 ]

Hudson commented on YARN-4024:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2298 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2298/])
YARN-4024. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat. (Hong Zhiguo via wangda) (wangda: rev bcc85e3bab78bcacd430eac23141774465b96ef9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestNodesListManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731579#comment-14731579 ]

Hudson commented on YARN-4024:

FAILURE: Integrated in Hadoop-trunk-Commit #8407 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8407/])
YARN-4024. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat. (Hong Zhiguo via wangda) (wangda: rev bcc85e3bab78bcacd430eac23141774465b96ef9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestNodesListManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731680#comment-14731680 ]

Hudson commented on YARN-4024:

FAILURE: Integrated in Hadoop-Yarn-trunk #1087 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1087/])
YARN-4024. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat. (Hong Zhiguo via wangda) (wangda: rev bcc85e3bab78bcacd430eac23141774465b96ef9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestNodesListManager.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728412#comment-14728412 ]

Hadoop QA commented on YARN-4024:

\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 19m 50s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 54s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 4m 44s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 2m 0s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 58m 16s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | 108m 31s | |
\\ \\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12751913/YARN-4024-v7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / d31a41c |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8990/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8990/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8990/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8990/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8990/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8990/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8990/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8990/console |

This message was automatically generated.
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727959#comment-14727959 ]

Wangda Tan commented on YARN-4024:

Rekicked Jenkins; will commit once Jenkins gets back.
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728216#comment-14728216 ]

Wangda Tan commented on YARN-4024:

Strange test failures; rekicked Jenkins again.
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725830#comment-14725830 ]

Anubhav Dhoot commented on YARN-4024:

LGTM. Thanks for removing those sleeps.
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724371#comment-14724371 ] Wangda Tan commented on YARN-4024: -- Latest patch LGTM, [~adhoot], any comments? > YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat > -- > > Key: YARN-4024 > URL: https://issues.apache.org/jira/browse/YARN-4024 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Hong Zhiguo > Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, > YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, > YARN-4024-v6.patch, YARN-4024-v7.patch > > > Currently, YARN RM NodesListManager will resolve IP address every time when > node doing heartbeat. When DNS server becomes slow, NM heartbeat will be > blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721163#comment-14721163 ] Hadoop QA commented on YARN-4024: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 0s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 40s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 53m 33s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 102m 19s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751913/YARN-4024-v7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / e2c9b28 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8943/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8943/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8943/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8943/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8943/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8943/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8943/console | This message was automatically generated. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, YARN-4024-v6.patch, YARN-4024-v7.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. 
When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721144#comment-14721144 ] Hong Zhiguo commented on YARN-4024: --- Why doesn't Jenkins run against the latest patch? YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, YARN-4024-v6.patch, YARN-4024-v7.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707290#comment-14707290 ] Anubhav Dhoot commented on YARN-4024:

Hi [~zhiguohong], the fix looks good. We can avoid adding a sleep to the test if we use Clock instead of System in CachedResolver#addToCache and ExpireChecker#run. That way you can use a ControlledClock in the test to manipulate time and verify expiry.

Another minor nit: assertEquals has a message string as its first argument in many places in TestNodesListManager. You can remove this argument completely, since the other overload does the same thing, or replace it with a proper string if you wish.

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, YARN-4024-v6.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
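The Clock suggestion in the comment above can be illustrated with a small sketch. It is not the code from any YARN-4024 patch: `SketchClock`, `ManualClock`, and `ExpiringIpCache` are hypothetical names, and `SketchClock` is only a local stand-in for a clock abstraction so that the example stays self-contained. The point is that a cache which reads time through an injected clock can be tested by advancing that clock, with no sleep in the test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a clock abstraction (not Hadoop's own class).
interface SketchClock {
  long getTime(); // current time in milliseconds
}

// Test clock: time only moves when the test advances it.
class ManualClock implements SketchClock {
  private long now;

  @Override
  public long getTime() {
    return now;
  }

  void advance(long millis) {
    now += millis;
  }
}

// Illustrative cache; names and layout are assumptions, not the patch.
class ExpiringIpCache {
  private static final class Entry {
    final String ip;
    final long resolvedAt;

    Entry(String ip, long resolvedAt) {
      this.ip = ip;
      this.resolvedAt = resolvedAt;
    }
  }

  private final Map<String, Entry> cache = new ConcurrentHashMap<>();
  private final SketchClock clock;
  private final long expiryMillis;

  ExpiringIpCache(SketchClock clock, long expiryMillis) {
    this.clock = clock;
    this.expiryMillis = expiryMillis;
  }

  // Timestamp comes from the injected clock, never from System directly.
  void addToCache(String host, String ip) {
    cache.put(host, new Entry(ip, clock.getTime()));
  }

  String get(String host) {
    Entry e = cache.get(host);
    return e == null ? null : e.ip;
  }

  // What a periodic expiry checker would call on each run.
  void expireOldEntries() {
    long now = clock.getTime();
    cache.values().removeIf(e -> now - e.resolvedAt >= expiryMillis);
  }
}
```

A test would create a `ManualClock`, add an entry, call `advance(...)` past the expiry interval, run `expireOldEntries()`, and assert that the entry is gone.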
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707475#comment-14707475 ] Hadoop QA commented on YARN-4024: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 59s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 39s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 58s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 53m 32s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 102m 55s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.server.resourcemanager.security.TestAMRMTokens | | | hadoop.yarn.server.resourcemanager.TestRMAdminService | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestNodesPage | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751636/YARN-4024-v6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 22de7c1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8893/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8893/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8893/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8893/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8893/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8893/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8893/console | This message was automatically generated. 
YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, YARN-4024-v6.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707086#comment-14707086 ] Wangda Tan commented on YARN-4024: -- For some reason Jenkins wasn't triggered, so I triggered it manually. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, YARN-4024-v6.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705403#comment-14705403 ] Wangda Tan commented on YARN-4024:

[~zhiguohong], thanks for the update. The patch generally looks good; could you take a look at the findbugs warning?

bq. FindBugs | module:hadoop-yarn-server-resourcemanager

Sometimes the findbugs report link shows 0 findbugs when you click it, even though there are some bugs. You can go to the yarn-resourcemanager project and run mvn clean findbugs:findbugs.

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702970#comment-14702970 ] Sunil G commented on YARN-4024:

Hi [~zhiguohong], thanks for working on this. Some high-level comments:
- I feel {{ExpireChecker}} could extend the Timer class, so that it can be started from serviceStart of NodesListManager and cancelled from serviceStop. Please share your opinion.
- Since we use *removeCache*, I think *update* would be better named *addToCache*.
- Please add more comments and details about *interface Resolver* and its API. I feel the API *resolve* has to be Unstable and Public for now. Maybe we can move the interface to a separate file.

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
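The lifecycle idea in the first bullet of the comment above (start the expiry checker when the service starts, cancel it when the service stops) might look roughly like the sketch below. This is an illustration under assumptions, not the patch: `ExpiryTask` and `ExpiryLifecycle` are hypothetical names, and the sketch schedules a `java.util.TimerTask` on a `java.util.Timer` instead of subclassing `Timer`, which is a deliberate substitution.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical periodic task; in the thread this role belongs to ExpireChecker.
class ExpiryTask extends TimerTask {
  final AtomicInteger runs = new AtomicInteger();

  @Override
  public void run() {
    // A real task would evict stale cache entries here.
    runs.incrementAndGet();
  }
}

// Hypothetical owner that mirrors a service with start and stop hooks.
class ExpiryLifecycle {
  private final long periodMillis;
  private Timer timer;

  ExpiryLifecycle(long periodMillis) {
    this.periodMillis = periodMillis;
  }

  // Would be called from the service's start hook.
  synchronized void start() {
    if (timer != null) {
      return; // already started
    }
    timer = new Timer("ip-cache-expiry", true); // daemon thread
    // A TimerTask can be scheduled only once, so create a fresh one per start.
    timer.schedule(new ExpiryTask(), periodMillis, periodMillis);
  }

  // Would be called from the service's stop hook.
  synchronized void stop() {
    if (timer != null) {
      timer.cancel(); // discards scheduled tasks and ends the timer thread
      timer = null;
    }
  }

  synchronized boolean isRunning() {
    return timer != null;
  }
}
```

Tying the timer to start and stop keeps the background thread from outliving the service that owns the cache.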
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703806#comment-14703806 ] Wangda Tan commented on YARN-4024:

Thanks for the update, [~zhiguohong], and for the comments from [~sunilg].

bq. Since we use removeCache, I think update will be better suited as addToCache

+1. I also think it's better to rename removeCache to removeFromCache.

bq. Please add more comments and details about interface Resolver and its api. I feel the api resolve has to be UnStable and Public for now. may be we can separate the interface to another file.

Since it's an internal-only interface, I think you should remove public from it. There is also no need to add @Unstable and @Public to such internal interfaces.

bq. LOG.debug([ +...

This should be wrapped by isDebugEnabled.

Regarding tests, I think it may be easier to expose a getResolver (default accessibility is fine) and mark it @VisibleForTesting like other tests. It may also be important to add a test that makes sure DirectResolver is created, to avoid possible future regressions.

And I think it's better to change the check {{if (nodeIpCacheTimeout == -1) {}} to be <= 0, since it doesn't make sense to have a timeout = 0 for CachedResolver. Thoughts?

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
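The resolver selection discussed in the comment above can be sketched as follows. The sketch assumes that any timeout of zero or less means "caching disabled", so a direct resolver is chosen in that case; that reading of the check is an assumption. All class names (`SketchResolver`, `SketchDirectResolver`, `SketchCachedResolver`, `SketchResolvers`) are hypothetical, the lookup function is injected so the example needs no DNS, and expiry is left out to keep the focus on the selection logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical internal interface; package-private, as suggested for internal APIs.
interface SketchResolver {
  String resolve(String hostName);
}

// Performs a lookup on every call.
class SketchDirectResolver implements SketchResolver {
  private final Function<String, String> lookup;

  SketchDirectResolver(Function<String, String> lookup) {
    this.lookup = lookup;
  }

  @Override
  public String resolve(String hostName) {
    return lookup.apply(hostName);
  }
}

// Remembers earlier answers (expiry omitted in this sketch).
class SketchCachedResolver implements SketchResolver {
  private final Function<String, String> lookup;
  private final Map<String, String> cache = new HashMap<>();

  SketchCachedResolver(Function<String, String> lookup) {
    this.lookup = lookup;
  }

  @Override
  public synchronized String resolve(String hostName) {
    return cache.computeIfAbsent(hostName, lookup);
  }
}

class SketchResolvers {
  // Assumption: zero or negative timeout disables caching.
  static SketchResolver create(int timeoutSecs, Function<String, String> lookup) {
    if (timeoutSecs <= 0) {
      return new SketchDirectResolver(lookup);
    }
    return new SketchCachedResolver(lookup);
  }
}
```

A regression test of the kind requested in the comment would call `SketchResolvers.create(0, ...)` and `create(-1, ...)` and assert that each returns a direct resolver.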
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702814#comment-14702814 ] Hadoop QA commented on YARN-4024: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 35s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 1s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 57s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 29s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 44s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 2m 0s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 53m 59s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 101m 2s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.util.TestRackResolver | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751205/YARN-4024-v4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 22dc5fc | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8883/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8883/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8883/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8883/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8883/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8883/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8883/console | This message was automatically generated. 
YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701028#comment-14701028 ] Hadoop QA commented on YARN-4024: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 23s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 38s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 26s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 9s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 0m 21s | Tests failed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 53m 34s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 97m 42s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | | | hadoop.yarn.server.resourcemanager.rmapp.TestNodesListManager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12750953/YARN-4024-draft-v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 71566e2 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8872/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8872/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8872/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8872/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8872/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8872/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8872/console | This message was automatically generated. 
YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701598#comment-14701598 ] Wangda Tan commented on YARN-4024:

Hi [~zhiguohong], thanks for the update. Some minor comments:
1) I think we can limit the cache-removal changes to NodesListManager: in handle(..) we can do the flush(..). It would be the same as doing this in RMNodeImpl, and we wouldn't need to expose an extra method, correct?
2) I suggest renaming CachedResolver.flush to something like removeCache; flush is more of a file system concept to me.
3) Add tests to see if NodesListManager can handle events correctly if you agree with 2).

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
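Point 1) above, evicting a host from the cache inside the event handler instead of exposing an extra method elsewhere, could be sketched like this. Everything here is hypothetical and simplified: `SketchNodesListManager`, `SketchNodeEvent`, and the `OTHER` event type are placeholders, and only the two event names NODE_USABLE and NODE_UNUSABLE are taken from the thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// OTHER is a placeholder for any event that should leave the cache alone.
enum SketchNodeEventType { NODE_USABLE, NODE_UNUSABLE, OTHER }

class SketchNodeEvent {
  final SketchNodeEventType type;
  final String hostName;

  SketchNodeEvent(SketchNodeEventType type, String hostName) {
    this.type = type;
    this.hostName = hostName;
  }
}

// Hypothetical manager that owns the cache and reacts to node events.
class SketchNodesListManager {
  private final Map<String, String> ipCache = new ConcurrentHashMap<>();

  void cache(String hostName, String ip) {
    ipCache.put(hostName, ip);
  }

  String cachedIp(String hostName) {
    return ipCache.get(hostName);
  }

  // A usability change may coincide with an IP change, so drop the
  // cached entry and let the next lookup resolve the host again.
  void handle(SketchNodeEvent event) {
    if (event.type == SketchNodeEventType.NODE_USABLE
        || event.type == SketchNodeEventType.NODE_UNUSABLE) {
      ipCache.remove(event.hostName);
    }
  }
}
```

Because eviction happens inside `handle`, the cache stays private to the manager and the node implementation needs no reference to it.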
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701285#comment-14701285 ] Hadoop QA commented on YARN-4024: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 1s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 54s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 37s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 1m 55s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 57m 27s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 105m 51s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-resourcemanager | | Failed unit tests | hadoop.yarn.util.TestRackResolver | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751009/YARN-4024-draft-v3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 71566e2 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8875/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8875/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8875/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8875/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8875/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8875/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8875/console | This message was automatically generated. 
YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699424#comment-14699424 ]

Hadoop QA commented on YARN-4024:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 17m 27s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 29s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 59s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 0m 19s | Tests failed in hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests | 52m 58s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | 95m 12s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-resourcemanager |
| Failed unit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation |
| | hadoop.yarn.server.resourcemanager.rmapp.TestNodesListManager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750780/YARN-4024-draft.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 13604bd |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8861/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8861/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8861/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8861/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8861/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8861/console |

This message was automatically generated.

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699285#comment-14699285 ]

Hong Zhiguo commented on YARN-4024:
---

In this patch, both positive and negative lookup results are cached, and they share the same expiry interval.

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
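The behaviour described in this comment (positive and negative results cached under one expiry interval) might look roughly like the sketch below. It is not the code from the attached patch: the class name `ExpiringResolver`, the injected `lookup` function and the injected clock are all assumptions made for illustration, and a failed lookup is modelled as an empty `Optional`.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch; names are illustrative and not taken from the YARN-4024 patch.
final class ExpiringResolver {
  private static final class Entry {
    final Optional<String> address; // empty means the lookup failed (negative result)
    final long resolvedAtMs;

    Entry(Optional<String> address, long resolvedAtMs) {
      this.address = address;
      this.resolvedAtMs = resolvedAtMs;
    }
  }

  private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
  private final Function<String, Optional<String>> lookup; // assumed stand-in for a DNS query
  private final LongSupplier clockMs;                      // assumed injectable clock
  private final long expiryMs;

  ExpiringResolver(Function<String, Optional<String>> lookup,
                   LongSupplier clockMs, long expiryMs) {
    this.lookup = lookup;
    this.clockMs = clockMs;
    this.expiryMs = expiryMs;
  }

  // Returns the cached answer, successful or not, until it is older than expiryMs.
  Optional<String> resolve(String host) {
    long now = clockMs.getAsLong();
    Entry e = cache.get(host);
    if (e == null || now - e.resolvedAtMs >= expiryMs) {
      e = new Entry(lookup.apply(host), now);
      cache.put(host, e);
    }
    return e.address;
  }
}
```

Because a failed lookup is stored like any other entry, a host that does not resolve costs one lookup per expiry interval rather than one per heartbeat.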
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699908#comment-14699908 ]

Wangda Tan commented on YARN-4024:
--

Hi [~zhiguohong],

Thanks for working on this. On your comments:

bq. I think that's too complicated...

Agreed, I changed my mind; see https://issues.apache.org/jira/browse/YARN-4024?focusedCommentId=14660607&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14660607.

The approach in your patch generally looks good to me. A few suggestions:
1) When a node becomes NODE_UNUSABLE/NODE_USABLE, I suggest removing it from the cache to force an update of its IP, since a node status change will (likely) update its IP. This may require updating the Resolver interface.
2) seconds - something like normalizedHostnameCacheTimeout

[~wilsoncraft],

bq. When a nodemanager is decommissioned, is the IP cached for that host flushed out of the cache? Normally when a host gets a new IP it's because it gets moved or some other deliberate maintenance which would normally be preceded by a decommission. If the IP is flushed when decommissioned or an IP is always resolved from the host name when a new or recommissioned nodemanager is added to the cluster I think that would be adequate IMHO.

I'm not quite sure what you meant. Does my comment below solve the problem you mentioned?

bq. 1) When node becomes NODE_UNUSABLE/NODE_USABLE, I suggest remove them from the cache to force update its ip, since a node status change will (likely) update its ip. So this may require update the Resolver interface

bq. Also, it may be worthwhile or adequate to expose the method in a yarn rmadmin command to force a flush of the IP cache. Is this IP cache the same used for Rack Awareness by the RM?

I prefer to keep this an internal behavior; IIUC, it won't be used to determine the rack.

Please let me know your thoughts.

Thanks,
Wangda

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
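Suggestion 1) in this comment, dropping a node's cached address when its status changes, could be sketched as follows. This is only an illustration of the idea: `InvalidatingResolver`, `removeFromCache` and the injected `lookup` function are hypothetical names, and the actual Resolver interface in the patch may look different.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical sketch; class and method names are illustrative, not from the patch.
final class InvalidatingResolver {
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final UnaryOperator<String> lookup; // host name -> IP, assumed stand-in for DNS

  InvalidatingResolver(UnaryOperator<String> lookup) {
    this.lookup = lookup;
  }

  // Heartbeat path: answered from the cache after the first successful lookup.
  String resolve(String host) {
    return cache.computeIfAbsent(host, lookup);
  }

  // Intended to be called when a node becomes usable or unusable,
  // so that the next resolve() performs a fresh lookup.
  void removeFromCache(String host) {
    cache.remove(host);
  }
}
```

With this shape, an address change that coincides with a node status change is picked up at the next heartbeat without waiting for any expiry timer.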
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699645#comment-14699645 ]

Allan Wilson commented on YARN-4024:

When a nodemanager is decommissioned, is the IP cached for that host flushed out of the cache? Normally, when a host gets a new IP, it's because it was moved or underwent some other deliberate maintenance, which would normally be preceded by a decommission. If the IP is flushed on decommission, or an IP is always resolved from the host name when a new or recommissioned nodemanager is added to the cluster, I think that would be adequate IMHO.

Also, it may be worthwhile or adequate to expose the method in a yarn rmadmin command to force a flush of the IP cache. Is this IP cache the same one used for Rack Awareness by the RM?

Thanks

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698927#comment-14698927 ]

Hong Zhiguo commented on YARN-4024:
---

That's a good reason to have this cache.

[~leftnoteasy], in earlier comments, you said
{code}
1) If a host_a, has IP=IP1, IP1 is on whitelist. If we change the IP of host_a to IP2, IP2 is in blacklist. We won't do the re-resolve since the cached IP1 is on whitelist.
2) If a host_a, has IP=IP1, IP1 is on blacklist. We may need to do re-resolve every time when the node doing heartbeat since it may change to its IP to a one not on the blacklist.
{code}
I think that's too complicated. The cache lookup is part of resolving (name to address), and the IP whitelist/blacklist check is simply the stage that follows. I think a cache with configurable expiration is enough; we'd better keep the two stages orthogonal and not mix them up.

BTW, I think it's not good to have Name in NodeId but Address in whitelist/blacklist. Different layers of abstraction are mixed up. We wouldn't have this issue if either Name or Address were used for both NodeId and whitelist/blacklist. A better way is to have Name in whitelist/blacklist, instead of Address.

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
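The two-stage separation argued for in this comment can be illustrated with a small sketch: stage one maps a name to an address (through whatever resolver is plugged in, cached or not), and stage two checks the name and address against the include and exclude lists without knowing anything about caching. `NodeAdmission`, its parameters and the list semantics shown here are assumptions for illustration, not the actual NodesListManager code.

```java
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch; names and list semantics are assumptions for illustration.
final class NodeAdmission {
  static boolean isValidNode(String host,
                             UnaryOperator<String> resolver,
                             Set<String> includes,
                             Set<String> excludes) {
    // Stage 1: name -> address. The resolver may or may not cache; this code does not care.
    String ip = resolver.apply(host);

    // Stage 2: whitelist/blacklist check on both the name and the resolved address.
    boolean included = includes.isEmpty()
        || includes.contains(host)
        || (ip != null && includes.contains(ip));
    boolean excluded = excludes.contains(host)
        || (ip != null && excludes.contains(ip));
    return included && !excluded;
  }
}
```

Keeping the resolver behind a single function parameter is what lets the cache expiry policy change without touching the list check.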
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698929#comment-14698929 ]

Hong Zhiguo commented on YARN-4024:
---

Please ignore the last sentence, "a better way is to have Name in whitelist/blacklist, instead of Address." Or could someone help delete it?

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697341#comment-14697341 ]

Wangda Tan commented on YARN-4024:
--

[~zhiguohong], the DNS cache is a global parameter for a JVM, correct? IMHO, we shouldn't use the global parameter, because the RM may need to get the latest IP address from DNS for other purposes. For example, the RM needs the latest address when NMs are registering (or reconnecting), but it may not need it while NMs are running.

Thoughts?

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
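For context on the "global parameter" mentioned here: the cache built into `InetAddress` is tuned through the Java security properties `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl`, which apply to every lookup in the process. The sketch below shows how they are set; the helper class name and the idea of calling it from RM startup are assumptions, and the values generally need to be in place before the first lookup to take effect.

```java
import java.security.Security;

// Hypothetical helper; the class name is illustrative.
final class JvmDnsCacheSettings {
  // Sets the JVM-wide InetAddress cache lifetimes, in seconds.
  // These affect every caller in the process, not only the heartbeat path.
  static void apply(int positiveTtlSeconds, int negativeTtlSeconds) {
    Security.setProperty("networkaddress.cache.ttl",
        Integer.toString(positiveTtlSeconds));
    Security.setProperty("networkaddress.cache.negative.ttl",
        Integer.toString(negativeTtlSeconds));
  }
}
```

Since there is one setting for the whole JVM, registration and heartbeat cannot be given different freshness guarantees this way, which is the concern raised in the comment.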
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695011#comment-14695011 ]

Hong Zhiguo commented on YARN-4024:
---

There's already a DNS cache in InetAddress. What's the benefit of having another layer of cache in memory? Maybe it's easier to control?

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Hong Zhiguo Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679444#comment-14679444 ]

Hong Zhiguo commented on YARN-4024:
---

We did this a year ago in our 5k+ cluster. Can I take this issue?

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14679471#comment-14679471 ] Wangda Tan commented on YARN-4024: -- [~zhiguohong], sure, please go ahead! YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660595#comment-14660595 ]

Wangda Tan commented on YARN-4024:
--

[~zhiguohong], I noticed YARN-4001 after filing YARN-4024. Since we already have some discussion on YARN-4024, I suggest closing YARN-4001; feel free to assign this one to yourself.

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660570#comment-14660570 ]

zhihai xu commented on YARN-4024:
-

Does this issue duplicate YARN-4001?

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660607#comment-14660607 ]

Wangda Tan commented on YARN-4024:
--

Thanks for sharing your thoughts, [~sunilg]. I think we can clear the cache every X seconds. It would be an option like max-ip-caching time, where -1 disables the cache.

Sounds like a plan?

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660619#comment-14660619 ]

Sunil G commented on YARN-4024:
---

+1. Yes, that will be fine. I feel we could keep the unit as minutes, maybe.

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660587#comment-14660587 ]

Wangda Tan commented on YARN-4024:
--

Thanks for pointing out YARN-4001. Yes, they're the same issue. Since this one has more discussion, I suggest keeping this one open and closing YARN-4001. I'm unassigning myself.

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660602#comment-14660602 ]

Sunil G commented on YARN-4024:
---

Thanks [~leftnoteasy] for sharing the detailed cases. Yes, what I meant is the case where the lookup fails and we need to resolve again. Beyond that, we would end up with the blacklist/whitelist issue you describe.

I feel that caching alone would not be much of a problem, provided we clear the cache by some external means once a threshold is reached. This is to avoid accumulating too many invalid IPs, as in your case. Would that be fine?

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660559#comment-14660559 ]

Sunil G commented on YARN-4024:
---

Hi [~leftnoteasy],

Thank you for raising this; it is a potential source of heartbeat delay. I feel we can cache IP addresses on the RM side during NM registration and consult that cache as needed. Also, apart from switching this on or off, could we resolve the IP when the cache lookup fails and add the result to the cache? That gives a basic cache lookup during heartbeat, falling back to the existing resolution path on a miss.

A possible, if remote, problem is someone putting a proxy server in front of the NM that changes the IP every time. I feel that is too pessimistic a case. What do you think?

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660583#comment-14660583 ]

Wangda Tan commented on YARN-4024:
--

[~sunilg],

Thanks for the comment. I'm not sure what you mean by the cache lookup failing. There are two different kinds of cache lookup failure. One is that the IP isn't in the cache, in which case we definitely need to re-resolve the address. The other is that the resolved IP is not a valid host according to hostsReader, which has two different cases:
1) host_a has IP=IP1, and IP1 is on the whitelist. If we change the IP of host_a to IP2, and IP2 is on the blacklist, we won't re-resolve, since the cached IP1 is on the whitelist.
2) host_a has IP=IP1, and IP1 is on the blacklist. We may need to re-resolve on every heartbeat from the node, since its IP may change to one that is not on the blacklist.

So my thinking is: there should be a switch to control this. When a node's IP won't change OR there's no black/white node list, we should cache; otherwise we need to resolve on every node heartbeat.

Thoughts?

YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)