[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411012#comment-15411012 ]
Junping Du commented on YARN-4676:
----------------------------------
Thanks for updating the patch, [~danzhi]! I went through the whole patch this weekend, except for the documentation part. I would suggest splitting the documentation into a separate JIRA if we don't want this JIRA to linger for a long time, since reviewing docs is generally difficult. My overall feeling is that the patch is getting closer, but there are still many issues to address; see below:

In HostsFileReader.java,
{noformat}
+  public synchronized Map<String, Integer> getExcludedHostsWithTimeout() {
+    return excludes;
+  }
{noformat}
Two issues here:
1. synchronized doesn't help here if other places update excludes under the write lock; we should take the read lock to protect this access.
2. We shouldn't return excludes directly, because the caller could then mutate the map and bypass the write lock. Please follow the example of getHostDetails(), which is thread safe in this case.
{noformat}
+  // Examples:
+  //  <host><name>host1</name></host>
+  //  <host><name>host2</name><timeout>123</timeout></host>
+  //  <host><name>host3</name><timeout>-1</timeout></host>
{noformat}
Can we allow multiple hosts in the same item when they share the same timeout value, like "<host><name>host1, host2, host3</name><timeout>123</timeout></host>"? That would be much more concise and would save a lot of duplicated meta info. It should be easy to address, shouldn't it?

In RMAdminCLI.java,
{noformat}
+  public static void readXmlFileToMapWithFileInputStream(String type,
+      String filename, InputStream fileInputStream, Map<String, Integer> map)
+      throws IOException
{noformat}
I don't see how this throws IOException, given that IOException is already caught and rethrown as a runtime exception. We should remove the declaration if it is not necessary.
{noformat}
+    if ("-g".equals(args[1]) || "-graceful".equals(args[1])) {
{noformat}
If we want to support "-graceful" in addition to "-g", then we should also mention it in the command-line usage messages.
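To illustrate points 1 and 2 above, here is a minimal sketch (class and field names are hypothetical, modeled on the review's description, not the actual patch) of a getter that takes the read lock and returns an unmodifiable snapshot, so callers can neither race with writers nor mutate internal state:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the suggested fix: guard reads with the same
// ReadWriteLock that protects writers, and hand back a defensive copy.
public class HostsFileReaderSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // host name -> decommission timeout in seconds (e.g. -1 for "no timeout")
  private final Map<String, Integer> excludes = new HashMap<>();

  public Map<String, Integer> getExcludedHostsWithTimeout() {
    lock.readLock().lock();
    try {
      // Copy under the read lock, then wrap so the caller cannot write back.
      return Collections.unmodifiableMap(new HashMap<>(excludes));
    } finally {
      lock.readLock().unlock();
    }
  }

  // Writers take the write lock, mirroring how a refresh would update the map.
  public void setTimeout(String host, int timeoutSecs) {
    lock.writeLock().lock();
    try {
      excludes.put(host, timeoutSecs);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Returning a copy rather than the live map trades a small allocation for safety, which matches the getHostDetails() approach the review points to.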
In DecommissioningNodesWatcher.java,
{noformat}
+ * DecommissioningNodesWatcher basically is no cost when no node is
+ * DECOMMISSIONING. This sacrifices the possibility that that node once
+ * host containers of an still running application.
{noformat}
I don't quite understand the last sentence here. Could you clarify it?

In {{void update(RMNode rmNode, NodeStatus remoteNodeStatus)}},
{noformat}
+    if (rmNode.getState() == NodeState.DECOMMISSIONED) {
+      if (context == null) {
+        return;
+      }
+      context.nodeState = rmNode.getState();
+      // keep DECOMMISSIONED node for a while for status log.
+      if (context.decommissionedTime == 0) {
+        context.decommissionedTime = now;
+      } else if (now - context.decommissionedTime > 60000L) {
+        decomNodes.remove(rmNode.getNodeID());
+        LOG.info("remove " + rmNode.getState() + " " + rmNode.getNodeID());
+      }
{noformat}
Why do we wait 60 seconds to remove the node from decomNodes after it reaches DECOMMISSIONED? Shouldn't we remove it directly?
{noformat}
+    long waitTime =
+        System.currentTimeMillis() - context.decommissioningStartTime;
{noformat}
We should use MonotonicClock for all timeout calculations, so they aren't affected by system clock changes; please refer to AbstractLivelinessMonitor as an example.

In PollTimerTask.run(), the logic needs more improvement:
1. It doesn't sleep for any monitoring interval - we shouldn't loop so tightly here.
2. We don't skip NMs that updated within the last 30 or 60 seconds, which makes the timeout tracking too coarse-grained.
3. There are many unnecessary log operations.
I think we should follow logic similar to AbstractLivelinessMonitor, although we cannot use it directly because each node can have a different expire interval. Also, we should move the whole loop-over-staleNodes logic that checks timeout/ready and sends DECOMMISSION into a state-machine transition that can be triggered directly when a READY or TIMEOUT event is received.
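The monitor shape suggested above can be sketched as follows. This is a simplified, hypothetical stand-in (class and method names are mine, not from the patch or from Hadoop's AbstractLivelinessMonitor): per-node expire intervals, timestamps from a monotonic source instead of System.currentTimeMillis(), and a sleep between sweeps rather than a tight loop:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical liveliness-monitor-style sweep: each node carries its own
// timeout, time comes from a monotonic clock, and the thread sleeps
// between sweeps instead of spinning.
public class DecommissionTimeoutMonitor extends Thread {
  private final long sweepIntervalMs;
  // nodeId -> { startMonotonicMs, timeoutMs }
  private final Map<String, long[]> decommissioning = new ConcurrentHashMap<>();

  public DecommissionTimeoutMonitor(long sweepIntervalMs) {
    this.sweepIntervalMs = sweepIntervalMs;
    setDaemon(true);
  }

  private static long monotonicNowMs() {
    // Immune to wall-clock adjustments, unlike System.currentTimeMillis().
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
  }

  public void track(String nodeId, long timeoutMs) {
    decommissioning.put(nodeId, new long[] {monotonicNowMs(), timeoutMs});
  }

  /** One sweep: expire every node whose own timeout has elapsed. */
  void sweep() {
    long now = monotonicNowMs();
    decommissioning.entrySet().removeIf(e -> {
      long elapsed = now - e.getValue()[0];
      if (elapsed >= e.getValue()[1]) {
        onTimeout(e.getKey());  // e.g. fire a TIMEOUT event to the state machine
        return true;
      }
      return false;
    });
  }

  /** Hook for the state-machine transition; no-op in this sketch. */
  protected void onTimeout(String nodeId) { }

  @Override
  public void run() {
    while (!isInterrupted()) {
      sweep();
      try {
        Thread.sleep(sweepIntervalMs);  // monitoring interval, no tight loop
      } catch (InterruptedException ie) {
        return;
      }
    }
  }
}
```

Firing an event from the sweep, rather than acting inline, matches the review's suggestion of a state-machine transition triggered by READY or TIMEOUT.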
I will do more tests in the next day or two to make sure it won't affect the forceful decommission functionality and client-side timeout tracking.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf,
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch,
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch,
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch,
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch,
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch,
> YARN-4676.018.patch, YARN-4676.019.patch, YARN-4676.020.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to
> gracefully decommission YARN nodes. After the user issues the refreshNodes
> request, the ResourceManager automatically evaluates the status of all
> affected nodes and kicks off decommission or recommission actions. The RM
> asynchronously tracks container and application status related to
> DECOMMISSIONING nodes, so that it can decommission the nodes immediately
> after they are ready to be decommissioned. Decommissioning timeouts at
> individual-node granularity are supported and can be dynamically updated.
> The mechanism naturally supports multiple independent graceful
> decommissioning "sessions", where each one involves different sets of nodes
> with different timeout settings. Such support is ideal and necessary for
> graceful decommission requests issued by external cluster-management
> software rather than a human.
> DecommissioningNodesWatcher inside ResourceTrackerService tracks
> DECOMMISSIONING nodes' status automatically and asynchronously after the
> client/admin makes the graceful decommission request. It tracks
> DECOMMISSIONING nodes' status to decide when, after all running containers
> on a node have completed, the node will be transitioned into the
> DECOMMISSIONED state. NodesListManager detects and handles include- and
> exclude-list changes to kick off decommission or recommission as necessary.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)