[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411012#comment-15411012 ]

Junping Du commented on YARN-4676:
----------------------------------

Thanks for updating the patch, [~danzhi]! I went through the whole patch this 
weekend except the documentation part. I would suggest moving the documentation 
changes into a separate JIRA if we don't want this JIRA to linger for a long 
time, as reviewing docs is generally a bit difficult. 
My overall feeling is that the patch is getting closer, but there are still 
many issues to address, see below:
In HostsFileReader.java,
{noformat}
+  public synchronized Map<String, Integer> getExcludedHostsWithTimeout() {
+    return excludes;
+  }
{noformat}
Two issues here: 1. synchronized doesn't work here if other places update 
excludes under the write lock; we should use the read lock to protect it. 2. We 
shouldn't return excludes directly, because the caller could then modify the 
map outside of any lock. Please refer to an example like getHostDetails(), 
which is more thread safe in this case.
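
For illustration, a minimal sketch of the read-lock-plus-copy approach (the 
readLock field name is an assumption based on the write-side locking mentioned 
above):
{noformat}
  // Sketch only: take the read lock and return a defensive copy so the
  // caller cannot mutate the internal map outside of any lock.
  public Map<String, Integer> getExcludedHostsWithTimeout() {
    this.readLock.lock();
    try {
      return new HashMap<>(excludes);  // snapshot, not the live map
    } finally {
      this.readLock.unlock();
    }
  }
{noformat}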

{noformat}
+      // Examples:
+      // <host><name>host1</name></host>
+      // <host><name>host2</name><timeout>123</timeout></host>
+      // <host><name>host3</name><timeout>-1</timeout></host>
{noformat}
Can we allow multiple hosts in the same item if their timeout value is the 
same, like "<host><name>host1, host2, host3</name><timeout>123</timeout></host>"? 
That sounds much more concise and can save a lot of duplicated meta info. It 
should be easy to address, shouldn't it?
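
On the parsing side, something along these lines could work; this is just a 
sketch, and the helpers getElementText and parseTimeout are hypothetical 
stand-ins for whatever the patch's XML parsing code actually uses:
{noformat}
  // Sketch only: register every comma-separated host name with the
  // same timeout, so one <host> item can cover host1, host2, host3.
  String names = getElementText(hostElement, "name");  // hypothetical helper
  Integer timeout = parseTimeout(hostElement);         // hypothetical helper
  for (String name : names.split(",")) {
    String trimmed = name.trim();
    if (!trimmed.isEmpty()) {
      map.put(trimmed, timeout);
    }
  }
{noformat}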

In RMAdminCLI.java,
{noformat}
+  public static void readXmlFileToMapWithFileInputStream(String type,
+      String filename, InputStream fileInputStream, Map<String, Integer> map)
+          throws IOException
{noformat}
I don't see how we can throw IOException here, given that IOException is 
already caught and rethrown as a runtime exception. We should remove the throws 
clause if it is not necessary.

{noformat}
+      if ("-g".equals(args[1]) || "-graceful".equals(args[1])) {
{noformat}
If we want to support "-graceful" in addition to "-g", then we should also 
mention it in the command-line usage messages.

In DecommissioningNodesWatcher.java,
{noformat}
+ * DecommissioningNodesWatcher basically is no cost when no node is
+ * DECOMMISSIONING. This sacrifices the possibility that that node once
+ * host containers of an still running application.
{noformat}
I don't quite understand the last sentence here. Could you clarify it?

In {{void update(RMNode rmNode, NodeStatus remoteNodeStatus)}},
{noformat}
+    if (rmNode.getState() == NodeState.DECOMMISSIONED) {
+      if (context == null) {
+        return;
+      }
+      context.nodeState = rmNode.getState();
+      // keep DECOMMISSIONED node for a while for status log.
+      if (context.decommissionedTime == 0) {
+        context.decommissionedTime = now;
+      } else if (now - context.decommissionedTime > 60000L) {
+        decomNodes.remove(rmNode.getNodeID());
+        LOG.info("remove " + rmNode.getState() + " " + rmNode.getNodeID());
+      }
{noformat}
Why are we waiting 60 seconds to remove the node from decomNodes after it 
enters DECOMMISSIONED? Shouldn't we remove it directly?
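
That is, something like this sketch (assuming the status logging isn't worth 
the 60-second retention), reusing the names from the snippet above:
{noformat}
    // Sketch of the suggested simplification: drop the node from
    // decomNodes as soon as it reaches DECOMMISSIONED.
    if (rmNode.getState() == NodeState.DECOMMISSIONED) {
      decomNodes.remove(rmNode.getNodeID());
      LOG.info("remove " + rmNode.getState() + " " + rmNode.getNodeID());
      return;
    }
{noformat}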

{noformat}
+    long waitTime =
+        System.currentTimeMillis() - context.decommissioningStartTime;
{noformat}
We should use MonotonicClock for all timeout calculations so they aren't 
affected by system clock changes; please refer to AbstractLivelinessMonitor as 
an example.
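
For instance, a sketch using the same clock abstraction that 
AbstractLivelinessMonitor relies on (this assumes decommissioningStartTime is 
also recorded via the same clock):
{noformat}
  import org.apache.hadoop.yarn.util.Clock;
  import org.apache.hadoop.yarn.util.MonotonicClock;

  // Sketch only: a monotonic clock (backed by Time.monotonicNow()) is
  // immune to NTP or manual wall-clock adjustments, so elapsed-time
  // math stays correct.
  private final Clock clock = new MonotonicClock();

  long waitTime = clock.getTime() - context.decommissioningStartTime;
{noformat}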

PollTimerTask.run()
The logic here needs more improvement: 1. there is no sleep for the monitoring 
interval; I don't think we should loop so tightly here. 2. We don't check NMs 
that updated within the last 30 or 60 seconds, which makes the timeout tracking 
too coarse-grained. 3. There are many unnecessary log operations.
I think we should follow logic similar to AbstractLivelinessMonitor, although 
we cannot use it directly since each node could have a different expire 
interval. Also, we should move the whole logic into a loop over staleNodes that 
checks timeout/ready status and sends DECOMMISSION, with a state machine 
transition that can be triggered directly when a READY or TIMEOUT event is 
received.
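
A rough sketch of that structure, modeled on AbstractLivelinessMonitor but with 
a per-node expire interval; names like monitorInterval, timeoutMs and isReady() 
are illustrative, based on the fragments quoted above, not the patch's actual 
API:
{noformat}
  @Override
  public void run() {
    while (!stopped && !Thread.currentThread().isInterrupted()) {
      long now = clock.getTime();  // monotonic, per the comment above
      for (DecommissioningNodeContext context : decomNodes.values()) {
        // Unlike AbstractLivelinessMonitor's single expireInterval,
        // each node carries its own timeout.
        boolean timedOut =
            now - context.decommissioningStartTime > context.timeoutMs;
        if (timedOut || isReady(context)) {
          // READY or TIMEOUT: trigger the DECOMMISSION transition directly.
          dispatcher.getEventHandler().handle(
              new RMNodeEvent(context.nodeId, RMNodeEventType.DECOMMISSION));
        }
      }
      try {
        Thread.sleep(monitorInterval);  // avoid a tight loop
      } catch (InterruptedException e) {
        break;
      }
    }
  }
{noformat}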

I will do more testing in the next day or two to make sure the patch doesn't 
affect the forceful decommission functionality and client-side timeout 
tracking.

> Automatic and Asynchronous Decommissioning Nodes Status Tracking
> ----------------------------------------------------------------
>
>                 Key: YARN-4676
>                 URL: https://issues.apache.org/jira/browse/YARN-4676
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.8.0
>            Reporter: Daniel Zhi
>            Assignee: Daniel Zhi
>              Labels: features
>         Attachments: GracefulDecommissionYarnNode.pdf, 
> GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, YARN-4676.005.patch, 
> YARN-4676.006.patch, YARN-4676.007.patch, YARN-4676.008.patch, 
> YARN-4676.009.patch, YARN-4676.010.patch, YARN-4676.011.patch, 
> YARN-4676.012.patch, YARN-4676.013.patch, YARN-4676.014.patch, 
> YARN-4676.015.patch, YARN-4676.016.patch, YARN-4676.017.patch, 
> YARN-4676.018.patch, YARN-4676.019.patch, YARN-4676.020.patch
>
>
> YARN-4676 implements an automatic, asynchronous and flexible mechanism to 
> gracefully decommission
> YARN nodes. After the user issues the refreshNodes request, the 
> ResourceManager automatically evaluates
> the status of all affected nodes and kicks off decommission or recommission 
> actions. The RM asynchronously
> tracks container and application status related to DECOMMISSIONING nodes to 
> decommission the
> nodes immediately after they are ready to be decommissioned. Decommissioning 
> timeouts at individual-node
> granularity are supported and can be dynamically updated. The 
> mechanism naturally supports multiple
> independent graceful decommissioning “sessions” where each one involves 
> different sets of nodes with
> different timeout settings. Such support is ideal and necessary for graceful 
> decommission requests issued
> by external cluster management software instead of a human.
> DecommissioningNodeWatcher inside ResourceTrackingService tracks 
> DECOMMISSIONING node status automatically and asynchronously after the 
> client/admin makes the graceful decommission request. It tracks 
> DECOMMISSIONING node status to decide when, after all running containers on 
> the node have completed, the node will be transitioned into the 
> DECOMMISSIONED state. NodesListManager detects and handles include and 
> exclude list changes to kick off decommission or recommission as necessary.


