[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743495#comment-14743495 ]
Junping Du commented on YARN-3212:
----------------------------------

Thanks [~leftnoteasy] for the review and comments!

bq. 1. Why shutdown a "decommissioning" NM if it is doing heartbeat. Should we allow it continue heartbeat, since RM needs to know about container finished / killed information.

We don't shut down a "decommissioning" NM. On the contrary, among the nodes that return false from the nodesListManager.isValidNode() check, we distinguish decommissioning nodes from the others, so a decommissioning node keeps running instead of being decommissioned.

bq. 2. Do we have timeout of graceful decomission? Which will update a node to "DECOMMISSIONED" after the timeout.

There was some discussion of this in the umbrella JIRA (https://issues.apache.org/jira/browse/YARN-914?focusedCommentId=14314653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14314653), and we decided to track the timeout in the CLI instead of the RM. The CLI patch (YARN-3225) reflects that.

bq. 3. If I understand correct, decommissioning is another running state, except: We cannot allocate any new containers to it.

Exactly. Another difference is that the available resource should be updated each time a running container finishes.

bq. If answer to question #2 is no, I suggest to rename RMNodeEventType.DECOMISSION_WITH_TIMEOUT to GRACEFUL_DECOMISSION, since it doesn't have a "real" timeout.

As replied above, we do support a timeout, in the CLI. DECOMMISSION_WITH_TIMEOUT also sounds clearer than the old DECOMMISSION event. Thoughts?

bq. Why this is need? .addTransition(NodeState.DECOMMISSIONING, NodeState.DECOMMISSIONING, RMNodeEventType.DECOMMISSION_WITH_TIMEOUT, new DecommissioningNodeTransition(NodeState.DECOMMISSIONING))

Without this transition, our state machine would throw an InvalidStateTransitionException, which doesn't seem right for a normal operation.

bq. Should we simply ignore the DECOMMISSION_WITH_TIMEOUT event?

No. The RM should be aware of this event so that it can later make precise updates to the available resource, etc. (YARN-3223).
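To make the answers above concrete, here is a heavily simplified, hypothetical model of the node-side rules: a DECOMMISSIONING node that fails the validity check keeps heartbeating, receives no new containers, and treats a repeated DECOMMISSION_WITH_TIMEOUT as a self-loop rather than an invalid transition. This is a sketch only. The class `DecommissioningNodeSketch`, its enums, and the map-based table are illustrative assumptions, not the real RMNodeImpl (which uses Hadoop's StateMachineFactory), and `IllegalStateException` merely stands in for InvalidStateTransitionException.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of the RM-side node rules discussed above.
// Not the real RMNodeImpl; all names here are illustrative.
final class DecommissioningNodeSketch {
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }
    enum Event { DECOMMISSION_WITH_TIMEOUT, DECOMMISSION, RECOMMISSION }
    enum HeartbeatAction { CONTINUE, SHUTDOWN }

    private static final Map<NodeState, Map<Event, NodeState>> TRANSITIONS =
            new EnumMap<>(NodeState.class);

    private static void addTransition(NodeState from, Event event, NodeState to) {
        TRANSITIONS.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(event, to);
    }

    static {
        addTransition(NodeState.RUNNING, Event.DECOMMISSION_WITH_TIMEOUT, NodeState.DECOMMISSIONING);
        // Self-loop: a repeated graceful decommission is a normal operation,
        // so it must not fall through to the "invalid transition" error below.
        addTransition(NodeState.DECOMMISSIONING, Event.DECOMMISSION_WITH_TIMEOUT, NodeState.DECOMMISSIONING);
        // Hard decommission, e.g. once the CLI-side timeout expires.
        addTransition(NodeState.DECOMMISSIONING, Event.DECOMMISSION, NodeState.DECOMMISSIONED);
        // Recommission cancels a graceful decommission in progress.
        addTransition(NodeState.DECOMMISSIONING, Event.RECOMMISSION, NodeState.RUNNING);
    }

    static NodeState next(NodeState from, Event event) {
        Map<Event, NodeState> row = TRANSITIONS.get(from);
        NodeState to = (row == null) ? null : row.get(event);
        if (to == null) {
            // Stands in for InvalidStateTransitionException.
            throw new IllegalStateException("Invalid event " + event + " at " + from);
        }
        return to;
    }

    // A node failing isValidNode is normally shut down, but a DECOMMISSIONING
    // node keeps heartbeating so the RM still learns about finished/killed containers.
    static HeartbeatAction onHeartbeat(boolean isValidNode, NodeState state) {
        if (!isValidNode && state != NodeState.DECOMMISSIONING) {
            return HeartbeatAction.SHUTDOWN;
        }
        return HeartbeatAction.CONTINUE;
    }

    // DECOMMISSIONING behaves like RUNNING except that no new containers are placed on it.
    static boolean canAllocateNewContainers(NodeState state) {
        return state == NodeState.RUNNING;
    }
}
```

Removing the self-loop line would make `next(DECOMMISSIONING, DECOMMISSION_WITH_TIMEOUT)` throw, which mirrors the failure mode the transition in the patch is meant to avoid.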
bq. Is there specific considerations that transfer UNHEALTHY to DECOMISSIONED when DECOMMISSION_WITH_TIMEOUT received? Is it better to transfer it to DECOMISSIONING since it has some containers running on it?

I don't have a strong preference here. My earlier reasoning was this: the UNHEALTHY event comes from the machine monitor and indicates that the node is not well suited for containers to keep running on it, while DECOMMISSION_WITH_TIMEOUT comes from a user who wants to decommission a batch of nodes without affecting the apps/containers running on them, provided those nodes are currently running *normally*. So moving an unhealthy node straight to decommissioned seems the simpler approach until we have more operational experience with this new feature. I have a similar view on the earlier discussion about an UNHEALTHY event arriving at a decommissioning node (https://issues.apache.org/jira/browse/YARN-3212?focusedCommentId=14693360&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14693360). Maybe we can revisit this later?

bq. One suggestion of how to handle node update to scheduler: I think you can add a field "isDecomissioning" to NodeUpdateSchedulerEvent, and scheduler can do all updates except allocate container.

Thanks for the good suggestion. YARN-3223 will handle balancing the NM's total resource against its used resource (so the available resource is always 0), and a new scheduler event like this is one option for keeping the NM's resources balanced. There are other options too, so I think we can move the discussion to that JIRA.
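One way to picture the reviewer's suggestion together with the YARN-3223 goal is a node-update handler that always processes finished containers but skips allocation for a decommissioning node, shrinking the node's total resource along with its used resource so that available stays at 0. This is a hypothetical sketch: `isDecommissioning` is only the proposed field, the real NodeUpdateSchedulerEvent has no such flag, and `Node`, `NodeUpdateEvent`, `totalMb`, `usedMb`, and `allocationAttempts` are illustrative names. It also assumes memory (MB) is the only resource tracked.

```java
import java.util.List;

// Hypothetical sketch of the "isDecommissioning" suggestion; not a YARN API.
final class DecommissioningSchedulerSketch {
    static final class Node {
        long totalMb;
        long usedMb;
        Node(long totalMb, long usedMb) { this.totalMb = totalMb; this.usedMb = usedMb; }
        long availableMb() { return totalMb - usedMb; }
    }

    static final class NodeUpdateEvent {
        final Node node;
        final List<Long> completedContainerMb; // sizes of containers that finished
        final boolean isDecommissioning;       // the proposed new field
        NodeUpdateEvent(Node node, List<Long> completedContainerMb, boolean isDecommissioning) {
            this.node = node;
            this.completedContainerMb = completedContainerMb;
            this.isDecommissioning = isDecommissioning;
        }
    }

    int allocationAttempts = 0;

    void handle(NodeUpdateEvent e) {
        // Finished containers are always processed, decommissioning or not.
        for (long mb : e.completedContainerMb) {
            e.node.usedMb -= mb;
            if (e.isDecommissioning) {
                // Shrink total along with used so available stays unchanged (0).
                e.node.totalMb -= mb;
            }
        }
        // Every update except allocation: skip it for a decommissioning node.
        if (!e.isDecommissioning) {
            allocationAttempts++; // stand-in for the normal allocation pass
        }
    }
}
```

For a decommissioning node created with 4096 MB total and 4096 MB used, finishing containers of 1024 MB and 512 MB leaves 2560 MB total and 2560 MB used, so nothing new can be scheduled on it while the RM still tracks the remaining work.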
> RMNode State Transition Update with DECOMMISSIONING state
> ---------------------------------------------------------
>
>                 Key: YARN-3212
>                 URL: https://issues.apache.org/jira/browse/YARN-3212
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch,
> YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch,
> YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch
>
>
> As proposed in YARN-914, a new “DECOMMISSIONING” state will be added; a node
> can transition to it from the “running” state, triggered by a new event,
> “decommissioning”.
> This new state can transition to the “decommissioned” state on
> Resource_Update if there are no running apps on this NM, when the NM
> reconnects after a restart, or when it receives a DECOMMISSIONED event
> (after a timeout from the CLI).
> In addition, it can go back to “running” if the user decides to cancel the
> previous decommission by calling recommission on the same node. The reaction
> to other events is similar to the RUNNING state.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)