[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323124#comment-14323124 ]

Junping Du commented on YARN-914:
---------------------------------

Thanks [~jlowe] for review and comments!
bq. Nit: How about DECOMMISSIONING instead of DECOMMISSION_IN_PROGRESS?
Sounds good. Will update it later.

bq. We should remove its available (not total) resources from the cluster then 
continue to remove available resources as containers complete on that node. 
That's a very good point. Yes, we should update resources in this way.
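(Just to make the intended accounting concrete, a rough sketch; the class and 
method names below are illustrative only, not the actual RMNode/scheduler code:)

{code:java}
// Hypothetical sketch: adjust the cluster total when a node enters
// DECOMMISSIONING and keep shrinking it as that node's containers finish.
public class DecommissioningAccounting {

  /** Simplified resource: memory (MB) and virtual cores. */
  static class Resource {
    int memoryMB;
    int vcores;
    Resource(int memoryMB, int vcores) {
      this.memoryMB = memoryMB;
      this.vcores = vcores;
    }
    void subtract(Resource other) {
      this.memoryMB -= other.memoryMB;
      this.vcores -= other.vcores;
    }
  }

  private final Resource clusterTotal;

  DecommissioningAccounting(Resource clusterTotal) {
    this.clusterTotal = clusterTotal;
  }

  /**
   * On entering DECOMMISSIONING: remove only the node's *available*
   * (unallocated) resources from the cluster total, so nothing new is
   * scheduled there while running containers keep their share.
   */
  void onNodeDecommissioning(Resource nodeTotal, Resource nodeAllocated) {
    Resource available =
        new Resource(nodeTotal.memoryMB - nodeAllocated.memoryMB,
                     nodeTotal.vcores - nodeAllocated.vcores);
    clusterTotal.subtract(available);
  }

  /**
   * As each container on the decommissioning node completes, its resources
   * also leave the cluster total instead of becoming schedulable again.
   */
  void onContainerCompleted(Resource containerResource) {
    clusterTotal.subtract(containerResource);
  }
}
{code}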

bq. As for the UI changes, initial thought is that decommissioning nodes should 
still show up in the active nodes list since they are still running containers. 
A separate decommissioning tab to filter for those nodes would be nice, 
although I suppose users can also just use the jquery table to sort/search for 
nodes in that state from the active nodes list if it's too crowded to add yet 
another node state tab (or maybe get rid of some effectively dead tabs like the 
reboot state tab).
Makes sense. Will add this to the proposal; we can discuss more details on the 
UI JIRA later.

bq. For the NM restart open question, this should no longer be an issue now 
that the NM is unaware of graceful decommission.
Right.

bq. For the AM dealing with being notified of decommissioning, again I think 
this should just be treated like a strict preemption for the short term. IMHO 
all the AM needs to know is that the RM is planning on taking away those 
containers, and what the AM should do about it is similar whether the reason 
for removal is preemption or decommissioning.
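(For reference, this is roughly the strict-preemption path an AM already sees 
on the allocate heartbeat, which decommissioning would reuse under this 
suggestion; a sketch only, and the handleImminentLoss hook is a hypothetical 
application-specific callback:)

{code:java}
import java.util.Set;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.PreemptionContainer;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;

public class PreemptionAwareAllocator {

  void processAllocateResponse(AllocateResponse response) {
    PreemptionMessage msg = response.getPreemptionMessage();
    if (msg == null || msg.getStrictContract() == null) {
      return; // nothing is being reclaimed in this heartbeat
    }
    // The RM intends to take these containers away; drain or checkpoint
    // work on them before they are killed.
    Set<PreemptionContainer> doomed = msg.getStrictContract().getContainers();
    for (PreemptionContainer pc : doomed) {
      ContainerId id = pc.getId();
      handleImminentLoss(id); // hypothetical application-specific hook
    }
  }

  private void handleImminentLoss(ContainerId id) {
    // e.g. stop assigning new work to the container and save its state
  }
}
{code}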


bq. Back to the long running services delaying decommissioning concern, does 
YARN even know the difference between a long-running container and a "normal" 
container? 
I am afraid not at the moment. YARN-1039 should be a start toward making that 
differentiation.

bq. If it doesn't, how is it supposed to know a container is not going to 
complete anytime soon? Even a "normal" container could run for many hours. It 
seems to me the first thing we would need before worrying about this scenario 
is the ability for YARN to know/predict the expected runtime of containers.
I think predicting the expected runtime of containers could be hard in the YARN 
case. However, can we typically say that long-running service containers are 
expected to run very long or indefinitely? If so, notifying the AM to preempt 
containers of an LRS makes more sense here than waiting for a timeout, doesn't it?

bq. There's still an open question about tracking the timeout RM side instead 
of NM side. Sounds like the NM side is not going to be pursued at this point, 
and we're going with no built-in timeout support in YARN for the short-term.
That was unclear at the beginning of the discussion but is much clearer now; I 
will remove this part.

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change, etc.), 
> it's desirable to minimize the impact on running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs are not fetched by the reducers of the job, these map tasks will 
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.


