[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986213#comment-13986213
 ] 

Wangda Tan commented on YARN-1368:
----------------------------------

Thanks [~jianhe] for this proposal. I think recovering containers from the NM 
heartbeat is a reasonable approach; +1 for the general idea.
Some minor comments:
bq. Noticed that FiCaSchedulerNode and FSSchedulerNode are almost the same. Any 
reason for keeping both ? thinking to merge the common methods into 
SchedulerNode.
IMO we'd better keep both for now. To avoid touching too many parts in this 
JIRA, we can split the merging of their common logic into a separate task.
bq. ContainerStatus sent in NM registration doesn’t capture enough information 
for re-constructing the containers. we may replace that with a new object or 
just adding more fields to encapsulate all the necessary information for 
re-constructing the container.
Personally I think creating a new type specialized for container recovery is 
better. ContainerStatus is also used in the node heartbeat, and including too 
many fields in each heartbeat isn't safe or efficient.
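
To illustrate the split I have in mind, here is a rough sketch only. All class, 
field, and method names below are hypothetical and are not existing YARN APIs; 
the fields chosen (resource size, priority) are my assumption of what a 
scheduler might need. The idea is that the heartbeat payload stays small, while 
a separate, richer report is sent once at NM registration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in YARN under these names.
final class RecoverySketch {

    /** Small per-heartbeat payload: only what is needed to track container progress. */
    static final class HeartbeatStatus {
        final String containerId;
        final String state;
        final int exitCode;

        HeartbeatStatus(String containerId, String state, int exitCode) {
            this.containerId = containerId;
            this.state = state;
            this.exitCode = exitCode;
        }
    }

    /** Richer payload sent once at NM registration so the scheduler can rebuild a container. */
    static final class RecoveredContainerReport {
        final String containerId;
        final String state;
        final int memoryMb;   // assumed field: allocated memory
        final int vcores;     // assumed field: allocated virtual cores
        final int priority;   // assumed field: request priority

        RecoveredContainerReport(String containerId, String state,
                                 int memoryMb, int vcores, int priority) {
            this.containerId = containerId;
            this.state = state;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
            this.priority = priority;
        }

        /** Drops the recovery-only fields to produce the lightweight heartbeat form. */
        HeartbeatStatus toHeartbeatStatus(int exitCode) {
            return new HeartbeatStatus(containerId, state, exitCode);
        }
    }

    /** Sums the memory a scheduler would re-account for on a node after recovery. */
    static int totalRecoveredMemoryMb(List<RecoveredContainerReport> reports) {
        int total = 0;
        for (RecoveredContainerReport report : reports) {
            total += report.memoryMb;
        }
        return total;
    }
}
```

With this shape, the extra fields travel only once per registration rather than 
on every heartbeat.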

> Common work to re-populate containers’ state into scheduler
> -----------------------------------------------------------
>
>                 Key: YARN-1368
>                 URL: https://issues.apache.org/jira/browse/YARN-1368
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running 
> containers upon registration. The RM needs to send this information to the 
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
> the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
