[ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986213#comment-13986213 ]
Wangda Tan commented on YARN-1368:
----------------------------------

Thanks [~jianhe] for this proposal. I think recovering containers from the NM heartbeat is a reasonable approach; +1 for the general idea.
Some minor comments:

bq. Noticed that FiCaSchedulerNode and FSSchedulerNode are almost the same. Any reason for keeping both ? thinking to merge the common methods into SchedulerNode.
IMO, we'd better keep both for now. To avoid touching too many parts in this JIRA, we can move the merging of their common logic into a separate task.

bq. ContainerStatus sent in NM registration doesn’t capture enough information for re-constructing the containers. we may replace that with a new object or just adding more fields to encapsulate all the necessary information for re-constructing the container.
Personally, I think creating a new type specialized for container recovery is better. ContainerStatus is also used in the node heartbeat, and including too many fields in each heartbeat isn't safe or efficient.

> Common work to re-populate containers’ state into scheduler
> -----------------------------------------------------------
>
>                 Key: YARN-1368
>                 URL: https://issues.apache.org/jira/browse/YARN-1368
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running
> containers upon registration. The RM needs to send this information to the
> schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover
> the current allocation state of the cluster.

--
This message was sent by Atlassian JIRA
(v6.2#6252)