[ 
https://issues.apache.org/jira/browse/YARN-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1421:
------------------------------------

    Description: 
Problem :- Today, for every application, the RM tracks the node managers where 
its containers ran. When the application finishes, the RM notifies all of those 
node managers of the application finish event (via the node manager heartbeat). 
However, if the RM restarts, this information is lost, so those node managers 
never receive the application finish event and keep reporting the finished 
applications.

Proposed Solution :- Instead of remembering the node managers where containers 
ran for a particular application, it would be better to rely on the node 
manager heartbeat to make this decision. That is, when a node manager 
heartbeats saying it is running applications (app1, app2), we should check the 
status of those applications in the RM's memory 
{code}rmContext.getRMApps(){code}. If an application is either not found (a 
very old application) or in a final state (FINISHED, KILLED, FAILED), we 
should immediately notify the node manager of the application finish event. 
This reduces the state that the RM needs to store and recover after a restart.
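
The check described above could look roughly like the sketch below. This is 
only an illustration of the idea, not the actual patch: the class 
HeartbeatAppCleanup, the AppState enum, the appsToCleanup method, and the use 
of plain strings as application ids are all hypothetical stand-ins, and the 
map passed in plays the role of what rmContext.getRMApps() returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names are real YARN classes.
class HeartbeatAppCleanup {

    // Simplified stand-in for the RM's application states.
    enum AppState { RUNNING, FINISHED, KILLED, FAILED }

    // The final states named in the description.
    static final Set<AppState> FINAL_STATES =
        EnumSet.of(AppState.FINISHED, AppState.KILLED, AppState.FAILED);

    /**
     * Given the applications known to the RM (assumed to mirror
     * rmContext.getRMApps()) and the applications a node manager reports
     * as running in its heartbeat, return the ones the node manager
     * should be told have finished.
     */
    static List<String> appsToCleanup(Map<String, AppState> rmApps,
                                      List<String> reportedApps) {
        List<String> finished = new ArrayList<>();
        for (String appId : reportedApps) {
            AppState state = rmApps.get(appId);
            // Unknown to the RM (very old application) or already final.
            if (state == null || FINAL_STATES.contains(state)) {
                finished.add(appId);
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        Map<String, AppState> rmApps = new HashMap<>();
        rmApps.put("app1", AppState.RUNNING);
        rmApps.put("app2", AppState.FINISHED);

        // app3 is not known to the RM at all.
        List<String> reported = Arrays.asList("app1", "app2", "app3");

        System.out.println(appsToCleanup(rmApps, reported)); // prints [app2, app3]
    }
}
```

Because the decision is derived from each heartbeat, the RM would not need to 
persist the per-application list of node managers across a restart.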

  was:
Problem :- Today for every application we track the node managers where 
container ran. So when application finishes it notifies all those node managers 
about application finish event (via node manager heartbeat). However if rm 
restarts then we forget this past information and those node managers will 
never get application finish event and will keep reporting finished 
applications.

Propose Solution :- Instead of remembering the node managers where containers 
ran for this particular application it would be better if we depend on node 
manager heartbeat to take this decision. i.e. when node manager heartbeats 
saying it is running application (app1, app2) then we should those 
application's status in RM's memory {code}rmContext.getRMApps(){code} and if 
either they are not found (very old applications) or they are in their final 
state (FINISHED, KILLED, FAILED) then we should immediately notify the node 
manager about the application finish event.


> Node managers will not receive application finish event where containers ran 
> before RM restart
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-1421
>                 URL: https://issues.apache.org/jira/browse/YARN-1421
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Omkar Vinit Joshi
>            Assignee: Omkar Vinit Joshi
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.1#6144)
