[ 
https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-6555:
-----------------------------
    Fix Version/s: YARN-5355-branch-2
                   YARN-5355

> Store application flow context in NM state store for work-preserving restart
> ----------------------------------------------------------------------------
>
>                 Key: YARN-6555
>                 URL: https://issues.apache.org/jira/browse/YARN-6555
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>            Reporter: Vrushali C
>            Assignee: Rohith Sharma K S
>              Labels: yarn-5355-merge-blocker
>             Fix For: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>
>         Attachments: YARN-6555.001.patch, YARN-6555.002.patch, 
> YARN-6555.003.patch
>
>
> If timeline service v2 is enabled and NM is restarted with recovery enabled, 
> then NM fails to start and throws an error as  "flow context can't be null".
> This is happening because the flow context did not exist before but now that 
> timeline service v2 is enabled, ApplicationImpl expects it to exist. 
> This would also happen even if flow context existed before but since we are 
> not persisting it / reading it during 
> ContainerManagerImpl#recoverApplication, it does not get passed in to 
> ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
>         at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to