[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart

2018-04-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450794#comment-16450794
 ] 

Hudson commented on YARN-6555:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14057/])
YARN-6555. Store application flow context in NM state store for (xyao: rev 
67d9c749211acdef0c2ad2dfcacfd172e86fd8f7)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java


> Store application flow context in NM state store for work-preserving restart
> 
>
> Key: YARN-6555
> URL: https://issues.apache.org/jira/browse/YARN-6555
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha4
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: yarn-5355-merge-blocker
> Fix For: 2.9.0, YARN-5355, YARN-5355-branch-2, 3.0.0-alpha4
>
> Attachments: YARN-6555.001.patch, YARN-6555.002.patch, 
> YARN-6555.003.patch
>
>
> If timeline service v2 is enabled and NM is restarted with recovery enabled, 
> then NM fails to start and throws an error as  "flow context can't be null".
> This is happening because the flow context did not exist before but now that 
> timeline service v2 is enabled, ApplicationImpl expects it to exist. 
> This would also happen even if flow context existed before but since we are 
> not persisting it / reading it during 
> ContainerManagerImpl#recoverApplication, it does not get passed in to 
> ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart

2017-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146742#comment-16146742
 ] 

Hudson commented on YARN-6555:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12271 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12271/])
YARN-6555. Store application flow context in NM state store for (varunsaxena: 
rev a8f082a180cfc14e7fd347b2c03ab0f31e1dd33c)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java


> Store application flow context in NM state store for work-preserving restart
> 
>
> Key: YARN-6555
> URL: https://issues.apache.org/jira/browse/YARN-6555
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha4
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Fix For: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha4
>
> Attachments: YARN-6555.001.patch, YARN-6555.002.patch, 
> YARN-6555.003.patch
>
>
> If timeline service v2 is enabled and NM is restarted with recovery enabled, 
> then NM fails to start and throws an error as  "flow context can't be null".
> This is happening because the flow context did not exist before but now that 
> timeline service v2 is enabled, ApplicationImpl expects it to exist. 
> This would also happen even if flow context existed before but since we are 
> not persisting it / reading it during 
> ContainerManagerImpl#recoverApplication, it does not get passed in to 
> ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart

2017-07-12 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084801#comment-16084801
 ] 

Vrushali C commented on YARN-6555:
--

One question I have now is, should we bump up the NM State store version 
(NMLeveldbStateStoreService#CURRENT_VERSION_INFO) ?

The old NM won't be able to read this new state store, no? 

> Store application flow context in NM state store for work-preserving restart
> 
>
> Key: YARN-6555
> URL: https://issues.apache.org/jira/browse/YARN-6555
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha4
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Fix For: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha4
>
> Attachments: YARN-6555.001.patch, YARN-6555.002.patch, 
> YARN-6555.003.patch
>
>
> If timeline service v2 is enabled and NM is restarted with recovery enabled, 
> then NM fails to start and throws an error as  "flow context can't be null".
> This is happening because the flow context did not exist before but now that 
> timeline service v2 is enabled, ApplicationImpl expects it to exist. 
> This would also happen even if flow context existed before but since we are 
> not persisting it / reading it during 
> ContainerManagerImpl#recoverApplication, it does not get passed in to 
> ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart

2017-05-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025849#comment-16025849
 ] 

Rohith Sharma K S commented on YARN-6555:
-

cool.. thank you :-)

> Store application flow context in NM state store for work-preserving restart
> 
>
> Key: YARN-6555
> URL: https://issues.apache.org/jira/browse/YARN-6555
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Fix For: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>
> Attachments: YARN-6555.001.patch, YARN-6555.002.patch, 
> YARN-6555.003.patch
>
>
> If timeline service v2 is enabled and NM is restarted with recovery enabled, 
> then NM fails to start and throws an error as  "flow context can't be null".
> This is happening because the flow context did not exist before but now that 
> timeline service v2 is enabled, ApplicationImpl expects it to exist. 
> This would also happen even if flow context existed before but since we are 
> not persisting it / reading it during 
> ContainerManagerImpl#recoverApplication, it does not get passed in to 
> ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart

2017-05-25 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025843#comment-16025843
 ] 

Haibo Chen commented on YARN-6555:
--

Yes, I already committed this into trunk

> Store application flow context in NM state store for work-preserving restart
> 
>
> Key: YARN-6555
> URL: https://issues.apache.org/jira/browse/YARN-6555
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Fix For: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>
> Attachments: YARN-6555.001.patch, YARN-6555.002.patch, 
> YARN-6555.003.patch
>
>
> If timeline service v2 is enabled and NM is restarted with recovery enabled, 
> then NM fails to start and throws an error as  "flow context can't be null".
> This is happening because the flow context did not exist before but now that 
> timeline service v2 is enabled, ApplicationImpl expects it to exist. 
> This would also happen even if flow context existed before but since we are 
> not persisting it / reading it during 
> ContainerManagerImpl#recoverApplication, it does not get passed in to 
> ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart

2017-05-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025840#comment-16025840
 ] 

Rohith Sharma K S commented on YARN-6555:
-

[~haibo.chen] could we merge this into trunk?

> Store application flow context in NM state store for work-preserving restart
> 
>
> Key: YARN-6555
> URL: https://issues.apache.org/jira/browse/YARN-6555
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Fix For: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>
> Attachments: YARN-6555.001.patch, YARN-6555.002.patch, 
> YARN-6555.003.patch
>
>
> If timeline service v2 is enabled and NM is restarted with recovery enabled, 
> then NM fails to start and throws an error as  "flow context can't be null".
> This is happening because the flow context did not exist before but now that 
> timeline service v2 is enabled, ApplicationImpl expects it to exist. 
> This would also happen even if flow context existed before but since we are 
> not persisting it / reading it during 
> ContainerManagerImpl#recoverApplication, it does not get passed in to 
> ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6555) Store application flow context in NM state store for work-preserving restart

2017-05-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025813#comment-16025813
 ] 

Hudson commented on YARN-6555:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11786 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11786/])
YARN-6555. Store application flow context in NM state store for (haibochen: rev 
47474fffac085e0e5ea46336bf80ccd0677017a3)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java


> Store application flow context in NM state store for work-preserving restart
> 
>
> Key: YARN-6555
> URL: https://issues.apache.org/jira/browse/YARN-6555
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>  Labels: yarn-5355-merge-blocker
> Fix For: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>
> Attachments: YARN-6555.001.patch, YARN-6555.002.patch, 
> YARN-6555.003.patch
>
>
> If timeline service v2 is enabled and NM is restarted with recovery enabled, 
> then NM fails to start and throws an error as  "flow context can't be null".
> This is happening because the flow context did not exist before but now that 
> timeline service v2 is enabled, ApplicationImpl expects it to exist. 
> This would also happen even if flow context existed before but since we are 
> not persisting it / reading it during 
> ContainerManagerImpl#recoverApplication, it does not get passed in to 
> ApplicationImpl.
> full stack trace
> {code}
> 2017-05-03 21:51:52,178 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:104)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.(ApplicationImpl.java:90)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org