[ https://issues.apache.org/jira/browse/YARN-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022203#comment-16022203 ]

Rohith Sharma K S commented on YARN-6555:
-----------------------------------------

bq. Do you think we should preserve as much flow context information as
possible? The patch stores the flow context in the state store only if all
three fields of the flow context are present. We could sanitize the flow
context, fill in default values for whatever fields are missing, and then just
check flowContext != null before storing the application state.
Here are my 2 cents:
# IMO, we should NOT set default values for the flow context. There are two
cases:
## AM (master) container launch: the RM sets the flow context in the container
launch context and starts the container. This flow context must be recovered
during NM restart.
## Containers launched by the AM: the flow context details are not set, so there
is nothing to store or recover during NM restart, and doing so would serve no
purpose.
# The additional null checks on strings before creating the proto are needed
because the proto setter methods for strings throw an NPE if flowName or
flowVersion is null.
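A minimal Java sketch of the storage rule argued above, assuming hypothetical `FlowContext` and `FlowContextProtoBuilder` classes. The builder only imitates the null-rejecting string setters of protobuf-java generated code; it is not the actual YARN proto class, and the field names are assumptions. A missing context, or a missing string field, means nothing is stored, and no default values are filled in.

```java
import java.util.Objects;

public class FlowContextStoreSketch {

  /** Hypothetical flow context holding the three fields discussed above. */
  static final class FlowContext {
    final String flowName;
    final String flowVersion;
    final long flowRunId;

    FlowContext(String flowName, String flowVersion, long flowRunId) {
      this.flowName = flowName;
      this.flowVersion = flowVersion;
      this.flowRunId = flowRunId;
    }
  }

  /**
   * Hypothetical stand-in for a protobuf-generated builder. Like generated
   * protobuf-java string setters, it throws NullPointerException on null.
   */
  static final class FlowContextProtoBuilder {
    private String flowName = "";
    private String flowVersion = "";
    private long flowRunId;

    FlowContextProtoBuilder setFlowName(String value) {
      flowName = Objects.requireNonNull(value);
      return this;
    }

    FlowContextProtoBuilder setFlowVersion(String value) {
      flowVersion = Objects.requireNonNull(value);
      return this;
    }

    FlowContextProtoBuilder setFlowRunId(long value) {
      flowRunId = value;
      return this;
    }

    /** Simplified "serialized" form, used here only to make the result visible. */
    String build() {
      return flowName + "/" + flowVersion + "/" + flowRunId;
    }
  }

  /**
   * Returns the record to persist, or null when nothing should be stored.
   * No defaults are filled in: a missing field means "do not store".
   */
  static String toStoredRecord(FlowContext ctx) {
    // AM-launched containers carry no flow context, so there is nothing to recover.
    if (ctx == null) {
      return null;
    }
    // Check the strings first; passing null to the setters would throw an NPE.
    if (ctx.flowName == null || ctx.flowVersion == null) {
      return null;
    }
    return new FlowContextProtoBuilder()
        .setFlowName(ctx.flowName)
        .setFlowVersion(ctx.flowVersion)
        .setFlowRunId(ctx.flowRunId)
        .build();
  }

  public static void main(String[] args) {
    System.out.println(toStoredRecord(new FlowContext("wordcount", "v1", 42L)));
    System.out.println(toStoredRecord(new FlowContext("wordcount", null, 42L)));
    System.out.println(toStoredRecord(null));
  }
}
```

Skipping the record entirely, rather than storing defaults, keeps containers that never had a flow context out of the state store.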

bq. FlowContext.toString(). Can we do something like {k1=v1, k2=v2, k3=v3} for 
better readability in the log?
Makes sense. I will change it in the next patch, after Vrushali reviews it.
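One way to produce the suggested {k1=v1, k2=v2, k3=v3} layout, sketched on a hypothetical stand-alone `FlowContext` class whose field names (`flowName`, `flowVersion`, `flowRunId`) are assumed rather than taken from the patch:

```java
public class FlowContextToStringSketch {

  /** Hypothetical flow context value class; field names are assumptions. */
  static final class FlowContext {
    private final String flowName;
    private final String flowVersion;
    private final long flowRunId;

    FlowContext(String flowName, String flowVersion, long flowRunId) {
      this.flowName = flowName;
      this.flowVersion = flowVersion;
      this.flowRunId = flowRunId;
    }

    /** Renders as {flowName=..., flowVersion=..., flowRunId=...} for log readability. */
    @Override
    public String toString() {
      return "{flowName=" + flowName
          + ", flowVersion=" + flowVersion
          + ", flowRunId=" + flowRunId + "}";
    }
  }

  public static void main(String[] args) {
    FlowContext ctx = new FlowContext("wordcount", "v1", 7L);
    System.out.println("Recovered flow context " + ctx);
  }
}
```

String concatenation renders a null field as "null", so a partially populated context still logs without throwing.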


> Enable flow context read (& corresponding write) for recovering application 
> with NM restart 
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-6555
>                 URL: https://issues.apache.org/jira/browse/YARN-6555
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-5355, YARN-5355-branch-2, 3.0.0-alpha3
>            Reporter: Vrushali C
>            Assignee: Rohith Sharma K S
>         Attachments: YARN-6555.001.patch, YARN-6555.002.patch
>
>
> If timeline service v2 is enabled and the NM is restarted with recovery enabled,
> the NM fails to start and throws the error "flow context cannot be null".
> This happens because the flow context did not exist before, but now that
> timeline service v2 is enabled, ApplicationImpl expects it to exist.
> It would also happen even if the flow context existed before, because we are
> not persisting it or reading it back during
> ContainerManagerImpl#recoverApplication, so it never gets passed in to
> ApplicationImpl.
> Full stack trace:
> {code}
> 2017-05-03 21:51:52,178 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
> java.lang.IllegalArgumentException: flow context cannot be null
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:104)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.<init>(ApplicationImpl.java:90)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverApplication(ContainerManagerImpl.java:318)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:280)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:267)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:276)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:588)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:649)
> {code}
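To make the failure mode in the description concrete, here is a simplified, hypothetical model of the write and read sides of NM recovery: an in-memory map stands in for the NM state store, and `Application` imitates only the null check that ApplicationImpl performs when timeline service v2 is enabled. It is not the YARN-6555 patch, all names are illustrative, and it does not show how the patch treats applications stored before timeline service v2 was enabled.

```java
import java.util.HashMap;
import java.util.Map;

public class RecoverFlowContextSketch {

  /** Hypothetical flow context with the three fields discussed in the comment. */
  static final class FlowContext {
    final String flowName;
    final String flowVersion;
    final long flowRunId;

    FlowContext(String flowName, String flowVersion, long flowRunId) {
      this.flowName = flowName;
      this.flowVersion = flowVersion;
      this.flowRunId = flowRunId;
    }
  }

  /** Simplified stand-in for the constructor precondition in ApplicationImpl. */
  static final class Application {
    final String appId;
    final FlowContext flowContext;

    Application(String appId, FlowContext flowContext, boolean timelineV2Enabled) {
      if (timelineV2Enabled && flowContext == null) {
        // Same message as the FATAL error in the stack trace above.
        throw new IllegalArgumentException("flow context cannot be null");
      }
      this.appId = appId;
      this.flowContext = flowContext;
    }
  }

  /** Hypothetical in-memory state store keyed by application id. */
  private final Map<String, FlowContext> storedFlowContexts = new HashMap<>();

  /** Write side: persist the flow context when the application is first stored. */
  void storeApplication(String appId, FlowContext ctx) {
    if (ctx != null) {
      storedFlowContexts.put(appId, ctx);
    }
  }

  /** Read side: hand the recovered flow context (possibly null) to the constructor. */
  Application recoverApplication(String appId, boolean timelineV2Enabled) {
    FlowContext ctx = storedFlowContexts.get(appId);
    return new Application(appId, ctx, timelineV2Enabled);
  }

  public static void main(String[] args) {
    RecoverFlowContextSketch nm = new RecoverFlowContextSketch();
    nm.storeApplication("app_1", new FlowContext("wordcount", "v1", 7L));
    System.out.println(nm.recoverApplication("app_1", true).flowContext.flowName);
    try {
      // Nothing was persisted for this application, so recovery hits the null check.
      nm.recoverApplication("app_2", true);
    } catch (IllegalArgumentException e) {
      System.out.println("recovery failed: " + e.getMessage());
    }
  }
}
```

Recovering an application whose flow context was never persisted reproduces the same IllegalArgumentException message seen in the stack trace, while a persisted context flows through to the constructor.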



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
