[ 
https://issues.apache.org/jira/browse/YARN-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-7585:
----------------------------------------
    Attachment: YARN-7585.001.patch

Patch to maek service unhealthy when the DB throws errors after it has restored 
and the NM is up and running. Failures during restore will mark the NM as 
failed during startup.

> NodeManager should go unhealthy when state store throws DBException 
> --------------------------------------------------------------------
>
>                 Key: YARN-7585
>                 URL: https://issues.apache.org/jira/browse/YARN-7585
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>         Attachments: YARN-7585.001.patch
>
>
> If work preserving recover is enabled the NM will not start up if the state 
> store does not initialise. However if the state store becomes unavailable 
> after that for any reason the NM will not go unhealthy. 
> Since the state store is not available new containers can not be started any 
> more and the NM should become unhealthy:
> {code}
> AMLauncher: Error launching appattempt_1508806289867_268617_000001. Got 
> exception: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.io.IOException: org.iq80.leveldb.DBException: IO error: 
> /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: 
> Read-only file system
> at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721)
> ...
> Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: 
> /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: 
> Read-only file system
> at 
> o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to