Wilfred Spiegelenburg created YARN-7585:
-------------------------------------------

             Summary: NodeManager should go unhealthy when state store throws DBException
                 Key: YARN-7585
                 URL: https://issues.apache.org/jira/browse/YARN-7585
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Wilfred Spiegelenburg
            Assignee: Wilfred Spiegelenburg


If work-preserving recovery is enabled, the NM will not start up if the state store fails to initialise. However, if the state store becomes unavailable after startup for any reason, the NM does not go unhealthy. Since new containers can no longer be started while the state store is unavailable, the NM should become unhealthy:
{code}
AMLauncher: Error launching appattempt_1508806289867_268617_000001. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system
    at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
    at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721)
    ...
Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: Read-only file system
    at o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374)
    at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848)
    at o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)
{code}
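
One possible shape for the requested behaviour is sketched below. This is only an illustration under stated assumptions, not the actual NodeManager code or a proposed patch: the names StateStore, StoreHealth and MonitoredStateStore are hypothetical stand-ins and do not correspond to real Hadoop classes. The idea is that a failed state store write is recorded in a health flag before the exception is rethrown, so a health check could later report the node as unhealthy.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

public class StateStoreHealthSketch {

  /** Hypothetical stand-in for the NM state store; not the real Hadoop interface. */
  interface StateStore {
    void storeApplication(String appId) throws IOException;
  }

  /** Hypothetical health tracker: no recorded failure means healthy. */
  static class StoreHealth {
    private final AtomicReference<String> failure = new AtomicReference<>();

    boolean isHealthy() {
      return failure.get() == null;
    }

    String getReport() {
      String f = failure.get();
      return f == null ? "" : f;
    }

    /** Records only the first failure so the original cause is kept. */
    void markUnhealthy(String reason) {
      failure.compareAndSet(null, reason);
    }
  }

  /** Wraps a store so that any write failure is recorded before it is rethrown. */
  static class MonitoredStateStore implements StateStore {
    private final StateStore delegate;
    private final StoreHealth health;

    MonitoredStateStore(StateStore delegate, StoreHealth health) {
      this.delegate = delegate;
      this.health = health;
    }

    @Override
    public void storeApplication(String appId) throws IOException {
      try {
        delegate.storeApplication(appId);
      } catch (IOException e) {
        // The container start still fails, but the node health now reflects it.
        health.markUnhealthy("State store failure: " + e.getMessage());
        throw e;
      }
    }
  }

  public static void main(String[] args) {
    StoreHealth health = new StoreHealth();
    // Simulates a state store whose backing file system has gone read-only.
    StateStore failing = appId -> {
      throw new IOException("IO error: Read-only file system");
    };
    StateStore store = new MonitoredStateStore(failing, health);
    try {
      store.storeApplication("application_1");
    } catch (IOException expected) {
      // The caller still sees the original failure.
    }
    System.out.println("healthy=" + health.isHealthy() + " report=" + health.getReport());
  }
}
```

In this sketch the health flag never clears once set; whether the NM should recover automatically when the state store becomes writable again is a separate design question that the report above does not address.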