Ravi Prakash created YARN-6054: ---------------------------------- Summary: TimelineServer fails to start when some LevelDb state files are missing. Key: YARN-6054 URL: https://issues.apache.org/jira/browse/YARN-6054 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0-alpha2 Reporter: Ravi Prakash
We encountered an issue recently where the TimelineServer failed to start because some state files went missing. {code} 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED ; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelines erver/leveldb-timeline-store.ldb/127897.sst org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/lev eldb-timeline-store.ldb/127897.sst 2016-11-21 20:46:43,135 FATAL org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: Error starting ApplicationHistoryServer org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 missing files; e.g.: <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) at org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) ... 5 more 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with status -1 {code} Ideally we shouldn't have any missing state files. However I'd posit that the TimelineServer should have graceful degradation instead of failing to start at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org