[ https://issues.apache.org/jira/browse/YARN-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812432#comment-15812432 ]
Ravi Prakash commented on YARN-6054: ------------------------------------ Thanks Naganarasimha for your careful review! As I posted in the first comment, the repair did indeed fix the issue for us (we had a production incident.) As I'm sure you'll understand, we can't post the leveldb files in the open source. # I feel this JIRA is very specific to the TimelineServer so I am hesitant to include other daemons. Also, as pointed out by Jason, (e.g. in the case of NM) graceful degradation would be a very hard thing to achieve. More likely, the state is corrupt and will cause undefined behavior. # Fair point. Will do. # Great idea. Will do. > TimelineServer fails to start when some LevelDb state files are missing. > ------------------------------------------------------------------------ > > Key: YARN-6054 > URL: https://issues.apache.org/jira/browse/YARN-6054 > Project: Hadoop YARN > Issue Type: Bug > Affects Versions: 3.0.0-alpha2 > Reporter: Ravi Prakash > Assignee: Ravi Prakash > Attachments: YARN-6054.01.patch, YARN-6054.02.patch > > > We encountered an issue recently where the TimelineServer failed to start > because some state files went missing. > {code} > 2016-11-21 20:46:43,134 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer > failed in state INITED > ; cause: org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: <levelDbStorePath>/timelines > erver/leveldb-timeline-store.ldb/127897.sst > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: <levelDbStorePath>/timelineserver/lev > eldb-timeline-store.ldb/127897.sst > 2016-11-21 20:46:43,135 FATAL > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer: > Error starting ApplicationHistoryServer > org.apache.hadoop.service.ServiceStateException: > org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 9 > missing files; e.g.: > <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceInit(ApplicationHistoryServer.java:104) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:172) > at > org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:182) > Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: > Corruption: 9 missing files; e.g.: > <levelDbStorePath>/timelineserver/leveldb-timeline-store.ldb/127897.sst > at > org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200) > at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218) > at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168) > at > org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore.serviceInit(LeveldbTimelineStore.java:229) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 2016-11-21 20:46:43,136 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status -1 > {code} > Ideally we shouldn't have any missing state files. However I'd posit that the > TimelineServer should have graceful degradation instead of failing to start > at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org