[ https://issues.apache.org/jira/browse/YARN-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285660#comment-15285660 ]
Sangjin Lee commented on YARN-5095:
-----------------------------------

I'm not sure if this is related but I also see this log in the RM log:
{noformat}
2016-05-16 14:19:29,930 ERROR org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector: Error aggregating timeline metrics
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.timelineservice.storage.common.Separator.joinEncoded(Separator.java:249)
        at org.apache.hadoop.yarn.server.timelineservice.storage.application.ApplicationRowKey.getRowKey(ApplicationRowKey.java:110)
        at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.write(HBaseTimelineWriterImpl.java:131)
        at org.apache.hadoop.yarn.server.timelineservice.collector.AppLevelTimelineCollector$AppLevelAggregator.run(AppLevelTimelineCollector.java:136)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
{noformat}
It's quite possible this is a separate issue.

> flow activities and flow runs are populated with wrong timestamp when RM restarts w/ recovery enabled
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5095
>                 URL: https://issues.apache.org/jira/browse/YARN-5095
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Priority: Critical
>              Labels: yarn-2928-1st-milestone
>
> I have the RM recovery enabled. I see that upon restart the RM populates records into flow activity and flow runs but with *wrong* timestamps. What I mean by the timestamp is the part of the row key:
> - flow activity: row created with the day of the RM restart
> - flow run: row created with the RM start time as the "run id"
> The following illustrates an example flow run:
> {noformat}
> metrics: [ ],
> events: [ ],
> id: "sjlee@Sleep job/1463433569917",
> type: "YARN_FLOW_RUN",
> createdtime: 1463422860987,
> info: {
>   UID: "yarn_cluster!sjlee!Sleep job!1463433569917",
>   SYSTEM_INFO_FLOW_RUN_ID: 1463433569917,
>   SYSTEM_INFO_FLOW_NAME: "Sleep job",
>   SYSTEM_INFO_FLOW_RUN_END_TIME: 1463422865033,
>   SYSTEM_INFO_USER: "sjlee"
> },
> isrelatedto: { },
> relatesto: { }
> {noformat}
> The created time and the end time are correct (i.e. original time), whereas the timestamp in the row key (= run id: 1463433569917) is actually later than the end time and coincides with the RM restart.
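
To make the mismatch in the quoted flow run concrete, here is a minimal Java sketch that compares the run id taken from the row key against the created and end times. The class and method names are hypothetical and are not part of the timeline service code; only the three timestamp values come from the example in the issue description.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of YARN.
public class FlowRunTimestampCheck {

    /** True when the run id (epoch millis) falls after the flow run's end time. */
    public static boolean runIdAfterEnd(long runIdMillis, long endTimeMillis) {
        return runIdMillis > endTimeMillis;
    }

    /** How far the run id lies past the flow run's end time. */
    public static Duration gap(long runIdMillis, long endTimeMillis) {
        return Duration.ofMillis(runIdMillis - endTimeMillis);
    }

    public static void main(String[] args) {
        // Values copied from the example flow run in the issue description.
        long runId = 1463433569917L;       // SYSTEM_INFO_FLOW_RUN_ID (row key)
        long createdTime = 1463422860987L; // createdtime
        long endTime = 1463422865033L;     // SYSTEM_INFO_FLOW_RUN_END_TIME

        System.out.println("created: " + Instant.ofEpochMilli(createdTime));
        System.out.println("ended:   " + Instant.ofEpochMilli(endTime));
        System.out.println("run id:  " + Instant.ofEpochMilli(runId));

        // A run id should not be later than the end of the run it identifies.
        if (runIdAfterEnd(runId, endTime)) {
            System.out.println("run id is " + gap(runId, endTime).toMillis()
                    + " ms after the end time");
        }
    }
}
```

With these values the run id is 10704884 ms (just under three hours) past the end time, which is consistent with the row key being stamped at RM restart rather than at the original run.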