[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571732#comment-14571732 ]
Zhijie Shen commented on YARN-3044:
-----------------------------------

[~Naganarasimha], thanks for updating the patch. It looks good to me so far, but I want to hold the patch because of the following issues:

1. After YARN-3276 is committed, this patch will conflict on {{return l2.compareTo(l1);}}.
2. We're reworking YARN-1462. It won't affect this patch, but a commit revert is involved. Let's wait until YARN-1462 is done.
3. It's not caused by this patch, but I found a race condition when publishing the app finish event:
{code}
15/06/03 14:59:56 INFO rmapp.RMAppImpl: application_1433367826630_0002 State change from FINISHING to FINISHED
15/06/03 14:59:56 INFO capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1433367826630_0002_01_000001, NodeId: localhost:9105, NodeHttpAddress: localhost:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 127.0.0.1:9105 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:8192, vCores:8>
15/06/03 14:59:56 INFO resourcemanager.RMAuditLogger: USER=zshen OPERATION=Application Finished - Succeeded TARGET=RMAppManager RESULT=SUCCESS APPID=application_1433367826630_0002
15/06/03 14:59:56 INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:8192, vCores:8>
15/06/03 14:59:56 ERROR metrics.TimelineServiceV2Publisher: Error when publishing entity TimelineEntity[type='YARN_APPLICATION', id='application_1433367826630_0002']
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:273)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.publishApplicationFinishedEvent(TimelineServiceV2Publisher.java:133)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:70)
	at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:35)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
	at java.lang.Thread.run(Thread.java:745)
15/06/03 14:59:56 INFO amlauncher.AMLauncher: Cleaning master appattempt_1433367826630_0002_000001
{code}
I think the problem is that we stop the timeline collector immediately after calling appFinished, which is an async call: the publishing operation is executed on another thread, so the collector may already have been stopped by the time the publisher writes the entity. One option is to call stopTimelineCollector in the publisher, after it has published the finish event. Can you take care of it?
{code}
// appFinished() is asynchronous: the finish event is published later on another thread
app.rmContext.getSystemMetricsPublisher()
    .appFinished(app, finalState, app.finishTime);
// ...but the collector is stopped right away, possibly before that publish has run
app.stopTimelineCollector();
{code}

> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>
>                 Key: YARN-3044
>                 URL: https://issues.apache.org/jira/browse/YARN-3044
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3044-YARN-2928.004.patch,
> YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch,
> YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch,
> YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch,
> YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch,
> YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
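The following is a minimal sketch of the ordering suggested in the comment: the collector is stopped by the publisher after the finish event is written, not by the caller right after appFinished. It rests on several assumptions. `FinishEventOrderingSketch`, `Collector` and `Publisher` are hypothetical simplified classes, not the YARN `TimelineServiceV2Publisher` or `AsyncDispatcher` APIs. A single-thread `ExecutorService` stands in for the async dispatcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class FinishEventOrderingSketch {

    /** Hypothetical stand-in for a per-application timeline collector. */
    static final class Collector {
        private final AtomicBoolean stopped = new AtomicBoolean(false);
        private final List<String> written = Collections.synchronizedList(new ArrayList<>());

        void putEntity(String entity) {
            // Writing to a stopped collector is the failure mode behind the NPE in the log.
            if (stopped.get()) {
                throw new IllegalStateException("collector already stopped");
            }
            written.add(entity);
        }

        void stop() { stopped.set(true); }

        boolean isStopped() { return stopped.get(); }

        List<String> written() { return written; }
    }

    /** Hypothetical publisher whose events are handled on one background thread. */
    static final class Publisher {
        private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

        // Asynchronous like appFinished(): this returns immediately and the publish runs later.
        // The collector is stopped inside the same task, only after the finish entity is written.
        void appFinished(String appId, Collector collector) {
            dispatcher.execute(() -> {
                collector.putEntity("APP_FINISHED:" + appId);
                collector.stop();
            });
        }

        // Drains pending events before returning (bounded wait, for the sketch only).
        void shutdown() {
            dispatcher.shutdown();
            try {
                dispatcher.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        Collector collector = new Collector();
        Publisher publisher = new Publisher();

        // The caller no longer stops the collector right after appFinished().
        publisher.appFinished("application_1433367826630_0002", collector);

        publisher.shutdown();
        System.out.println(collector.written() + " stopped=" + collector.isStopped());
    }
}
```

Both the write and the stop run in order on the single dispatcher thread, so the stop can no longer overtake the publish. The trade-off is that the collector's shutdown is now driven by the publisher instead of by the application's state transition.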