[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571732#comment-14571732
 ] 

Zhijie Shen commented on YARN-3044:
-----------------------------------

[~Naganarasimha], thanks for updating the patch. It looks good to me so far, 
but I want to hold the patch for the following issues.

1. After YARN-3276 is committed, this patch will conflict on
{{return l2.compareTo(l1);}}.

2. We're reworking YARN-1462. It won't affect this patch, but there's a commit 
revert. Let's wait until YARN-1462 is done.

3. It's not caused by this patch, but I found a race condition when publishing 
the app finish event:
{code}
15/06/03 14:59:56 INFO rmapp.RMAppImpl: application_1433367826630_0002 State change from FINISHING to FINISHED
15/06/03 14:59:56 INFO capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1433367826630_0002_01_000001, NodeId: localhost:9105, NodeHttpAddress: localhost:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 127.0.0.1:9105 }, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0 cluster=<memory:8192, vCores:8>
15/06/03 14:59:56 INFO resourcemanager.RMAuditLogger: USER=zshen        OPERATION=Application Finished - Succeeded      TARGET=RMAppManager     RESULT=SUCCESS  APPID=application_1433367826630_0002
15/06/03 14:59:56 INFO capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:8192, vCores:8>
15/06/03 14:59:56 ERROR metrics.TimelineServiceV2Publisher: Error when publishing entity TimelineEntity[type='YARN_APPLICATION', id='application_1433367826630_0002']
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:273)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.publishApplicationFinishedEvent(TimelineServiceV2Publisher.java:133)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:70)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractTimelineServicePublisher.handle(AbstractTimelineServicePublisher.java:35)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
        at java.lang.Thread.run(Thread.java:745)
15/06/03 14:59:56 INFO amlauncher.AMLauncher: Cleaning master appattempt_1433367826630_0002_000001
{code}

I think the problem is that we stop the timeline collector immediately after 
calling appFinished. That call is asynchronous: the publishing operation runs 
on another thread, so the collector may already be stopped by the time the 
finish event is published. One option is to call stopTimelineCollector in the 
publisher, after the finish event has been published. Can you take care of it?
{code}
      app.rmContext.getSystemMetricsPublisher()
          .appFinished(app, finalState, app.finishTime);

      app.stopTimelineCollector();
{code}
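
To illustrate the ordering issue, here is a minimal standalone sketch. It is not the actual RM code: the class {{FinishEventOrdering}}, the {{collectors}} map, and the {{run}} helper are all hypothetical, and it assumes the NPE comes from the collector already being gone when the dispatcher thread gets to {{putEntity}}. A latch forces the unlucky interleaving so the outcome is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FinishEventOrdering {
    // Hypothetical stand-in for the per-app timeline collectors.
    static final Map<String, StringBuilder> collectors = new ConcurrentHashMap<>();

    // Simulates putEntity: reports failure if the collector was already stopped.
    // (Assumption: this is where the real code hits the NullPointerException.)
    static boolean putEntity(String appId, String event) {
        StringBuilder collector = collectors.get(appId);
        if (collector == null) {
            return false;
        }
        collector.append(event);
        return true;
    }

    // Returns true if the finish event reached the collector.
    static boolean run(boolean stopInPublisher) {
        String appId = "app_1";
        collectors.put(appId, new StringBuilder());
        // Single worker thread standing in for the async dispatcher.
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        CountDownLatch callerDone = new CountDownLatch(1);
        try {
            Future<Boolean> published = dispatcher.submit(() -> {
                callerDone.await(); // let the caller finish first (worst case)
                boolean ok = putEntity(appId, "FINISHED");
                if (stopInPublisher) {
                    collectors.remove(appId); // stop only after publishing
                }
                return ok;
            });
            if (!stopInPublisher) {
                // Current ordering: stop right after the async appFinished call.
                collectors.remove(appId);
            }
            callerDone.countDown();
            return published.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            dispatcher.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("stop in caller:    published=" + run(false));
        System.out.println("stop in publisher: published=" + run(true));
    }
}
```

With the stop in the caller, the event is lost whenever the dispatcher thread runs after the caller. With the stop moved into the publisher's handling of the finish event, the publish always sees a live collector, because both steps run in order on the same thread.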

> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>
>                 Key: YARN-3044
>                 URL: https://issues.apache.org/jira/browse/YARN-3044
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
> YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
> YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
> YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
> YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
