[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

Sangjin Lee (JIRA) Thu, 07 Jan 2016 17:35:47 -0800

    [ https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088549#comment-15088549 ]


Sangjin Lee commented on YARN-3995:
-----------------------------------

bq. Yes, I wanted to address it, as I was trying to point out earlier. Instead 
of spawning multiple threads, maybe we can have a single thread which does this 
activity.

Oops, sorry. I didn't see you already mentioned this.

{quote}
IIUC, in the approach you mentioned, the callable will sleep for the 
configured period for an application and then remove it. But if multiple apps 
finish at the same time, only the first apps wait for the configured period; 
subsequent apps wait a little longer than the earlier ones (the app's own wait 
period plus the wait periods of the other apps in the queue). Thoughts?
{quote}

ScheduledExecutorService is much more straightforward than that. We can simply 
take advantage of the scheduling feature. The Runnable (or Callable, doesn't 
matter) can simply execute removeApplication():

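{code}
// Single-threaded scheduler shared by all delayed collector removals.
ScheduledExecutorService scheduler =
    Executors.newSingleThreadScheduledExecutor();
...
public void stopContainer(ContainerTerminationContext context) {
  ...
  // Queue the removal to run after the linger period; schedule() returns immediately.
  scheduler.schedule(new Runnable() {
    @Override
    public void run() {
      removeApplication(appId);
    }
  }, collectorLingerPeriod, TimeUnit.MILLISECONDS);
}
{code}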

It does not implement the delay by putting the executor service thread to 
sleep for that period, so there is no worry about one task's delay propagating 
to the next work item. The delay management is all done by the executor's 
internal queue, which understands delays and releases each task only once its 
own delay has elapsed.
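
To illustrate the point, here is a minimal, self-contained sketch (not part of the patch). The class LingerDemo, the method measureDelays, and the 200 ms linger value are hypothetical names chosen for this example, and the task body only records a timestamp as a stand-in for the real removal call. It schedules several tasks at once on a single-threaded scheduler and records how long each one actually waited:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of the YARN code base.
class LingerDemo {

  /**
   * Schedules `apps` tasks back to back, each with the same delay, on one
   * scheduler thread and returns the elapsed time (ms) at which each task ran.
   */
  static List<Long> measureDelays(int apps, long lingerMs) {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    List<Long> elapsed = new CopyOnWriteArrayList<>();
    CountDownLatch done = new CountDownLatch(apps);
    long start = System.nanoTime();
    try {
      for (int i = 0; i < apps; i++) {
        scheduler.schedule(() -> {
          // Stand-in for the real removal work: just record when we ran.
          elapsed.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
          done.countDown();
        }, lingerMs, TimeUnit.MILLISECONDS);
      }
      done.await(); // wait until every scheduled task has run
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      scheduler.shutdown();
    }
    return elapsed;
  }

  public static void main(String[] args) {
    for (long ms : measureDelays(5, 200)) {
      System.out.println("task ran after ~" + ms + " ms");
    }
  }
}
```

Each task should report a wait close to the configured delay rather than a multiple of it, because the tasks sit in the delay queue concurrently instead of sleeping one after another. The tasks themselves still execute serially on the single thread, so this holds only as long as each task's own work is short.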

> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> ----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3995
>                 URL: https://issues.apache.org/jira/browse/YARN-3995
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-3995-feature-YARN-2928.v1.001.patch
>
>
> As discussed in YARN-3045: while testing with TestDistributedShell, we found 
> that a few of the container metrics events were failing because of a race 
> condition. When the AM container finishes and removes the collector for the 
> app, it is still possible that events published for the app by the current 
> NM and other NMs are still in the pipeline, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
