[ https://issues.apache.org/jira/browse/YARN-8404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509149#comment-16509149 ]

Sunil Govindan commented on YARN-8404:
--------------------------------------

I synced up offline with [~rohithsharma]. Moving the publish to async will 
definitely avoid any potential AsyncDispatcher block. That is the more important 
problem right now, so we can go ahead with this patch. I will open another Jira 
to look at how to handle the missing appFinished event scenario.

I will commit this by end of day if there are no objections.
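A minimal Java sketch of the async idea discussed above, assuming a hypothetical AsyncAppFinishedPublisher class and a Consumer<String> stand-in for the blocking timeline-client call (neither is the actual YARN-8404 patch nor a YARN API). The dispatcher thread only enqueues the appFinished publish; a separate worker thread absorbs the timeout when the ATS daemon is down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not the YARN-8404 patch: the caller (e.g. the RM
 * dispatcher thread) only hands the appFinished publish to a worker, so a
 * slow or unreachable timeline server cannot stall the caller.
 */
final class AsyncAppFinishedPublisher implements AutoCloseable {
    // Stand-in for the blocking timeline-client call (hypothetical).
    private final Consumer<String> timelineCall;
    // A single worker thread keeps publishes in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    AsyncAppFinishedPublisher(Consumer<String> timelineCall) {
        this.timelineCall = timelineCall;
    }

    /** Runs on the caller's thread and returns immediately after enqueueing. */
    void appFinished(String appId) {
        worker.submit(() -> timelineCall.accept(appId));
    }

    /** Stops accepting work and waits briefly for queued publishes to finish. */
    @Override
    public void close() {
        worker.shutdown();
        try {
            worker.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is that the dispatcher no longer waits for the publish to be confirmed; dealing with a missing appFinished event is the separate follow-up mentioned in the comment.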

> RM Event dispatcher is blocked if ATS1/1.5 server is not running. 
> ------------------------------------------------------------------
>
>                 Key: YARN-8404
>                 URL: https://issues.apache.org/jira/browse/YARN-8404
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.0.2
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Blocker
>         Attachments: YARN-8404.01.patch
>
>
> It is observed that if the ATS1/1.5 daemon is not running, RM recovery is 
> delayed until the timeline client times out for each application. By default, 
> a timeout takes around 5 minutes, so with many completed applications the RM 
> waits roughly *(number of completed applications in the cluster * 5 minutes)*, 
> which effectively looks like a hang. 
> The primary reason for this behavior is YARN-3044 and YARN-4129, which 
> refactored the existing system metrics publisher. This refactoring made the 
> appFinished event synchronous, whereas it was asynchronous earlier. 
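The arithmetic behind the reported stall can be sketched as follows, assuming (as the description does) that each completed application's synchronous publish waits out the full timeout one after another on a single thread. RecoveryStallEstimate and worstCase are hypothetical names, not YARN code.

```java
import java.time.Duration;

/**
 * Hypothetical helper: worst-case RM recovery stall when every completed
 * application's synchronous appFinished publish hits the full timeout.
 */
final class RecoveryStallEstimate {
    private RecoveryStallEstimate() {}

    static Duration worstCase(long completedApps, Duration perAppTimeout) {
        if (completedApps < 0) {
            throw new IllegalArgumentException("completedApps must be >= 0");
        }
        // Publishes run serially, so the per-application timeouts add up linearly.
        return perAppTimeout.multipliedBy(completedApps);
    }
}
```

For example, worstCase(1000, Duration.ofMinutes(5)) gives 5,000 minutes, or about 83 hours, for 1,000 completed applications at the roughly 5-minute default timeout.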



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
