[ https://issues.apache.org/jira/browse/YARN-8404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507645#comment-16507645 ]

Sunil Govindan commented on YARN-8404:
--------------------------------------

Thanks [~rohithsharma] for the patch.

I think one of the reasons appFinished was made synchronous is NOT to lose the 
event to be published to ATS: the state store and ATS get updated at the same 
time. Though this approach seems fine, I think there is more risk in exposing a 
synchronous call that is made from the main Dispatcher. If ATS is down, this will 
block the dispatcher thread, and a network delay or something similar could even 
cause an OOM in the RM as pending events pile up behind the blocked thread.

Hence I think it's a tradeoff: we may lose an event at times, but for the time 
being it's better to keep it async until a better cache or similar approach can 
be brought in to preserve the finish event publish.
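The tradeoff described above (never block the dispatcher, at the cost of occasionally losing a finish event) can be sketched as a bounded hand-off queue drained by a single background publisher thread. This is a minimal illustration under that assumption only. The class `AsyncTimelinePublisher` and the `Consumer<String>` sink are hypothetical. The sink stands in for whatever timeline client call actually publishes the event. This is not YARN's actual SystemMetricsPublisher or TimelineClient code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: the dispatcher thread hands "app finished" events to a
 * single background thread through a bounded queue, so a slow or unreachable
 * ATS blocks only the background thread, never the dispatcher.
 */
public class AsyncTimelinePublisher implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();
    private final Thread worker;
    private volatile boolean running = true;

    public AsyncTimelinePublisher(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            // Keep draining until close() is called and the queue is empty.
            while (running || !queue.isEmpty()) {
                try {
                    String event = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (event != null) {
                        sink.accept(event); // may block for a long time if ATS is down
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "timeline-publisher");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called on the dispatcher thread; never blocks. Returns false if the event was dropped. */
    public boolean publish(String event) {
        boolean accepted = queue.offer(event); // non-blocking, unlike put()
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    public long droppedCount() {
        return dropped.get();
    }

    /** Stops accepting work, then waits for already-queued events to be published. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The sketch uses `offer()` rather than `put()`. `put()` would block the dispatcher once the queue fills, which reintroduces the original problem. `offer()` drops the event instead, and that is the "lose an event at times" side of the tradeoff. Bounding the queue also caps how much memory pending events can use while ATS is unreachable, which speaks to the OOM concern. Counting drops, as `droppedCount()` does, makes those losses visible.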

The current approach in the patch seems fine to me. I will wait for others to 
review it.

> RM Event dispatcher is blocked if ATS1/1.5 server is not running. 
> ------------------------------------------------------------------
>
>                 Key: YARN-8404
>                 URL: https://issues.apache.org/jira/browse/YARN-8404
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 3.0.2
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Blocker
>         Attachments: YARN-8404.01.patch
>
>
> It is observed that if the ATS1/1.5 daemon is not running, RM recovery is 
> delayed until the timeline client times out for each application. By default, 
> a timeout takes around 5 mins. If there are many completed applications, the 
> amount of time the RM will wait is *(number of completed applications in a 
> cluster * 5 minutes)*, which effectively makes the RM appear hung. 
> The primary reason for this behavior is YARN-3044 and YARN-4129, which 
> refactored the existing system metrics publisher. This refactoring made the 
> appFinished event synchronous, whereas it was asynchronous earlier. 
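The delay formula in the description can be made concrete with a rough worked example. It assumes the per-application timeouts are hit one after another, and the application count of 200 is purely illustrative, not taken from the report:

```latex
T_{\text{recovery}} \approx N_{\text{completed}} \times t_{\text{timeout}}
  = 200 \times 5\,\text{min} = 1000\,\text{min} \approx 16.7\,\text{h}
```

Even a modest number of completed applications keeps the RM stalled for hours while ATS is down.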



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
