[ 
https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305953#comment-17305953
 ] 

Andras Gyori commented on YARN-9618:
------------------------------------

Thank you [~zhuqi] for the patch. I have analysed the code a bit and I think 
the main performance gain here is due to eliminating the unnecessary back 
reference to rmDispatcher on RMAppNodeUpdateEvent. Is using another async 
dispatcher justified here? My position on this issue is:
 * The rmDispatcher will still have its eventQueue filled with 
NodeListManagerEvents.
 * The new async dispatcher is another layer of abstraction, and its sole 
purpose is copying the events from the rmDispatcher to its own event queue and 
then handling them just as rmDispatcher would.
 * The NodeListManager#handle will block on getting RMApp instances, because 
they are stored in a ConcurrentMap.
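
To make the second point concrete, here is a minimal, single-threaded sketch of 
such a forwarding layer. Everything in it is a hypothetical stand-in (plain 
String events, the ForwardingDispatcherSketch class and its method names); it 
is not the YARN AsyncDispatcher API and not the code in the patch. It only 
shows that each event still passes through the first queue and is then copied 
into a second one before anything handles it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in types; none of these are the real YARN classes.
class ForwardingDispatcherSketch {

  /** Drains primary, copying each event into secondary; returns the number copied. */
  static int forwardAll(BlockingQueue<String> primary, BlockingQueue<String> secondary) {
    int copied = 0;
    String event;
    while ((event = primary.poll()) != null) {
      secondary.add(event); // the extra hop: same event, second queue
      copied++;
    }
    return copied;
  }

  /** Drains secondary and "handles" each event, as the first dispatcher could have done. */
  static List<String> handleAll(BlockingQueue<String> secondary) {
    List<String> handled = new ArrayList<>();
    String event;
    while ((event = secondary.poll()) != null) {
      handled.add("handled:" + event);
    }
    return handled;
  }

  public static void main(String[] args) {
    BlockingQueue<String> primary = new LinkedBlockingQueue<>();
    BlockingQueue<String> secondary = new LinkedBlockingQueue<>();
    primary.add("NODE_USABLE:node-1");
    primary.add("NODE_UNUSABLE:node-2");
    int copied = forwardAll(primary, secondary);
    System.out.println(copied + " events copied, " + handleAll(secondary));
  }
}
```

In a real dispatcher the two drain loops would run on separate threads; they 
are kept sequential here so the hand-off is easy to follow.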

I think the new async dispatcher only makes sense if 
NodeListManager#sendRMAppNodeUpdateEventToNonFinalizedApps blocks the 
rmDispatcher thread for longer than it takes to copy an event from 
rmDispatcher#eventQueue to nodeListManagerDispatcher#eventQueue. Measuring 
performance with and without the async dispatcher would be really helpful 
here.
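
As a rough illustration of the comparison I mean, the sketch below times a 
simulated per-event fan-out over all apps against a plain hand-off to a second 
queue. It is only a sketch under assumptions: the class and method names are 
hypothetical, the AtomicLong counters stand in for RMApp instances, and the 
1K apps / 5K events figures are taken from the issue description. It is not a 
proper benchmark (no warm-up, no JMH), so the numbers it prints are indicative 
at best; a real measurement should be done against the patch itself.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins only; this does not use any YARN classes.
class DispatcherCostSketch {

  /** Simulates the fan-out: one node event touches every app; returns updates issued. */
  static long fanOut(ConcurrentMap<Integer, AtomicLong> apps) {
    long updates = 0;
    for (AtomicLong app : apps.values()) {
      app.incrementAndGet(); // stand-in for delivering a node update to one app
      updates++;
    }
    return updates;
  }

  /** Total nanoseconds spent running work for the given number of rounds. */
  static long timeNanos(int rounds, Runnable work) {
    long start = System.nanoTime();
    for (int i = 0; i < rounds; i++) {
      work.run();
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    ConcurrentMap<Integer, AtomicLong> apps = new ConcurrentHashMap<>();
    for (int i = 0; i < 1_000; i++) {
      apps.put(i, new AtomicLong());
    }
    LinkedBlockingQueue<String> secondQueue = new LinkedBlockingQueue<>();
    int nodeEvents = 5_000;

    // Cost of doing the fan-out on the calling (dispatcher) thread.
    long direct = timeNanos(nodeEvents, () -> fanOut(apps));
    // Cost of only copying the event into a second queue.
    long handOff = timeNanos(nodeEvents, () -> secondQueue.add("node-event"));

    System.out.printf("direct fan-out: %d ns/event, hand-off: %d ns/event%n",
        direct / nodeEvents, handOff / nodeEvents);
  }
}
```

If the fan-out cost per event is not clearly larger than the hand-off cost, 
the extra dispatcher adds a hop without freeing up the rmDispatcher thread.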

> NodeListManager event improvement
> ---------------------------------
>
>                 Key: YARN-9618
>                 URL: https://issues.apache.org/jira/browse/YARN-9618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bibin Chundatt
>            Assignee: Qi Zhu
>            Priority: Critical
>         Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch
>
>
> The current NodeListManager event handling blocks the async dispatcher and 
> can cause an RM crash and slow down event processing.
> # Cluster restart with 1K running apps. Each usable event will create 1K 
> events; overall there could be 5K*1K events for a 5K cluster.
> # Event processing is blocked until the new events are added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler.
> # Instead of adding events to the dispatcher, call the RMApp event handler 
> directly.



