[ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709605#comment-16709605
 ] 

Wilfred Spiegelenburg commented on YARN-8789:
---------------------------------------------

First: I would expect the change to be fully tested, so the behaviour with a 
limited queue should be known and described. Task failures are 
probably more acceptable. Are we really still seeing them with the change from 
MAPREDUCE-5124 applied? If not, then making this change is not really warranted.
Before we go further and make a change like this I would also test the 
behaviour: what happens when the queue is full? Looking at the patch there is 
far more change than needed: the current queue can be limited, and just that 
change would be far less impactful. The logic for taking an event is also 
changed, which I don't think is needed either. Going back to just the basic 
change of limiting the queue, once we find that it is needed, would be a better 
approach.

Based on that quick analysis I would say this is not an acceptable change in 
its current form.
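
To make the "what happens when the queue is full" question concrete, here is a 
minimal sketch of how a capacity-limited java.util.concurrent.BlockingQueue 
behaves. It is not the AsyncDispatcher code and not the attached patch: the 
class name, the helper methods, the String event type, and the capacity of 2 
are all assumptions chosen for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from YARN or the patch.
public class BoundedQueueSketch {

  // Non-blocking enqueue: returns false immediately when the queue is full,
  // so the caller has to decide what to do with the rejected event.
  static boolean tryEnqueue(BlockingQueue<String> queue, String event) {
    return queue.offer(event);
  }

  // Enqueue that waits up to timeoutMs for space. Returns false if the queue
  // is still full after the timeout or if the calling thread is interrupted.
  static boolean enqueueWithTimeout(BlockingQueue<String> queue, String event,
      long timeoutMs) {
    try {
      return queue.offer(event, timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    // Capacity of 2 is an arbitrary value picked to hit the limit quickly.
    BlockingQueue<String> queue = new LinkedBlockingQueue<>(2);

    System.out.println(tryEnqueue(queue, "event-1"));             // true
    System.out.println(tryEnqueue(queue, "event-2"));             // true
    System.out.println(tryEnqueue(queue, "event-3"));             // false, queue is full
    System.out.println(enqueueWithTimeout(queue, "event-3", 50)); // false, no consumer ran

    // Once a consumer removes an event, producers can make progress again.
    queue.poll();
    System.out.println(tryEnqueue(queue, "event-3"));             // true

    // queue.put(event) is the third option: it blocks the producer thread
    // until space is available, with no time limit.
  }
}
```

Which of the three behaviours (reject, wait with a timeout, or block 
indefinitely) is acceptable for the thread that generates the events is the 
part that would need testing before the queue is limited.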

> Add BoundedQueue to AsyncDispatcher
> -----------------------------------
>
>                 Key: YARN-8789
>                 URL: https://issues.apache.org/jira/browse/YARN-8789
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: applications
>    Affects Versions: 3.2.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>         Attachments: YARN-8789.1.patch, YARN-8789.10.patch, 
> YARN-8789.12.patch, YARN-8789.14.patch, YARN-8789.2.patch, YARN-8789.3.patch, 
> YARN-8789.4.patch, YARN-8789.5.patch, YARN-8789.6.patch, YARN-8789.7.patch, 
> YARN-8789.7.patch, YARN-8789.8.patch, YARN-8789.9.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  The logs showed that the event-queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.
> Logging Message:
> Size of event-queue is xxx



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

