[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765949#comment-16765949 ]

Wilfred Spiegelenburg commented on YARN-8655:
---------------------------------------------

Hi [~uranus], I am not saying that what we do now is 100% correct. I only 
doubt how often this occurs and what impact it has on the application and on 
scheduling activities. Based on the analysis I did, I think we need a 
solution for this case that has far less impact. Do we know any of the 
following:
How badly does it affect the running applications? Do we pre-empt twice as 
much as we should?
Does not handling this correctly slow down pre-emption?
Is there any other impact of not handling this edge case?

Pre-emption currently runs almost continually and is gated by {{take()}}: 
whenever a pre-emption is waiting, we handle it. The patch changes this to 
one pre-emption per second, which effectively throttles pre-emption from 
processing applications as they arrive down to a slow, scheduled trickle.
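To make the throttling concern concrete, here is a small deterministic simulation. It is a hypothetical model, not Hadoop code and not the patch: it assumes the patched loop wakes once per tick and handles at most one queued app per wake-up, and it ignores processing time. The class and method names are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/**
 * Hypothetical simulation (not Hadoop code, not the YARN-8655 patch).
 * Compares when starved apps get handled by a consumer that blocks on take()
 * versus one that wakes every tickMs and handles at most one app per wake-up.
 * Processing time is ignored in both cases.
 */
public class PreemptionLatencySim {

    /** Blocking consumer: every app is handled the moment it arrives. */
    static long[] blockingHandleTimes(long[] arrivalsMs) {
        return arrivalsMs.clone();
    }

    /** Throttled consumer (assumed model); arrivalsMs must be sorted ascending. */
    static long[] throttledHandleTimes(long[] arrivalsMs, long tickMs) {
        long[] handled = new long[arrivalsMs.length];
        Deque<Integer> queue = new ArrayDeque<>();
        int next = 0; // index of the next app that has not arrived yet
        int done = 0; // number of apps handled so far
        for (long now = tickMs; done < arrivalsMs.length; now += tickMs) {
            // Queue every app that became starved by this tick
            while (next < arrivalsMs.length && arrivalsMs[next] <= now) {
                queue.addLast(next++);
            }
            // Handle at most one app per tick
            Integer app = queue.pollFirst();
            if (app != null) {
                handled[app] = now;
                done++;
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        long[] arrivals = {0, 0, 0, 0, 0}; // five apps become starved at once
        System.out.println(Arrays.toString(blockingHandleTimes(arrivals)));
        // prints [0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(throttledHandleTimes(arrivals, 1000)));
        // prints [1000, 2000, 3000, 4000, 5000]
    }
}
```

Under these assumptions a burst of five starved apps is drained immediately by the blocking consumer, while the once-per-second consumer needs five seconds to reach the last one.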
Looking at how we calculate and decide whether an application is marked as 
min share starved, these cases should be limited. Even if the application is 
fair share starved and the queue is min share starved, we do not automatically 
mark the application as min share starved. This edge case therefore only 
affects a small number of applications.
Fixing that edge case by slowing down all pre-emption handling is, in my 
opinion, not the right approach.
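The point that queue-level min share starvation does not automatically make every fair share starved app min share starved can be illustrated with a simplified model. This is an assumption-laden sketch, not the FairScheduler code: it uses a single scalar resource, and it assumes an app only picks up min share starvation for pending demand that its fair share starvation does not already cover, capped by what the queue still has to distribute. The class and method names are hypothetical.

```java
/**
 * Hypothetical, simplified model (single scalar resource, NOT the actual
 * FairScheduler implementation): an app only picks up min share starvation
 * for demand its fair share starvation does not already cover, and only
 * while the queue still has unassigned min share starvation.
 */
public class MinShareStarvationModel {

    /** Min share starvation attributed to one app (0 means not min share starved). */
    static long appMinShareStarvation(long pendingDemand,
                                      long fairShareStarvation,
                                      long queueMinShareStarvationLeft) {
        // Demand not already explained by fair share starvation
        long uncovered = Math.max(0, pendingDemand - fairShareStarvation);
        // Cannot exceed what the queue still has to hand out
        return Math.min(uncovered, Math.max(0, queueMinShareStarvationLeft));
    }

    public static void main(String[] args) {
        // Fair share starvation covers all pending demand: the queue is min share
        // starved, but the app is NOT marked min share starved.
        System.out.println(appMinShareStarvation(4096, 4096, 8192)); // prints 0
        // Demand beyond its fair share starvation: the app is min share starved.
        System.out.println(appMinShareStarvation(6144, 4096, 8192)); // prints 2048
    }
}
```

In this model the double-queueing edge case can only arise for apps in the second situation, which is consistent with the argument that it affects a small number of applications.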


> FairScheduler: FSStarvedApps is not thread safe
> -----------------------------------------------
>
>                 Key: YARN-8655
>                 URL: https://issues.apache.org/jira/browse/YARN-8655
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 3.0.0
>            Reporter: Zhaohui Xin
>            Assignee: Zhaohui Xin
>            Priority: Major
>         Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it is added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. If app1 becomes *starved by min share* at that moment, 
> it is added to appsToProcess again, because appBeingProcessed is null and 
> appsToProcess no longer contains it.
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
>   if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
>     appsToProcess.add(app);
>   }
> }
>
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   // Race window: app is no longer in appsToProcess and is not yet
>   // appBeingProcessed, so a concurrent addStarvedApp(app) re-queues it
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  
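One way to close the race window described in the issue, without throttling the pre-emption thread, is to make the dequeue and the update of appBeingProcessed a single critical section that addStarvedApp() also enters. The sketch below is hypothetical and is not the YARN-8655 patch: the class name StarvedAppsSketch, the generic T standing in for FSAppAttempt, the FIFO ArrayDeque standing in for whatever ordering the real queue uses, and pendingCount() are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch (not Hadoop's FSStarvedApps, not the YARN-8655 patch).
 * Dequeue and "mark as being processed" happen under one lock, so
 * addStarvedApp() can never see an app that is in neither place.
 * Assumes a single consumer thread, as the original class does.
 */
public class StarvedAppsSketch<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final ArrayDeque<T> appsToProcess = new ArrayDeque<>(); // FIFO stand-in
    private T appBeingProcessed; // guarded by lock

    void addStarvedApp(T app) {
        lock.lock();
        try {
            // Check and add are atomic with respect to take()
            if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
                appsToProcess.addLast(app);
                notEmpty.signal();
            }
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            appBeingProcessed = null; // previous app is done
            // await() releases the lock while waiting, so producers are not blocked
            while (appsToProcess.isEmpty()) {
                notEmpty.await();
            }
            // Dequeue and mark as being processed in one critical section
            T app = appsToProcess.pollFirst();
            appBeingProcessed = app;
            return app;
        } finally {
            lock.unlock();
        }
    }

    /** Number of queued apps (hypothetical helper for tests). */
    int pendingCount() {
        lock.lock();
        try {
            return appsToProcess.size();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StarvedAppsSketch<String> apps = new StarvedAppsSketch<>();
        apps.addStarvedApp("app1");   // fair share starved
        String first = apps.take();   // pre-emption thread picks up app1
        apps.addStarvedApp("app1");   // min share starved while being processed: ignored
        apps.addStarvedApp("app2");
        System.out.println(first + " then " + apps.take()); // prints "app1 then app2"
    }
}
```

This keeps take() event-driven, so pre-emption still reacts to each app as soon as it is queued; the cost is that every addStarvedApp() call briefly contends on the same lock as the pre-emption thread.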



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
