[ 
https://issues.apache.org/jira/browse/YARN-8990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680481#comment-16680481
 ] 

Haibo Chen commented on YARN-8990:
----------------------------------

Thanks [~wilfreds] for the patch!  I have taken the liberty of updating the patch 
to fix another, much rarer race condition: QueueManager can see a leaf queue as 
empty if FSLeafQueue.addApp() is called in the middle of the following check:
  
{code:java}
return queue.getNumRunnableApps() == 0 &&
          leafQueue.getNumNonRunnableApps() == 0 &&
          leafQueue.getNumAssignedApps() == 0;{code}
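
To illustrate why three separate reads can go wrong, here is a minimal sketch, not the actual patch. It assumes that addApp() moves an application from the "assigned" set to one of the app lists under the queue's write lock. Under that assumption, a caller that reads the runnable count first, is then interrupted by addApp(), and reads the other two counts afterwards sees all zeros. The class LeafQueueSketch and its isEmpty() method are hypothetical names; the idea is only that taking one read lock around all three checks yields a consistent snapshot.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a leaf queue; this is NOT the real FSLeafQueue.
class LeafQueueSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> runnableApps = new ArrayList<>();
  private final List<String> nonRunnableApps = new ArrayList<>();
  private final Set<String> assignedApps = new HashSet<>();

  // Record that an app has been assigned to this queue but not yet added.
  void addAssignedApp(String appId) {
    lock.writeLock().lock();
    try {
      assignedApps.add(appId);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Assumed behaviour: move the app from "assigned" to an app list in one step.
  void addApp(String appId, boolean runnable) {
    lock.writeLock().lock();
    try {
      assignedApps.remove(appId);
      (runnable ? runnableApps : nonRunnableApps).add(appId);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // All three collections are checked under a single read lock, so a
  // concurrent addApp() runs either entirely before or entirely after.
  boolean isEmpty() {
    lock.readLock().lock();
    try {
      return runnableApps.isEmpty()
          && nonRunnableApps.isEmpty()
          && assignedApps.isEmpty();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

With a check like this, the caller in QueueManager would ask the queue for a single answer instead of combining three independently locked getters.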
 

> FS: race condition in app submit and queue cleanup
> --------------------------------------------------
>
>                 Key: YARN-8990
>                 URL: https://issues.apache.org/jira/browse/YARN-8990
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 3.2.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Blocker
>         Attachments: YARN-8990.001.patch
>
>
> With the introduction of the dynamic queue deletion in YARN-8191 a race 
> condition was introduced that can cause a queue to be removed while an 
> application submit is in progress.
> The issue occurs in {{FairScheduler.addApplication()}} when an application is 
> submitted to a dynamic queue which is empty or the queue does not exist yet. 
> If the {{AllocationFileLoaderService}} kicks off an update while the 
> application submit is being processed, the queue cleanup is run first. The 
> application submit first creates the queue and gets a reference back to it. 
> Other checks are then performed and, as the last action before getting ready 
> to generate an AppAttempt, the queue is updated to show the submitted 
> application ID.
> The time between the queue creation and the queue update to show the submit 
> is long enough for the queue to be removed. The application is then lost 
> and will never get any resources assigned.
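
The window described above can be closed by recording the application ID in the same critical section that creates the queue, so that cleanup never observes a freshly created but still empty queue. The following is a minimal sketch under that assumption, not the FairScheduler code. QueueManagerSketch and all of its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the queue manager; this is NOT the real QueueManager.
class QueueManagerSketch {
  // Queue name -> IDs of the apps assigned to (or running in) that queue.
  private final Map<String, Set<String>> queues = new HashMap<>();

  // Create the queue if needed and register the app in one synchronized step,
  // so cleanup cannot run between creation and registration.
  synchronized void getOrCreateQueue(String name, String appId) {
    queues.computeIfAbsent(name, k -> new HashSet<>()).add(appId);
  }

  // Drop an app from a queue once it has finished or been rejected.
  synchronized void removeApp(String name, String appId) {
    Set<String> apps = queues.get(name);
    if (apps != null) {
      apps.remove(appId);
    }
  }

  // Cleanup pass: remove every queue that has no apps registered.
  synchronized void removeEmptyQueues() {
    queues.values().removeIf(Set::isEmpty);
  }

  synchronized boolean exists(String name) {
    return queues.containsKey(name);
  }
}
```

In this sketch a cleanup pass that runs right after getOrCreateQueue("root.dyn", "app_1") leaves the queue in place, and the queue only becomes eligible for removal after removeApp() has been called for its last application.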



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
