[ https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048658#comment-16048658 ]
daemon commented on YARN-6710:
------------------------------

[~dan...@cloudera.com] I am sorry, I am trying to explain the problem clearly, but my English is poor, so it takes me a long time to write it down.

> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair scheduler not assign container to the queue
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6710
>                 URL: https://issues.apache.org/jira/browse/YARN-6710
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.2
>            Reporter: daemon
>             Fix For: 2.8.0
>
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we use the fair scheduler as our scheduler.
> Although the resource manager has plenty of free resources, 46 applications are pending.
> Those applications still could not run after several hours, and in the end I had to stop them.
> I reproduced the situation in my test environment and found a bug in FSLeafQueue.
> In an extreme scenario, the value of FSLeafQueue#amResourceUsage becomes greater than its real value.
> When the fair scheduler tries to assign a container to an application attempt, it performs the following check:
> !screenshot-2.png!
> !screenshot-3.png!
> Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater than its real value.
> So when amResourceUsage is greater than Resources.multiply(getFairShare(), maxAMShare), the FSLeafQueue#canRunAppAM function returns false, and the fair scheduler does not assign a container to the FSAppAttempt.
> In this scenario, all application attempts stay pending and never get any resources.
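To make the check described above easier to follow, here is a minimal sketch of it in Java. It is a simplified, memory-only model and not the actual FSLeafQueue source: the class name AmShareSketch, the method parameters, and the numbers in main are all hypothetical. It also assumes that a negative maxAMShare disables the check. It only illustrates how an inflated amResourceUsage makes the comparison against fairShare * maxAMShare fail even though the queue's real AM usage is well under the limit.

```java
// Simplified, memory-only model of the AM-share check described in the issue.
// AmShareSketch and all parameter names are hypothetical; this is not the
// FSLeafQueue source code.
class AmShareSketch {

    // Returns true when starting an AM that needs amRequestMb would keep the
    // queue's total AM usage within fairShareMb * maxAMShare.
    static boolean canRunAppAM(long fairShareMb, float maxAMShare,
                               long amUsageMb, long amRequestMb) {
        // Assumption: a negative maxAMShare means the check is disabled.
        if (maxAMShare < 0) {
            return true;
        }
        long maxAmMb = (long) (fairShareMb * (double) maxAMShare);
        return amUsageMb + amRequestMb <= maxAmMb;
    }

    public static void main(String[] args) {
        long fairShareMb = 100_000; // hypothetical fair share of the queue
        float maxAMShare = 0.5f;    // AMs may use up to half of the fair share

        // Correct accounting: 20480 MB of AMs are really running.
        System.out.println(canRunAppAM(fairShareMb, maxAMShare, 20_480, 2_048));

        // Inflated accounting: the counter claims 60000 MB, so every new AM
        // is rejected although the real usage has not changed.
        System.out.println(canRunAppAM(fairShareMb, maxAMShare, 60_000, 2_048));
    }
}
```

With these example numbers the limit is 50000 MB, so the first call passes and the second one is rejected. Once the counter is stuck above the limit, every later application attempt in the queue is rejected the same way, which matches the pending applications described in the issue.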
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org