[ https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053715#comment-16053715 ]
daemon commented on YARN-6710:
------------------------------

[~yufeigu] Let me spell the problem out in more detail (translated from Chinese). On the YARN side, the cause is the following:

After an application attempt finishes, the AM sends an unregisterApplicationMaster RPC request to the RM. While handling this request, the RM does some simple processing, sends an APP_ATTEMPT_REMOVED event to the FairScheduler, and returns. The APP_ATTEMPT_REMOVED event is handled asynchronously, so the corresponding FSAppAttempt is only removed from the FairScheduler some time later.

This leads to two fairly serious consequences:

1. During that interval, the FairScheduler can still assign containers to the FSAppAttempt. If the condition if (getLiveContainers().size() == 1 && !getUnmanagedAM()) is satisfied while assigning, the container's resource is added to amResourceUsage again, making amResourceUsage much larger than the real value. In practice this can leave the jobs in the queue pending forever, never receiving any resources, which is exactly the situation I described above. The community already has a patch that fixes the inflated amResourceUsage accounting; see https://issues.apache.org/jira/browse/YARN-3415.

2. The FairScheduler can assign containers to an application attempt that has already finished. For such a container, the RM will tell the NM in the heartbeat response to clean it up, but the resources are still wasted, and at today's scheduling rates this problem is even more visible.

Although the community version has fixed the amResourceUsage accounting, I think that only covers part of the problem domain; issue 2 above also urgently needs to be solved. I saw that YARN-3415 also made the Spark framework clear all of its pending resource requests before unregistering the application attempt. But YARN, as a general-purpose resource scheduling framework, needs to cover all the cases it may encounter. For a general-purpose framework, we cannot constrain how users use it, and we cannot rely on every user releasing all pending requests before unregistering the application master. So we need to perform the corresponding check before assigning a container; this urgently needs to be solved. Yufei, could you re-evaluate, based on the above, whether this problem needs to be fixed? Thanks.

> There is a serious bug in FSLeafQueue#amResourceUsage that can prevent the fair
> scheduler from assigning containers to the queue
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6710
>                 URL: https://issues.apache.org/jira/browse/YARN-6710
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.2
>            Reporter: daemon
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png,
>                      screenshot-4.png, screenshot-5.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we use
> the fair scheduler.
> Although there are plenty of free resources in the cluster, 46 applications are pending.
> Those applications still could not run after several hours, and in the end I had to stop them.
> I reproduced the scenario in my test environment and found a bug in FSLeafQueue.
> In an extreme scenario it makes FSLeafQueue#amResourceUsage larger than the real AM resource usage.
> When the fair scheduler tries to assign a container to an application attempt, it performs the following check:
> !screenshot-2.png!
> !screenshot-3.png!
> Because the value of FSLeafQueue#amResourceUsage is corrupted, it is greater than the real value.
> So when amResourceUsage is greater than Resources.multiply(getFairShare(), maxAMShare),
> the FSLeafQueue#canRunAppAM function returns false, and the fair scheduler will not assign containers
> to the FSAppAttempt.
> In this scenario, all application attempts stay pending and never get any resources.
> I found the reason why so many applications in my leaf queue are pending; I will describe it below.
> When the fair scheduler first assigns a container to the application attempt, it does the following:
> !screenshot-4.png!
> When the fair scheduler removes the application attempt from the leaf queue, it does the following:
> !screenshot-5.png!
> But when the application attempt unregisters itself and all the containers in
> SchedulerApplicationAttempt#liveContainers have completed, an APP_ATTEMPT_REMOVED event is sent to the
> fair scheduler, and it is handled asynchronously.
> Before the application attempt is removed from the FSLeafQueue, there can still be pending requests in
> the FSAppAttempt.
> The fair scheduler will then assign a container to the FSAppAttempt, and the size of liveContainers
> becomes 1 again.
> So the FSLeafQueue adds that container's resource to FSLeafQueue#amResourceUsage a second time, which
> makes the value of amResourceUsage greater than the real AM usage.
> In the end, the value of FSLeafQueue#amResourceUsage is very large even though there is no application
> left in the queue.
> When a new application comes and the value of FSLeafQueue#amResourceUsage is greater than the value of
> Resources.multiply(getFairShare(), maxAMShare), the scheduler will never assign a container to
> the queue.
> All of the applications in the queue will stay pending forever.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
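[Editor's note] To make the race concrete, here is a minimal, hypothetical Java model of the accounting described above. LeafQueueModel, assignContainer, and appAttemptRemoved are illustrative stand-ins for the FSLeafQueue/FSAppAttempt code paths, not the real YARN classes; resources are flat ints in MB instead of Resource objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the accounting described in this issue.
// Names echo FSLeafQueue/FSAppAttempt, but this is an illustration, not YARN code.
public class LeafQueueModel {
    int amResourceUsage = 0;                 // stands in for FSLeafQueue#amResourceUsage (MB)
    int amResource = 0;                      // AM resource recorded on the attempt (MB)
    final List<Integer> liveContainers = new ArrayList<>();
    final boolean unmanagedAM = false;

    // Mirrors the quoted condition: the first live container is treated as
    // the AM container and charged to the queue's AM usage.
    void assignContainer(int memoryMb) {
        liveContainers.add(memoryMb);
        if (liveContainers.size() == 1 && !unmanagedAM) {
            amResourceUsage += memoryMb;
            amResource = memoryMb;           // overwrites the earlier AM charge
        }
    }

    void containerCompleted(int memoryMb) {
        liveContainers.remove(Integer.valueOf(memoryMb));
    }

    // The asynchronous APP_ATTEMPT_REMOVED handler: subtracts the AM charge once.
    void appAttemptRemoved() {
        amResourceUsage -= amResource;
        amResource = 0;
    }

    public static void main(String[] args) {
        LeafQueueModel q = new LeafQueueModel();
        q.assignContainer(2048);      // real AM container: amResourceUsage = 2048
        q.containerCompleted(2048);   // AM unregistered, all containers complete
        // Race window: APP_ATTEMPT_REMOVED not yet processed, requests still pending.
        q.assignContainer(1024);      // liveContainers.size() == 1 again: usage = 3072
        q.containerCompleted(1024);
        q.appAttemptRemoved();        // subtracts only 1024; 2048 MB is leaked forever
        System.out.println(q.amResourceUsage);
    }
}
```

Under these assumptions, two charges are added but only the last recorded one is subtracted on removal, so the queue's AM usage stays inflated after the attempt is gone, matching the behavior described in the report.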
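[Editor's note] And here is a hypothetical sketch of the admission check the description refers to, showing why a leaked amResourceUsage starves the queue. The flat-MB signature is an assumption for illustration; the real FSLeafQueue#canRunAppAM compares Resource objects against Resources.multiply(getFairShare(), maxAMShare).

```java
// Hypothetical sketch of the maxAMShare admission check; not the real
// FSLeafQueue#canRunAppAM signature, which operates on Resource objects.
public class AmAdmission {
    static boolean canRunAppAM(int amResourceUsageMb, int amDemandMb,
                               int fairShareMb, double maxAMShare) {
        // AM resources in the queue may use at most maxAMShare of the fair share.
        int limitMb = (int) (fairShareMb * maxAMShare);
        return amResourceUsageMb + amDemandMb <= limitMb;
    }

    public static void main(String[] args) {
        // With a 100000 MB fair share and maxAMShare = 0.5, the AM limit is 50000 MB.
        // A healthy queue admits a 1024 MB AM:
        System.out.println(canRunAppAM(0, 1024, 100000, 0.5));
        // A queue whose amResourceUsage leaked past the limit rejects every new AM,
        // so every new application stays pending:
        System.out.println(canRunAppAM(60000, 1024, 100000, 0.5));
    }
}
```

Once the leaked usage exceeds the limit, the check fails for every new application attempt regardless of its size, which is why the queue pends forever even when the cluster is idle.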