[ https://issues.apache.org/jira/browse/YARN-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053715#comment-16053715 ]

daemon commented on YARN-6710:
------------------------------

[~yufeigu] Let me restate this in more detail, which may make it clearer. On the YARN side, the root cause of this problem is:
1. After an application attempt finishes running, the AM sends an unregisterApplicationMaster RPC request to the RM. While handling
this request, the RM does some simple processing, sends an APP_ATTEMPT_REMOVED event to the FairScheduler, and returns immediately.
Because APP_ATTEMPT_REMOVED is handled asynchronously, the corresponding FSAppAttempt in the FairScheduler is only removed
some time later.
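To make the race concrete, here is a minimal, self-contained toy model of that flow (my own sketch, not the Hadoop source; the class and method names here are made up for illustration):

{code:java}
// Toy model of the race described above (not Hadoop source; names are
// illustrative only).
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncRemovalRace {
  private final BlockingQueue<String> schedulerEvents = new LinkedBlockingQueue<>();
  private volatile boolean attemptRemoved = false;

  // Models the RM handling unregisterApplicationMaster: it only enqueues
  // the removal event and returns immediately to the AM.
  void unregisterApplicationMaster() {
    schedulerEvents.offer("APP_ATTEMPT_REMOVED");
    // returns here; the FSAppAttempt is still live inside the scheduler
  }

  // Models the async dispatcher thread the FairScheduler consumes events
  // from; only when this runs is the FSAppAttempt actually removed.
  void dispatchOneEvent() throws InterruptedException {
    if ("APP_ATTEMPT_REMOVED".equals(schedulerEvents.take())) {
      attemptRemoved = true;
    }
  }

  // Models a node heartbeat triggering allocation: in the window between
  // unregister and dispatch, the finished attempt can still be offered
  // containers.
  boolean mayStillAssignContainers() {
    return !attemptRemoved;
  }
}
{code}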

This problem leads to two fairly serious consequences:
1. During this window, the FairScheduler will still assign containers to the FSAppAttempt. While assigning a container, if the
condition if (getLiveContainers().size() == 1 && !getUnmanagedAM()) is satisfied, the AM resource is added to amResourceUsage
again, making amResourceUsage much larger than its real value. In practice this can leave the jobs in the queue pending forever,
never getting any resources, which is exactly the situation I described above (see the sketch after this list).
For the problem of amResourceUsage being much larger than the actual value, the community already has a patch; for details see
https://issues.apache.org/jira/browse/YARN-3415.

2. The FairScheduler will assign containers to application attempts that have already finished.
Although the RM will tell the NM to clean up such a container in the response to the NM's next
heartbeat, this still wastes resources, and with scheduling as fast as it is today the problem
is even more pronounced.
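Here is a runnable toy version of the accounting in consequence 1 (my own simplification of the FSAppAttempt/FSLeafQueue logic; ToyLeafQueue and ToyAppAttempt are made-up names, not Hadoop classes):

{code:java}
// Toy reproduction of the double-counting (not Hadoop source; the
// structure mirrors the 2.7.x logic but all names here are made up).
import java.util.ArrayList;
import java.util.List;

class ToyLeafQueue {
  long amResourceUsageMb = 0; // plays the role of FSLeafQueue#amResourceUsage
  void addAMResourceUsage(long mb) { amResourceUsageMb += mb; }
}

public class ToyAppAttempt {
  private final ToyLeafQueue queue;
  private final List<Long> liveContainers = new ArrayList<>();
  private final boolean unmanagedAM = false;

  ToyAppAttempt(ToyLeafQueue queue) { this.queue = queue; }

  void allocate(long containerMb) {
    liveContainers.add(containerMb);
    // The first live container of a managed AM is assumed to be the AM,
    // so its resource is charged to the queue's AM usage:
    if (liveContainers.size() == 1 && !unmanagedAM) {
      queue.addAMResourceUsage(containerMb);
    }
  }

  void containerCompleted() { liveContainers.remove(liveContainers.size() - 1); }

  public static void main(String[] args) {
    ToyLeafQueue q = new ToyLeafQueue();
    ToyAppAttempt attempt = new ToyAppAttempt(q);
    attempt.allocate(1024);        // real AM container: usage = 1024 MB
    attempt.containerCompleted();  // AM finishes, attempt unregisters
    // APP_ATTEMPT_REMOVED is not processed yet, but a pending request is
    // still served, so the "first container" branch fires a second time:
    attempt.allocate(1024);
    System.out.println("amResourceUsage = " + q.amResourceUsageMb + " MB");
    // Prints 2048 MB even though the attempt has already finished.
  }
}
{code}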

Although the community version has already fixed the amResourceUsage problem, I think it only solves part of the problem domain.
Issue 2 above also urgently needs to be solved. I did see that, along with YARN-3415, the Spark framework was fixed to clear all
of its pending resource requests before unregistering the application attempt.
But YARN, as a general-purpose resource scheduling framework, needs to cover all of the cases it may encounter. For a
general-purpose framework we cannot restrict how users use it, and we cannot rely on users releasing all pending requests
before every unregisterApplicationMaster call.

So we need to perform the corresponding check before assigning a container; this urgently needs to be solved (a rough sketch of
what I mean follows). [~yufeigu], based on what I described, please evaluate again whether this problem needs to be fixed.
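For example, sticking with the toy names from the sketch above, the guard could look like this (isFinished() is a stand-in for whatever real "already unregistered" state the fix would use; this is a hypothetical sketch, not a patch):

{code:java}
// Hypothetical guard, sketched against the toy model above.
import java.util.List;

class ToyScheduler {
  // Models one pass of the assignment loop over a queue's runnable apps.
  static void assignContainers(List<FinishableAttempt> runnableApps, long nodeFreeMb) {
    for (FinishableAttempt app : runnableApps) {
      if (app.isFinished()) {
        continue; // proposed check: never offer resources to a finished attempt
      }
      long ask = app.pendingMb();
      if (ask > 0 && nodeFreeMb >= ask) {
        nodeFreeMb -= ask;
        app.allocate(ask);
      }
    }
  }
}

class FinishableAttempt {
  private boolean finished;   // set when unregisterApplicationMaster arrives
  private long pendingMb = 1024;

  boolean isFinished() { return finished; }
  void finish() { finished = true; }
  long pendingMb() { return pendingMb; }
  void allocate(long mb) { pendingMb -= mb; }
}
{code}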

Thanks,


> There is a heavy bug in FSLeafQueue#amResourceUsage which will let the fair 
> scheduler not assign container to the queue
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6710
>                 URL: https://issues.apache.org/jira/browse/YARN-6710
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.2
>            Reporter: daemon
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png
>
>
> There are over three thousand nodes in my Hadoop production cluster, and we
> use the fair scheduler.
> Although there is plenty of free resource in the ResourceManager, 46
> applications are pending.
> Those applications could not run even after several hours, and in the end I
> had to stop them.
> I reproduced the scenario in my test environment, and I found a bug in
> FSLeafQueue.
> In an extreme scenario it lets FSLeafQueue#amResourceUsage grow greater
> than its real value.
> When the fair scheduler tries to assign a container to an application
> attempt, it performs the following check:
> !screenshot-2.png!
> !screenshot-3.png!
> Because the value of FSLeafQueue#amResourceUsage is invalid, it is greater
> than its real value.
> So when the value of amResourceUsage is greater than the value of
> Resources.multiply(getFairShare(), maxAMShare),
> the FSLeafQueue#canRunAppAM function returns false, which makes the fair
> scheduler not assign any container to the FSAppAttempt.
> In this scenario, all the application attempts stay pending and never get
> any resource.
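> For readers without the screenshots, the check is roughly the following (my
> paraphrase of the 2.7.x FSLeafQueue#canRunAppAM, simplified, not verbatim):
> {code:java}
> // Paraphrased sketch of FSLeafQueue#canRunAppAM (simplified; the real
> // code delegates the comparison to the scheduling policy):
> public boolean canRunAppAM(Resource amResource) {
>   Resource maxAMResource = Resources.multiply(getFairShare(), maxAMShare);
>   Resource ifRunAMResource = Resources.add(amResourceUsage, amResource);
>   // If the inflated amResourceUsage already exceeds the cap, every new
>   // AM is rejected and the whole queue stays pending:
>   return Resources.fitsIn(ifRunAMResource, maxAMResource);
> }
> {code}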
> I found the reason why so many applications in my leaf queue are pending. I
> will describe it as follows:
> When the fair scheduler first assigns a container to the application attempt,
> it does something like this:
> !screenshot-4.png!
> When the fair scheduler removes the application attempt from the leaf queue,
> it does something like this:
> !screenshot-5.png!
> When the application attempt unregisters itself and all the containers in
> SchedulerApplicationAttempt#liveContainers are complete, an
> APP_ATTEMPT_REMOVED event is sent to the fair scheduler, but it is
> asynchronous.
> Before the application attempt is removed from the FSLeafQueue, there can
> still be pending requests in the FSAppAttempt. The fair scheduler will then
> assign a container to the FSAppAttempt, because the size of liveContainers
> will again equal 1.
> So the FSLeafQueue adds that container's resource to
> FSLeafQueue#amResourceUsage, which makes the value of amResourceUsage
> greater than its real value.
> In the end, the value of FSLeafQueue#amResourceUsage is pretty large although
> there is no application in the queue.
> When a new application comes and the value of FSLeafQueue#amResourceUsage is
> greater than the value of Resources.multiply(getFairShare(), maxAMShare),
> the scheduler will never assign a container to the queue.
> All of the applications in the queue will stay pending forever.


