[ 
https://issues.apache.org/jira/browse/YARN-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14687389#comment-14687389
 ] 

Wangda Tan commented on YARN-4045:
----------------------------------

[~tgraves]/[~shahrs87], 

I think this can happen when a container reservation interacts with a node 
disconnect. One example:
{code}
A cluster has 6 nodes with 20G each (120G total). Usage is:
N1-N4: fully used (20G each).
N5-N6: 10G used on each.
An app asks for a 15G container; assume it is reserved on N5, so total used resource = 
20G * 4 + 10G * 2 + 15G (just reserved) = 115G.
Then N6 disconnects. Cluster resource becomes 100G and used resource = 
115G - 10G (N6's usage) = 105G, which exceeds the cluster resource.
{code}
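
The arithmetic in the example can be checked with a small sketch. This is only a toy model of the scenario, not YARN scheduler code; the class and method names are hypothetical.

```java
// Hypothetical model of the reservation + node-disconnect example; not YARN code.
public class ReservationDisconnectExample {
    public static final int NODE_CAPACITY_GB = 20;

    /** Cluster capacity for the given number of connected nodes. */
    public static int clusterCapacity(int nodes) {
        return nodes * NODE_CAPACITY_GB;
    }

    /** Used resource: allocations on connected nodes plus the outstanding reservation. */
    public static int used(int[] allocatedPerNode, int reservedGb) {
        int total = reservedGb;
        for (int allocated : allocatedPerNode) {
            total += allocated;
        }
        return total;
    }

    /** Available = capacity - used; goes negative once the reservation overcommits. */
    public static int available(int[] allocatedPerNode, int reservedGb) {
        return clusterCapacity(allocatedPerNode.length) - used(allocatedPerNode, reservedGb);
    }

    public static void main(String[] args) {
        int[] before = {20, 20, 20, 20, 10, 10}; // N1-N6
        int[] after = {20, 20, 20, 20, 10};      // N6 disconnected
        int reserved = 15;                       // 15G container reserved on N5
        System.out.println("before: available=" + available(before, reserved) + "G");
        System.out.println("after:  available=" + available(after, reserved) + "G");
    }
}
```

With all six nodes connected the model has 5G available; once N6 is gone it has -5G, which is the kind of negative value reported in this issue.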

I've just checked the fixes: YARN-3361 doesn't include anything related, and we 
currently don't have a fix for the corner case above. 

Another problem is caused by DRC (DominantResourceCalculator). Since 2.7.1 we 
set availableResource = max(availableResource, Resources.none()):
{code}
    childQueue.getMetrics().setAvailableResourcesToQueue(
        Resources.max(
            calculator, 
            clusterResource, 
            available, 
            Resources.none()
            )
        );
{code}

But with DRC, a resource with availableMB < 0 and availableVCores > 0 can 
compare as greater than Resources.none(), so the max() above returns it 
unchanged and the negative availableMB is still reported. We may need to fix 
this case as well.
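
To illustrate the DRC case, here is a rough sketch. It is a simplified model of a dominant-share comparison, not the actual Resources or DominantResourceCalculator code; Res, dominantMax and clampToZero are hypothetical names, the numbers are made up, and the component-wise clamp is only one possible direction for a fix.

```java
// Simplified model of dominant-share ordering; not the actual Hadoop classes.
public class DominantMaxExample {
    /** Minimal stand-in for a (memory, vcores) resource. */
    public static final class Res {
        public final long memoryMb;
        public final long vcores;

        public Res(long memoryMb, long vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Largest per-dimension share of the cluster (the "dominant" share). */
    public static double dominantShare(Res cluster, Res r) {
        return Math.max((double) r.memoryMb / cluster.memoryMb,
                        (double) r.vcores / cluster.vcores);
    }

    /** max() under dominant-share ordering: returns one operand whole (ties simplified). */
    public static Res dominantMax(Res cluster, Res a, Res b) {
        return dominantShare(cluster, a) >= dominantShare(cluster, b) ? a : b;
    }

    /** Component-wise clamp: each dimension is floored at zero independently. */
    public static Res clampToZero(Res r) {
        return new Res(Math.max(r.memoryMb, 0L), Math.max(r.vcores, 0L));
    }

    public static void main(String[] args) {
        Res cluster = new Res(102400, 100);  // made-up cluster size
        Res available = new Res(-5120, 40);  // negative memory, positive vcores
        Res none = new Res(0, 0);

        Res viaMax = dominantMax(cluster, available, none);
        Res clamped = clampToZero(available);
        System.out.println("dominant max: memoryMb=" + viaMax.memoryMb + " vcores=" + viaMax.vcores);
        System.out.println("clamped:      memoryMb=" + clamped.memoryMb + " vcores=" + clamped.vcores);
    }
}
```

In this model the vcores share (0.4) dominates, so the negative-memory resource wins the comparison against the zero resource and its memory stays negative. Clamping each dimension separately keeps the vcores and floors memory at zero.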

Thoughts?

> Negative availableMB is being reported for root queue.
> ------------------------------------------------------
>
>                 Key: YARN-4045
>                 URL: https://issues.apache.org/jira/browse/YARN-4045
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Rushabh S Shah
>
> We recently deployed 2.7 in one of our clusters.
> We are seeing negative availableMB being reported for queue=root.
> This is from the jmx output:
> {noformat}
> <clusterMetrics>
>     ...
>     <availableMB>-163328</availableMB>
>     ...
> </clusterMetrics>
> {noformat}
> The following is the RM log:
> {noformat}
> 2015-08-10 14:42:28,280 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:28,404 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:30,913 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:33,093 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:35,548 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:35,549 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:39,088 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:39,089 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:39,338 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:39,339 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:39,757 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:39,758 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:43,056 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:43,070 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:44,486 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:44,487 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:44,886 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:44,886 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0032743 
> absoluteUsedCapacity=1.0032743 used=<memory:5334016, vCores:6212> 
> cluster=<memory:5316608, vCores:28320>
> 2015-08-10 14:42:47,401 [ResourceManager Event Processor] INFO 
> capacity.ParentQueue: completedContainer queue=root usedCapacity=1.0029854 
> absoluteUsedCapacity=1.0029854 used=<memory:5332480, vCores:6202> 
> cluster=<memory:5316608, vCores:28320>
> {noformat}
> bq.  used=<memory:5332480, vCores:6202> cluster=<memory:5316608, vCores:28320>
> For the root queue, usedCapacity is more than totalCapacity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
