[ https://issues.apache.org/jira/browse/YARN-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931192#comment-16931192 ]
Sunil Govindan commented on YARN-9838:
--------------------------------------

[~jiulongZhu] Thanks for reporting this issue.

A few general nits:
1. Please keep the Jira open, and click the "Patch Available" button once you are ready with a patch.
2. Rename the patch to YARN-9838.0001.patch or similar so that the name follows the naming convention and is unique; Jenkins will then run the test cases automatically.

Coming to the patch, there are some improvements made in YARN-5932. Could you please check whether that solves the issues you mentioned?

> Using the CapacityScheduler,Apply "movetoqueue" on the application which CS
> reserved containers for,will cause "Num Container" and "Used Resource" in
> ResourceUsage metrics error
> -----------------------------------------------------------------------------
>
>                 Key: YARN-9838
>                 URL: https://issues.apache.org/jira/browse/YARN-9838
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler
>    Affects Versions: 2.7.3
>            Reporter: jiulongzhu
>            Priority: Critical
>              Labels: patch
>             Fix For: 2.7.3
>
>         Attachments: RM_UI_metric_negative.png, RM_UI_metric_positive.png,
> bug_fix_capacityScheduler_moveApplication.patch
>
>
> In some of our clusters, we are seeing "Used Resource", "Used Capacity",
> "Absolute Used Capacity" and "Num Container" stay positive or negative even
> when the queue is completely idle (no RUNNING, no NEW apps...). In extreme
> cases, apps cannot be submitted to a queue that is actually idle because its
> "Used Resource" is far above zero, much like a "container leak".
> Firstly, I found that "Used Resource", "Used Capacity" and "Absolute Used
> Capacity" use the "Used" value of ResourceUsage kept by AbstractCSQueue, and
> "Num Container" uses the "numContainer" value kept by LeafQueue.
> AbstractCSQueue#allocateResource and AbstractCSQueue#releaseResource change
> the values of "numContainer" and "Used".
> Secondly, by comparing how numContainer, ResourceUsageByLabel and
> QueueMetrics change (#allocateContainer and #releaseContainer) for
> applications with and without "movetoqueue", I found that moving the
> reservedContainers did not modify the "numContainer" value in AbstractCSQueue
> or the "used" value in ResourceUsage when the application was moved from one
> queue to another.
> The table below shows how the metric values change when reservedContainers
> are allocated, moved from the $FROM queue to the $TO queue, and released. The
> increases and decreases are not conserved: the resource is charged to the
> $FROM queue on allocation but deducted from the $TO queue on release.
> ||move reservedContainer||allocate||movetoqueue||release||
> |numContainer|increase in $FROM queue|{color:#FF0000}$FROM queue stays the same, $TO queue stays the same{color}|decrease in $TO queue|
> |ResourceUsageByLabel(USED)|increase in $FROM queue|{color:#FF0000}$FROM queue stays the same, $TO queue stays the same{color}|decrease in $TO queue|
> |QueueMetrics|increase in $FROM queue|decrease in $FROM queue, increase in $TO queue|decrease in $TO queue|
> For allocatedContainers (allocated, acquired, running), the metric changes
> across allocate, movetoqueue and release are fully conserved.
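
The accounting in the table can be replayed with a small toy model. This is only a sketch under stated assumptions: `QueueMoveSketch`, `ToyQueue` and all method and field names below are hypothetical and are not Hadoop classes, and the "conserving" move is just one way to balance the counters, not the attached patch or the YARN-5932 change.

```java
/** Toy model of the per-queue counters from the table; not Hadoop code. */
final class QueueMoveSketch {

    /** Hypothetical stand-in for one queue's bookkeeping. */
    static final class ToyQueue {
        final String name;
        int numContainers; // models "Num Container"
        long usedMb;       // models the "used" value in ResourceUsage
        long metricsMb;    // models the QueueMetrics side

        ToyQueue(String name) {
            this.name = name;
        }

        /** Allocation (or reservation) charges all three counters. */
        void allocate(long mb) {
            numContainers++;
            usedMb += mb;
            metricsMb += mb;
        }

        /** Release deducts from all three counters. */
        void release(long mb) {
            numContainers--;
            usedMb -= mb;
            metricsMb -= mb;
        }

        @Override
        public String toString() {
            return name + "{numContainers=" + numContainers
                + ", usedMb=" + usedMb + ", metricsMb=" + metricsMb + "}";
        }
    }

    /** Move as described in the report: only the QueueMetrics side follows the app. */
    static void moveReservedAsReported(ToyQueue from, ToyQueue to, long mb) {
        from.metricsMb -= mb;
        to.metricsMb += mb;
    }

    /** A conserving move (assumption): every counter leaves FROM and arrives in TO. */
    static void moveReservedConserving(ToyQueue from, ToyQueue to, long mb) {
        from.release(mb);
        to.allocate(mb);
    }

    /** Runs allocate in FROM, move, release in TO; returns {from, to}. */
    static ToyQueue[] lifecycle(boolean conserving, long mb) {
        ToyQueue from = new ToyQueue("FROM");
        ToyQueue to = new ToyQueue("TO");
        from.allocate(mb);
        if (conserving) {
            moveReservedConserving(from, to, mb);
        } else {
            moveReservedAsReported(from, to, mb);
        }
        to.release(mb);
        return new ToyQueue[] {from, to};
    }
}
```

Calling `QueueMoveSketch.lifecycle(false, 1024L)` leaves the FROM queue with one container and 1024 MB still counted as used and the TO queue at minus one container and minus 1024 MB, although both queues are idle; this mirrors the positive and negative values described above. With `lifecycle(true, 1024L)` every counter returns to zero.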