[jira] [Updated] (YARN-8546) Resource leak caused by a reserved container being released more than once under async scheduling
[ https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-8546:
----------------------------
    Attachment: YARN-8546.branch-2.10.001.patch

> Resource leak caused by a reserved container being released more than once
> under async scheduling
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8546
>                 URL: https://issues.apache.org/jira/browse/YARN-8546
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>    Affects Versions: 3.1.0
>            Reporter: Weiwei Yang
>            Assignee: Tao Yang
>            Priority: Major
>              Labels: global-scheduling
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8546.001.patch, YARN-8546.branch-2.10.001.patch
>
>
> I was able to reproduce this issue by starting a job that keeps requesting
> containers until it uses up the cluster's available resources. My cluster has
> 70200 vcores and each task requests 100 vcores, so I expected a total of 702
> containers to be allocated, but in the end there were only 701. The last
> container could not be allocated because the queue's used resource had been
> updated to more than 100%.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8546) Resource leak caused by a reserved container being released more than once under async scheduling
[ https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8546:
----------------------------
    Fix Version/s:     (was: 3.1.2)
                   3.1.1

> Resource leak caused by a reserved container being released more than once
> under async scheduling
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8546
>                 URL: https://issues.apache.org/jira/browse/YARN-8546
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>    Affects Versions: 3.1.0
>            Reporter: Weiwei Yang
>            Assignee: Tao Yang
>            Priority: Major
>              Labels: global-scheduling
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8546.001.patch
>
>
> I was able to reproduce this issue by starting a job that keeps requesting
> containers until it uses up the cluster's available resources. My cluster has
> 70200 vcores and each task requests 100 vcores, so I expected a total of 702
> containers to be allocated, but in the end there were only 701. The last
> container could not be allocated because the queue's used resource had been
> updated to more than 100%.
[jira] [Updated] (YARN-8546) Resource leak caused by a reserved container being released more than once under async scheduling
[ https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang updated YARN-8546:
-----------------------------
    Fix Version/s:     (was: 3.1.1)
                   3.1.2

> Resource leak caused by a reserved container being released more than once
> under async scheduling
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8546
>                 URL: https://issues.apache.org/jira/browse/YARN-8546
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>    Affects Versions: 3.1.0
>            Reporter: Weiwei Yang
>            Assignee: Tao Yang
>            Priority: Major
>              Labels: global-scheduling
>             Fix For: 3.2.0, 3.1.2
>
>         Attachments: YARN-8546.001.patch
>
>
> I was able to reproduce this issue by starting a job that keeps requesting
> containers until it uses up the cluster's available resources. My cluster has
> 70200 vcores and each task requests 100 vcores, so I expected a total of 702
> containers to be allocated, but in the end there were only 701. The last
> container could not be allocated because the queue's used resource had been
> updated to more than 100%.
[jira] [Updated] (YARN-8546) Resource leak caused by a reserved container being released more than once under async scheduling
[ https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang updated YARN-8546:
-----------------------------
    Summary: Resource leak caused by a reserved container being released more than once under async scheduling  (was: A reserved container might be released multiple times under async scheduling)

> Resource leak caused by a reserved container being released more than once
> under async scheduling
> ---------------------------------------------------------------------------
>
>                 Key: YARN-8546
>                 URL: https://issues.apache.org/jira/browse/YARN-8546
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler
>    Affects Versions: 3.1.0
>            Reporter: Weiwei Yang
>            Assignee: Tao Yang
>            Priority: Major
>              Labels: global-scheduling
>         Attachments: YARN-8546.001.patch
>
>
> I was able to reproduce this issue by starting a job that keeps requesting
> containers until it uses up the cluster's available resources. My cluster has
> 70200 vcores and each task requests 100 vcores, so I expected a total of 702
> containers to be allocated, but in the end there were only 701. The last
> container could not be allocated because the queue's used resource had been
> updated to more than 100%.