[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-2171:
    Priority: Major (was: Critical)

AMs block on the CapacityScheduler lock during allocate()

    Key: YARN-2171
    URL: https://issues.apache.org/jira/browse/YARN-2171
    Project: Hadoop YARN
    Issue Type: Bug
    Components: capacityscheduler
    Affects Versions: 0.23.10, 2.4.0
    Reporter: Jason Lowe
    Assignee: Jason Lowe
    Attachments: YARN-2171.patch, YARN-2171v2.patch

When AMs heartbeat into the RM via the allocate() call they are blocking on the CapacityScheduler lock when trying to get the number of nodes in the cluster via getNumClusterNodes.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-2171:
    Priority: Critical (was: Major)
[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated YARN-2171:
    Target Version/s: 2.5.0 (was: 0.23.11, 2.5.0)
[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-2171:
    Attachment: YARN-2171.patch

Patch to use AtomicInteger for the number of nodes so we can avoid grabbing the lock to access the value. I also added a unit test to verify allocate doesn't try to grab the capacity scheduler lock.
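For readers following the thread, the idea in the comment above can be sketched as follows. This is a minimal illustration, not the attached patch: the class `NodeCountSketch` and its `addNode`/`removeNode` methods are hypothetical stand-ins, and only `getNumClusterNodes` and the use of `AtomicInteger` come from the message itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a scheduler; not the real CapacityScheduler. */
class NodeCountSketch {
    // The node count lives in an AtomicInteger so it can be read without the object lock.
    private final AtomicInteger numNodes = new AtomicInteger(0);

    // Assumption: membership changes still run under the scheduler lock,
    // because a real scheduler would update other state here as well.
    public synchronized void addNode() {
        numNodes.incrementAndGet();
    }

    public synchronized void removeNode() {
        numNodes.decrementAndGet();
    }

    // Deliberately NOT synchronized: a heartbeat that only needs the count
    // never waits behind a long-running scheduling pass.
    public int getNumClusterNodes() {
        return numNodes.get();
    }
}
```

The writers keep their locking, so the count stays consistent with the rest of the (hypothetical) scheduler state; only the read path gives up the lock.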
[jira] [Updated] (YARN-2171) AMs block on the CapacityScheduler lock during allocate()
[ https://issues.apache.org/jira/browse/YARN-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-2171:
    Attachment: YARN-2171v2.patch

The point of the unit test was to catch regressions at a high level. If anyone changes the code such that calling allocate() will grab the scheduler lock then the test will fail, whether that's a regression in this particular method or some new method that's added that ApplicationMasterService or CapacityScheduler itself calls and grabs the lock. I added a separate unit test to exercise the getNumClusterNodes method.

The AHS unit test failure seems unrelated, and it passes for me locally even with this change.
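One way such a high-level regression check could work is to hold the lock in one thread and verify that the call under test still completes in another. The sketch below shows that pattern under stated assumptions: `LockFreeReadCheck`, `AtomicCount`, `SyncCount`, and `completesWhileLocked` are all hypothetical names, and the actual test in the patch may be structured quite differently.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper; illustrates the test idea only. */
class LockFreeReadCheck {
    /** Reads its count without taking the object lock. */
    static class AtomicCount {
        private final AtomicInteger n = new AtomicInteger(3);
        int get() { return n.get(); }
    }

    /** Reads its count under the object lock (the behaviour the test should catch). */
    static class SyncCount {
        private int n = 3;
        synchronized int get() { return n; }
    }

    /**
     * Holds the monitor of {@code lock} in a helper thread, then runs {@code call}
     * in a second thread. Returns true if the call finishes within the timeout,
     * i.e. it did not need the monitor.
     */
    static boolean completesWhileLocked(Object lock, Callable<Integer> call, long timeoutMs)
            throws Exception {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                lockHeld.countDown();
                try {
                    release.await();  // keep the monitor until the check is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        lockHeld.await();  // from here on the monitor is definitely held

        ExecutorService caller = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = caller.submit(call);
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;  // the call was still waiting on the monitor
        } finally {
            release.countDown();  // let the holder (and any blocked caller) finish
            caller.shutdownNow();
            holder.join();
        }
    }
}
```

With these stand-ins, `completesWhileLocked(a, a::get, 2000)` is expected to return true for an `AtomicCount a`, while the same check against a `SyncCount` should time out and return false. A timeout-based check like this catches any code path that takes the lock, not just one specific method, which is the property the comment above describes.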