[ https://issues.apache.org/jira/browse/YARN-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881712#comment-15881712 ]
Miklos Szegedi edited comment on YARN-6172 at 2/24/17 1:23 AM:
---------------------------------------------------------------

Thank you, [~varun_saxena], for reporting this. I was able to reproduce the scenario above. There are two issues here.

First, every time the update thread runs, it resets the queue demand and then adds each application's demand to it one by one, without locking. The test samples this value and compares it with the expected value. If the update has not finished when the value is sampled, it can be 0 or anything less than the actual demand.

Second, and unrelated, the test calls {{Thread.yield()}} instead of properly waiting for the expected application count to propagate.

I will send out a patch soon.

{code}
  @Override
  public void updateDemand() {
    // Compute demand by iterating through apps in the queue
    // Limit demand to maxResources
    demand = Resources.createResource(0);
    readLock.lock();
    try {
      for (FSAppAttempt sched : runnableApps) {
        updateDemandForApp(sched);
      }
      for (FSAppAttempt sched : nonRunnableApps) {
        updateDemandForApp(sched);
      }
    } finally {
      readLock.unlock();
    }
    // Cap demand to maxShare to limit allocation to maxShare
    demand = Resources.componentwiseMin(demand, maxShare);
    if (LOG.isDebugEnabled()) {
      LOG.debug("The updated demand for " + getName() + " is " + demand
          + "; the max is " + maxShare);
      LOG.debug("The updated fairshare for " + getName() + " is "
          + getFairShare());
    }
  }

  private void updateDemandForApp(FSAppAttempt sched) {
    sched.updateDemand();
    Resource toAdd = sched.getDemand();
    if (LOG.isDebugEnabled()) {
      LOG.debug("Counting resource from " + sched.getName() + " " + toAdd
          + "; Total resource demand for " + getName() + " now "
          + demand);
    }
    demand = Resources.add(demand, toAdd);
  }
{code}

> TestFSAppStarvation fails on trunk
> ----------------------------------
>
>                 Key: YARN-6172
>                 URL: https://issues.apache.org/jira/browse/YARN-6172
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Varun Saxena
>         Attachments: YARN-6172.000.patch
>
>
> Refer to test report
> https://builds.apache.org/job/PreCommit-YARN-Build/14882/testReport/
> {noformat}
> java.lang.AssertionError: null
>       at org.junit.Assert.fail(Assert.java:86)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.junit.Assert.assertTrue(Assert.java:52)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation.verifyLeafQueueStarvation(TestFSAppStarvation.java:133)
>       at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation.testPreemptionEnabled(TestFSAppStarvation.java:106)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
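
To illustrate the two issues described in the comment above, here is a minimal, self-contained Java sketch. It is not the attached patch and does not use the FairScheduler classes: {{DemandSketch}}, {{LeafQueue}}, {{addApp}} and {{waitFor}} are hypothetical names, and plain {{int}} values stand in for {{Resource}}. It assumes one possible remedy for the first issue (accumulate into a local variable and publish the capped total in a single write, so a reader never sees a partial sum) and one for the second (poll a condition with a timeout instead of calling {{Thread.yield()}}).

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a scheduler queue; not the FSLeafQueue API.
final class DemandSketch {

    static final class LeafQueue {
        private final List<Integer> appDemands = new CopyOnWriteArrayList<>();
        private final int maxShare;
        // volatile so a sampling thread sees the most recently published total
        private volatile int demand;

        LeafQueue(int maxShare) {
            this.maxShare = maxShare;
        }

        void addApp(int appDemand) {
            appDemands.add(appDemand);
        }

        void updateDemand() {
            // Accumulate into a local variable instead of resetting the shared
            // field and adding to it, so no intermediate value is ever visible.
            int total = 0;
            for (int appDemand : appDemands) {
                total += appDemand;
            }
            // Publish the capped result with a single write.
            demand = Math.min(total, maxShare);
        }

        int getDemand() {
            return demand;
        }
    }

    // Polls the condition until it holds or the timeout expires.
    // Returns false on timeout or if the waiting thread is interrupted.
    static boolean waitFor(BooleanSupplier condition, long pollMillis,
                           long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        LeafQueue queue = new LeafQueue(100);
        queue.addApp(30);
        queue.addApp(40);
        // The update runs on another thread, as the scheduler's update thread would.
        new Thread(queue::updateDemand).start();
        boolean reached = waitFor(() -> queue.getDemand() == 70, 10, 2000);
        System.out.println("demand reached expected value: " + reached);
    }
}
```

With this shape, a thread that samples {{getDemand()}} during an update sees either the previous total or the new one, never a partial sum, and the waiting side fails with a clear timeout rather than depending on thread scheduling.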