[ https://issues.apache.org/jira/browse/YARN-10868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Song Jiacheng updated YARN-10868:
---------------------------------
    Description: 
In FairScheduler, removing an app attempt calls MaxRunningAppsEnforcer#updateRunnabilityOnAppRemoval to find non-runnable apps and make them no longer pending. That method calls updateAppsRunnability at the end and passes appsNowMaybeRunnable.size() as the "maxRunnableApps" parameter, as below:
{code:java}
    updateAppsRunnability(appsNowMaybeRunnable,
        appsNowMaybeRunnable.size());
{code}
updateAppsRunnability is below:
{code:java}
  private void updateAppsRunnability(List<List<FSAppAttempt>>
      appsNowMaybeRunnable, int maxRunnableApps) {
    // Scan through and check whether this means that any apps are now runnable
    Iterator<FSAppAttempt> iter = new MultiListStartTimeIterator(
        appsNowMaybeRunnable);
    FSAppAttempt prev = null;
    List<FSAppAttempt> noLongerPendingApps = new ArrayList<FSAppAttempt>();
    while (iter.hasNext()) {
      FSAppAttempt next = iter.next();
      if (next == prev) {
        continue;
      }

      if (canAppBeRunnable(next.getQueue(), next)) {
        trackRunnableApp(next);
        FSAppAttempt appSched = next;
        next.getQueue().addApp(appSched, true);
        noLongerPendingApps.add(appSched);

        if (noLongerPendingApps.size() >= maxRunnableApps) {
          break;
        }
      }

      prev = next;
    }
    ...
{code}
maxRunnableApps is the number of apps that can become runnable because previous attempts were removed, and this method uses the parameter to break out of the loop. However, appsNowMaybeRunnable is a list of lists, so its size is the number of queues rather than the number of apps. This is a bug.

  was:
In FairScheduler, removing an app attempt calls MaxRunningAppsEnforcer#updateRunnabilityOnAppRemoval to find non-runnable apps and make them no longer pending.
That method calls updateAppsRunnability at the end and passes appsNowMaybeRunnable.size() as the "maxRunnableApps" parameter, as below:
{code:java}
    updateAppsRunnability(appsNowMaybeRunnable,
        appsNowMaybeRunnable.size());
{code}
updateAppsRunnability is below:
{code:java}
  private void updateAppsRunnability(List<List<FSAppAttempt>>
      appsNowMaybeRunnable, int maxRunnableApps) {
    // Scan through and check whether this means that any apps are now runnable
    Iterator<FSAppAttempt> iter = new MultiListStartTimeIterator(
        appsNowMaybeRunnable);
    FSAppAttempt prev = null;
    List<FSAppAttempt> noLongerPendingApps = new ArrayList<FSAppAttempt>();
    while (iter.hasNext()) {
      FSAppAttempt next = iter.next();
      if (next == prev) {
        continue;
      }

      if (canAppBeRunnable(next.getQueue(), next)) {
        trackRunnableApp(next);
        FSAppAttempt appSched = next;
        next.getQueue().addApp(appSched, true);
        noLongerPendingApps.add(appSched);

        if (noLongerPendingApps.size() >= maxRunnableApps) {
          break;
        }
      }

      prev = next;
    }
    ...
{code}
maxRunnableApps is the number of apps that can become runnable because previous attempts were removed, but appsNowMaybeRunnable is a list of lists, so its size is the number of queues rather than the number of apps. This is a bug.


> FairScheduler: updateAppsRunnability never break
> ------------------------------------------------
>
>                 Key: YARN-10868
>                 URL: https://issues.apache.org/jira/browse/YARN-10868
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.2.1
>            Reporter: Song Jiacheng
>            Priority: Major
>
> In FairScheduler, removing an app attempt calls
> MaxRunningAppsEnforcer#updateRunnabilityOnAppRemoval to find
> non-runnable apps and make them no longer pending.
> That method calls updateAppsRunnability at the end and passes
> appsNowMaybeRunnable.size() as the "maxRunnableApps" parameter, as below:
> {code:java}
>     updateAppsRunnability(appsNowMaybeRunnable,
>         appsNowMaybeRunnable.size());
> {code}
> updateAppsRunnability is below:
> {code:java}
>   private void updateAppsRunnability(List<List<FSAppAttempt>>
>       appsNowMaybeRunnable, int maxRunnableApps) {
>     // Scan through and check whether this means that any apps are now runnable
>     Iterator<FSAppAttempt> iter = new MultiListStartTimeIterator(
>         appsNowMaybeRunnable);
>     FSAppAttempt prev = null;
>     List<FSAppAttempt> noLongerPendingApps = new ArrayList<FSAppAttempt>();
>     while (iter.hasNext()) {
>       FSAppAttempt next = iter.next();
>       if (next == prev) {
>         continue;
>       }
>
>       if (canAppBeRunnable(next.getQueue(), next)) {
>         trackRunnableApp(next);
>         FSAppAttempt appSched = next;
>         next.getQueue().addApp(appSched, true);
>         noLongerPendingApps.add(appSched);
>
>         if (noLongerPendingApps.size() >= maxRunnableApps) {
>           break;
>         }
>       }
>
>       prev = next;
>     }
>     ...
> {code}
> maxRunnableApps is the number of apps that can become runnable because
> previous attempts were removed, and this method uses the parameter to break
> out of the loop. However, appsNowMaybeRunnable is a list of lists, so its
> size is the number of queues rather than the number of apps. This is a bug.
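
The size mismatch described above can be illustrated with a minimal, self-contained sketch. It is not Hadoop code: plain strings stand in for FSAppAttempt, and the names NestedListSizeDemo, countQueues, countApps, and sample are hypothetical. It only shows that size() on a list of lists counts the inner lists (one per queue), not the apps they contain; it makes no claim about what the correct loop limit should be.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NestedListSizeDemo {

  // Size of the outer list: one entry per queue. This is the value that a
  // call such as appsNowMaybeRunnable.size() would produce.
  static int countQueues(List<List<String>> appsByQueue) {
    return appsByQueue.size();
  }

  // Total number of candidate apps across all queues.
  static int countApps(List<List<String>> appsByQueue) {
    int total = 0;
    for (List<String> apps : appsByQueue) {
      total += apps.size();
    }
    return total;
  }

  // Hypothetical sample data: two queues holding four apps in total.
  static List<List<String>> sample() {
    List<List<String>> appsByQueue = new ArrayList<>();
    appsByQueue.add(Arrays.asList("app1", "app2", "app3")); // queue A
    appsByQueue.add(Arrays.asList("app4"));                 // queue B
    return appsByQueue;
  }

  public static void main(String[] args) {
    List<List<String>> appsByQueue = sample();
    System.out.println("queues = " + countQueues(appsByQueue)); // queues = 2
    System.out.println("apps   = " + countApps(appsByQueue));   // apps   = 4
  }
}
```

With this sample data the outer size is 2 while the number of apps is 4, so a loop limit taken from the outer size is tied to the number of queues rather than to any count of apps.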