[ https://issues.apache.org/jira/browse/YARN-9992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986454#comment-16986454 ]
Eric Payne commented on YARN-9992:
----------------------------------

[~jhung], it looks like this is only a problem on branch-2 and branch-2.10. Is that your analysis as well?

> Max allocation per queue is zero for custom resource types on RM startup
> ------------------------------------------------------------------------
>
>                 Key: YARN-9992
>                 URL: https://issues.apache.org/jira/browse/YARN-9992
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>            Priority: Major
>         Attachments: YARN-9992.001.patch
>
>
> Found an issue where GPU requests on a newly booted RM cannot be
> scheduled. The request fails with the exception thrown in
> SchedulerUtils#throwInvalidResourceException:
> {noformat}
> throw new InvalidResourceRequestException(
>     "Invalid resource request, requested resource type=[" + reqResourceName
>     + "] < 0 or greater than maximum allowed allocation. Requested "
>     + "resource=" + reqResource + ", maximum allowed allocation="
>     + availableResource
>     + ", please note that maximum allowed allocation is calculated "
>     + "by scheduler based on maximum resource of registered "
>     + "NodeManagers, which might be less than configured "
>     + "maximum allocation="
>     + ResourceUtils.getResourceTypesMaximumAllocation());
> {noformat}
> After refreshing the scheduler (e.g. via refreshQueues), GPU scheduling
> works again.
>
> I think the root cause is that on scheduler refresh, resource-types.xml is
> loaded into CapacitySchedulerConfiguration (as part of YARN-7738). As a
> result, when ResourceUtils#fetchMaximumAllocationFromConfig is called in
> CapacitySchedulerConfiguration#getMaximumAllocationPerQueue, it can fetch
> the {{yarn.resource-types}} config. However, resource-types.xml is not
> loaded into the conf in CapacityScheduler#initScheduler. At startup the
> custom resource is therefore not found when max allocations are computed,
> and its max allocation is 0.
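The following is a minimal Java sketch of the failure mode described in the quoted root-cause analysis. It rests on several assumptions. The class and methods are hypothetical and only mimic the shape of the logic. They are not Hadoop's CapacityScheduler, ResourceUtils, or SchedulerUtils APIs. A plain Map stands in for the configuration. The {{yarn.resource-types}} key comes from the issue. The "yarn.io/gpu" resource name is used only as an example. The per-resource ".maximum-allocation" key and the value 4 are invented for illustration. The sketch shows why a custom resource that is missing from the loaded configuration ends up with a maximum of 0, so that any positive request fails the "< 0 or greater than maximum allowed allocation" check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of YARN-9992. Nothing here is a Hadoop API;
// it only mirrors the shape of the logic described in the issue.
public class MaxAllocationSketch {

    // Hypothetical: mimics merging resource-types.xml into the configuration.
    // The "yarn.resource-types" key is from the issue; the per-resource
    // ".maximum-allocation" key and the value "4" are assumptions.
    static void addResourceTypes(Map<String, String> conf) {
        conf.put("yarn.resource-types", "yarn.io/gpu");
        conf.put("yarn.resource-types.yarn.io/gpu.maximum-allocation", "4");
    }

    // Hypothetical: computes a resource's max allocation from the conf.
    // A resource type not listed under "yarn.resource-types" is unknown,
    // so its maximum silently falls back to 0 (the reported symptom).
    static long maxAllocation(Map<String, String> conf, String resource) {
        String types = conf.getOrDefault("yarn.resource-types", "");
        for (String t : types.split(",")) {
            if (t.trim().equals(resource)) {
                return Long.parseLong(conf.getOrDefault(
                    "yarn.resource-types." + resource + ".maximum-allocation", "0"));
            }
        }
        return 0L;
    }

    // Mirrors the condition behind the exception: requested < 0 or > max fails.
    static boolean isValidRequest(long requested, long max) {
        return requested >= 0 && requested <= max;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Startup path: resource types were never merged into the conf.
        long atStartup = maxAllocation(conf, "yarn.io/gpu");
        System.out.println("max GPUs at startup: " + atStartup);              // 0
        System.out.println("1 GPU valid? " + isValidRequest(1, atStartup));    // false

        // Refresh path: resource types are merged, so the custom type is found.
        addResourceTypes(conf);
        long afterRefresh = maxAllocation(conf, "yarn.io/gpu");
        System.out.println("max GPUs after refresh: " + afterRefresh);        // 4
        System.out.println("1 GPU valid? " + isValidRequest(1, afterRefresh)); // true
    }
}
```

Running main prints a maximum of 0 and a rejected request before the resource types are merged in. It then prints a maximum of 4 and an accepted request. This mirrors the startup versus refreshQueues behavior reported in the issue.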
--
This message was sent by Atlassian Jira
(v8.3.4#803005)