Jonathan Hung created YARN-9992:
-----------------------------------

             Summary: Max allocation per queue is zero for custom resource types on RM startup
                 Key: YARN-9992
                 URL: https://issues.apache.org/jira/browse/YARN-9992
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Jonathan Hung

Found an issue where GPU requests submitted to a newly booted RM cannot be scheduled. The RM throws the exception in SchedulerUtils#throwInvalidResourceException:

{noformat}
throw new InvalidResourceRequestException(
    "Invalid resource request, requested resource type=[" + reqResourceName
    + "] < 0 or greater than maximum allowed allocation. Requested "
    + "resource=" + reqResource + ", maximum allowed allocation="
    + availableResource
    + ", please note that maximum allowed allocation is calculated "
    + "by scheduler based on maximum resource of registered "
    + "NodeManagers, which might be less than configured "
    + "maximum allocation="
    + ResourceUtils.getResourceTypesMaximumAllocation());
{noformat}

After refreshing the scheduler (e.g. via refreshQueues), GPU scheduling works again.

I think the root cause is that on scheduler refresh, resource-types.xml is loaded in CapacitySchedulerConfiguration (as part of YARN-7738), so when we call ResourceUtils#fetchMaximumAllocationFromConfig in CapacitySchedulerConfiguration#getMaximumAllocationPerQueue, it is able to fetch the {{yarn.resource-types}} config. On startup, however, resource-types.xml is not loaded into the conf in CapacityScheduler#initScheduler, so the custom resource is not found when computing max allocations, and the max allocation for the custom resource ends up as 0. Any request for a positive amount of that resource then exceeds the maximum and is rejected with the exception above.
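
To make the suspected root cause concrete, here is a minimal, self-contained sketch of the behavior described above. It is a toy model and not the Hadoop code: the class MaxAllocationSketch, its methods, and the configuredMax map are hypothetical names introduced only for illustration, and the resource name yarn.io/gpu is used as an example value. The only identifier taken from the report is the {{yarn.resource-types}} key. The assumption being modelled is that a resource type missing from the conf gets a max allocation of 0, so any positive request for it fails validation.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the reported behavior; none of this is Hadoop API.
final class MaxAllocationSketch {

    // Resource types visible in a conf: the two built-in types plus anything
    // listed under yarn.resource-types (comma separated).
    static Set<String> knownTypes(Map<String, String> conf) {
        Set<String> types = new LinkedHashSet<>();
        types.add("memory-mb");
        types.add("vcores");
        String custom = conf.get("yarn.resource-types");
        if (custom != null && !custom.trim().isEmpty()) {
            for (String type : custom.split(",")) {
                types.add(type.trim());
            }
        }
        return types;
    }

    // Per-type max allocation. Types the conf does not know about stay at 0,
    // which is the assumed failure mode on RM startup.
    static Map<String, Long> maxAllocation(Map<String, String> conf,
                                           Map<String, Long> configuredMax) {
        Map<String, Long> max = new HashMap<>();
        for (String type : configuredMax.keySet()) {
            max.put(type, 0L);
        }
        for (String type : knownTypes(conf)) {
            max.put(type, configuredMax.getOrDefault(type, 0L));
        }
        return max;
    }

    // Mirrors the check in the exception message: a request is invalid if it
    // is negative or greater than the maximum allowed allocation.
    static boolean isValidRequest(String type, long requested,
                                  Map<String, Long> max) {
        long limit = max.getOrDefault(type, 0L);
        return requested >= 0 && requested <= limit;
    }

    public static void main(String[] args) {
        Map<String, Long> configuredMax = new HashMap<>();
        configuredMax.put("memory-mb", 8192L);
        configuredMax.put("vcores", 4L);
        configuredMax.put("yarn.io/gpu", 8L);

        // Startup: resource-types.xml was not loaded into the conf.
        Map<String, String> startupConf = new HashMap<>();
        // After refresh: the custom type is present in the conf.
        Map<String, String> refreshedConf = new HashMap<>();
        refreshedConf.put("yarn.resource-types", "yarn.io/gpu");

        System.out.println("startup:   " + isValidRequest("yarn.io/gpu", 1,
            maxAllocation(startupConf, configuredMax)));
        System.out.println("refreshed: " + isValidRequest("yarn.io/gpu", 1,
            maxAllocation(refreshedConf, configuredMax)));
    }
}
```

In this model, the request for one GPU is rejected with the startup conf and accepted with the refreshed conf, which matches the symptom that refreshQueues makes GPU scheduling work again. If the diagnosis is right, a fix would need to make the conf used in CapacityScheduler#initScheduler see the same resource types as the refreshed conf does.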