[ https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Hung updated YARN-9205:
--------------------------------
    Attachment: YARN-9205-branch-2.001.patch

> When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9205
>                 URL: https://issues.apache.org/jira/browse/YARN-9205
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Zhankun Tang
>            Assignee: Zhankun Tang
>            Priority: Critical
>             Fix For: 3.1.2, 3.3.0, 3.2.1
>
>         Attachments: YARN-9205-branch-2.001.patch, YARN-9205-branch-3.1.001.patch, YARN-9205-branch-3.2.001.patch, YARN-9205-trunk.001.patch, YARN-9205-trunk.002.patch, YARN-9205-trunk.003.patch, YARN-9205-trunk.004.patch, YARN-9205-trunk.005.patch, YARN-9205-trunk.006.patch, YARN-9205-trunk.007.patch, YARN-9205-trunk.008.patch, YARN-9205-trunk.009.patch
>
>
> In a non-secure cluster, reproduce it as follows:
> # Set the capacity scheduler in yarn-site.xml
> # Use the default capacity-scheduler.xml
> # Set the custom resource type "cmp.com/hdw" in resource-types.xml
> # Set a value, say 10, in node-resources.xml
> # Start the cluster
> # Submit a distributed shell application that requests some "cmp.com/hdw"
> The AM gets an exception from the CapacityScheduler and then fails. This bug doesn't exist in the FairScheduler.
> {code:java}
> 2019-01-17 22:12:11,286 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[<memory:2048, vCores:2, cmp.com/hdw: 2>]Priority[0]AllocationRequestId[0]ExecutionTypeRequest[{Execution Type: GUARANTEED, Enforce Execution Type: false}]Resource Profile[]
> 2019-01-17 22:12:12,326 ERROR impl.AMRMClientAsyncImpl: Exception on heartbeat
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[cmp.com/hdw], Requested resource=<memory:2048, vCores:2, cmp.com/hdw: 2>, maximum allowed allocation=<memory:8192, vCores:4>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:4, cmp.com/hdw: 9223372036854775807>
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:315)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:293)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:301)
>     at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:250)
>     at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:240)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
>     at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>     at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> ...{code}
> Rough debugging suggests that the method below returns the wrong maximum capacity.
> DefaultAMSProcessor.java, line 234:
> {code:java}
> Resource maximumCapacity =
>     getScheduler().getMaximumResourceCapability(app.getQueue());{code}
> The above code should apparently return "<memory:8192, vCores:4, cmp.com/hdw:10>" but returns "<memory:8192, vCores:4>".
> This incorrect value might be caused by the queue maximum allocation calculation introduced in YARN-8720:
> AbstractCSQueue.java, line 364:
> {code:java}
> this.maximumAllocation =
>     configuration.getMaximumAllocationPerQueue(
>         getQueuePath());{code}
> This invokes CapacitySchedulerConfiguration.java, line 895:
> {code:java}
> Resource clusterMax = ResourceUtils.fetchMaximumAllocationFromConfig(this);
> {code}
> Passing a "this" that is not a YarnConfiguration instance causes the code below to return null for the resource names, so the result contains only the mandatory resources. This might be the root cause.
> {code:java}
> private static Map<String, ResourceInformation> getResourceInformationMapFromConfig(
> ...
> // NULL value here!
> String[] resourceNames = conf.getStrings(YarnConfiguration.RESOURCE_TYPES);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
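
The root cause suspected in the issue description can be illustrated without Hadoop on the classpath. The following is a minimal sketch, not the actual YARN code: SimpleConf, YarnLikeConf, and resourceNames are hypothetical stand-ins invented for this example, and the assumption that the custom resource types are only visible through a YarnConfiguration-like object mirrors the reporter's hypothesis rather than a confirmed diagnosis. Only the key "yarn.resource-types" and the mandatory resource names "memory-mb" and "vcores" are taken from YARN itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResourceTypeLookupSketch {
    // Value of YarnConfiguration.RESOURCE_TYPES in YARN.
    static final String RESOURCE_TYPES = "yarn.resource-types";

    // Hypothetical stand-in for a configuration object that has NOT
    // loaded resource-types.xml (assumed here to be the case for the
    // scheduler configuration passed as "this").
    static class SimpleConf {
        protected final Map<String, String> props = new HashMap<>();

        // Returns null when the key is absent, like the lookup in the report.
        String[] getStrings(String key) {
            String value = props.get(key);
            return value == null ? null : value.split(",");
        }
    }

    // Hypothetical stand-in for a configuration that does see the
    // custom resource type from resource-types.xml.
    static class YarnLikeConf extends SimpleConf {
        YarnLikeConf() {
            props.put(RESOURCE_TYPES, "cmp.com/hdw");
        }
    }

    // Simplified analogue of building the resource list: the mandatory
    // resources are always present, custom ones only if the key resolves.
    static List<String> resourceNames(SimpleConf conf) {
        List<String> names = new ArrayList<>(Arrays.asList("memory-mb", "vcores"));
        String[] custom = conf.getStrings(RESOURCE_TYPES);
        if (custom != null) {
            for (String name : custom) {
                names.add(name.trim());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Without the resource-types key: only mandatory resources.
        System.out.println(resourceNames(new SimpleConf()));
        // With the key: the custom resource type is included.
        System.out.println(resourceNames(new YarnLikeConf()));
    }
}
```

With the plain configuration the list holds only the two mandatory resources, which corresponds to the "<memory:8192, vCores:4>" maximum seen in the log; with the YarnConfiguration-like object the custom type "cmp.com/hdw" is included as well.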