Craig Condit created YARN-9569:
----------------------------------

             Summary: Auto-created leaf queues do not honor cluster-wide min/max memory/vcores
                 Key: YARN-9569
                 URL: https://issues.apache.org/jira/browse/YARN-9569
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: capacity scheduler
    Affects Versions: 3.2.0
            Reporter: Craig Condit


Auto-created leaf queues do not honor cluster-wide settings for maximum 
memory/vcores allocation.

To reproduce:
 # Set auto-create-child-queue.enabled=true for a parent queue.
 # Set leaf-queue-template.maximum-allocation-mb=16384.
 # Set yarn.resource-types.memory-mb.maximum-allocation=16384 in resource-types.xml.
 # Launch a YARN app with a container requesting 16 GB RAM.

This scenario should work, but instead it fails with an error similar to this:

{{java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger than the cluster setting for queue root.auto.test max allocation per queue: <memory:16384, vCores:1> cluster setting: <memory:8192, vCores:4>}}

This seems to be caused by this code in ManagedParentQueue.getLeafQueueConfigs:
{code:java}
CapacitySchedulerConfiguration leafQueueConfigTemplate = new
    CapacitySchedulerConfiguration(new Configuration(false), false);{code}
This initializes a new leaf queue configuration that does not read 
resource-types.xml (or any other configuration file). Later, this 
CapacitySchedulerConfiguration instance calls 
ResourceUtils.fetchMaximumAllocationFromConfig() from its 
getMaximumAllocationPerQueue() method, passing itself as the configuration 
to use. Since the resource types are not present, ResourceUtils falls back to 
the compiled-in defaults of 8 GB of memory and 4 vcores, which matches the 
<memory:8192, vCores:4> cluster setting in the error above.
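
The failure mode can be modeled with a small, self-contained sketch. It does not use Hadoop: plain Map<String, String> objects stand in for Configuration, and the hypothetical helpers clusterMaxMb() and validateQueueMax() are simplified stand-ins for ResourceUtils.fetchMaximumAllocationFromConfig() and the queue check that raises the error. It models memory only, and assumes for illustration that the cluster maximum is read from the yarn.resource-types.memory-mb.maximum-allocation key with an 8192 MB fallback.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the failure; none of these names are Hadoop APIs.
class MaxAllocationFallbackDemo {

    // Key taken from the reproduction steps (resource-types.xml).
    static final String MAX_MB_KEY = "yarn.resource-types.memory-mb.maximum-allocation";

    // Compiled-in fallback described in the report: 8 GB of memory.
    static final long DEFAULT_MAX_MB = 8192L;

    // Stand-in for ResourceUtils.fetchMaximumAllocationFromConfig(): reads the
    // cluster-wide maximum from whichever configuration it is handed and falls
    // back to the default when the key is missing.
    static long clusterMaxMb(Map<String, String> conf) {
        String value = conf.get(MAX_MB_KEY);
        return value == null ? DEFAULT_MAX_MB : Long.parseLong(value);
    }

    // Stand-in for the queue check that rejects a queue maximum larger than
    // the cluster maximum seen through the same configuration object.
    static void validateQueueMax(String queue, long queueMaxMb, Map<String, String> conf) {
        long clusterMax = clusterMaxMb(conf);
        if (queueMaxMb > clusterMax) {
            throw new IllegalArgumentException(
                "Queue maximum allocation cannot be larger than the cluster setting for queue "
                    + queue + ": " + queueMaxMb + " MB > " + clusterMax + " MB");
        }
    }

    public static void main(String[] args) {
        // Scheduler configuration: resource-types.xml has been loaded.
        Map<String, String> schedulerConf = new HashMap<>();
        schedulerConf.put(MAX_MB_KEY, "16384");

        // Leaf queue template configuration: starts empty, like
        // new CapacitySchedulerConfiguration(new Configuration(false), false).
        Map<String, String> templateConf = new HashMap<>();

        validateQueueMax("root.auto.test", 16384L, schedulerConf); // accepted: 16384 <= 16384
        try {
            validateQueueMax("root.auto.test", 16384L, templateConf);
            System.out.println("unexpected: template config accepted 16384 MB");
        } catch (IllegalArgumentException e) {
            // Rejected: the empty template falls back to 8192 MB.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model the same 16 GB request passes or fails depending only on which configuration object performs the lookup, which is the asymmetry the report describes.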

I was able to work around this with a custom AutoCreatedQueueManagementPolicy 
implementation which does something like this in init() and reinitialize():
{code:java}
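// Copy every resource-type property from the scheduler's configuration
// into the auto-created leaf queue template's configuration.
for (Map.Entry<String, String> entry : this.scheduler.getConfiguration()) {
  if (entry.getKey().startsWith("yarn.resource-types")) {
    parentQueue.getLeafQueueTemplate().getLeafQueueConfigs()
        .set(entry.getKey(), entry.getValue());
  }
}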
{code}
However, this is obviously a very hacky way to solve the problem.
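
The same workaround can be factored into a reusable helper. This is a sketch with hypothetical names (ResourceTypeConfigCopier, copyWithPrefix). It relies on Hadoop's Configuration implementing Iterable<Map.Entry<String, String>>, so a Configuration could be passed as the source and, for example, a method reference to the template configuration's set() as the sink; the demo uses plain maps so it runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper; not part of Hadoop.
class ResourceTypeConfigCopier {

    static final String RESOURCE_TYPES_PREFIX = "yarn.resource-types";

    // Passes every entry whose key starts with prefix to sink and returns how
    // many entries were copied. Any Iterable<Map.Entry<String, String>>
    // (including a Hadoop Configuration) can serve as the source.
    static int copyWithPrefix(Iterable<Map.Entry<String, String>> source,
                              String prefix,
                              BiConsumer<String, String> sink) {
        int copied = 0;
        for (Map.Entry<String, String> entry : source) {
            if (entry.getKey().startsWith(prefix)) {
                sink.accept(entry.getKey(), entry.getValue());
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Map<String, String> schedulerConf = new HashMap<>();
        schedulerConf.put("yarn.resource-types.memory-mb.maximum-allocation", "16384");
        schedulerConf.put("yarn.scheduler.capacity.root.queues", "auto");

        // Only the resource-type entry should reach the template configuration.
        Map<String, String> leafTemplateConf = new HashMap<>();
        int copied = copyWithPrefix(schedulerConf.entrySet(), RESOURCE_TYPES_PREFIX,
            leafTemplateConf::put);
        System.out.println("copied " + copied + ": " + leafTemplateConf);
    }
}
```

The prefix match also picks up the bare yarn.resource-types key (the list of custom resource types), not just the per-type settings, which is what the workaround needs for custom resources to resolve in the template.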

I can submit a proper patch if someone can provide some direction as to the 
best way to proceed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)