[ 
https://issues.apache.org/jira/browse/YARN-10458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222860#comment-17222860
 ] 

Peter Bacsko edited comment on YARN-10458 at 10/29/20, 12:24 PM:
-----------------------------------------------------------------

[~wangda] I created a test case with MockRM and MockNM, but I've run into a 
problem. For some reason, the submitted application doesn't reach the 
ALLOCATED state; it's stuck in SCHEDULED. I tried to dig deeper but got lost 
in the various resource calculations.

The access check passes, so there's no problem there, but why can't the 
application start?

Here is the test case that I added to {{TestCapacityScheduler.java}}:

{noformat}
  import org.apache.hadoop.yarn.api.records.QueueACL;
 ...
  @Test
  public void testAccessCheckOfNonExistingDynamicQueueWithTags()
      throws Exception {
    CapacitySchedulerConfiguration csConf
      = new CapacitySchedulerConfiguration();
    csConf.setQueues(CapacitySchedulerConfiguration.ROOT,
        new String[] {"a", "b"});
    csConf.setCapacity("root.a", 90);
    csConf.setCapacity("root.b", 10);
    csConf.set("yarn.scheduler.capacity.resource-calculator",
        "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator");
    csConf.setAutoCreateChildQueueEnabled("root.a", true);
    csConf.setAutoCreatedLeafQueueConfigCapacity("root.a", 50);
    csConf.setAutoCreatedLeafQueueConfigMaxCapacity("root.a", 100);
    csConf.set(CapacitySchedulerConfiguration
        .MAXIMUM_APPLICATION_MASTERS_RESOURCE_PERCENT, "0.5");
    csConf.setAcl("root.a", QueueACL.ADMINISTER_QUEUE, "*");
    csConf.setAcl("root.a", QueueACL.SUBMIT_APPLICATIONS, "*");
    csConf.setBoolean(YarnConfiguration
        .APPLICATION_TAG_BASED_PLACEMENT_ENABLED, true);
    csConf.setStrings(YarnConfiguration
        .APPLICATION_TAG_BASED_PLACEMENT_USER_WHITELIST, "hadoop");
    csConf.set(CapacitySchedulerConfiguration.QUEUE_MAPPING,
        "u:%user:root.a.%user");
    csConf.setInt("yarn.scheduler.minimum-allocation-mb", 1024);
    csConf.setInt("yarn.scheduler.minimum-allocation-vcores", 1);

    YarnConfiguration conf = new YarnConfiguration(csConf);
    conf.setClass(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class,
        ResourceScheduler.class);
    RMNodeLabelsManager mgr = new NullRMNodeLabelsManager();
    mgr.init(conf);
    MockRM rm = new MockRM(csConf);
    rm.getRMContext().setNodeLabelManager(mgr);
    rm.start();
    MockNM nm = rm.registerNode("127.0.0.1:1234", 16 * GB);

    MockRMAppSubmissionData data =
        MockRMAppSubmissionData.Builder.createWithMemory(GB, rm)
            .withAppName("apptodynamicqueue")
            .withUser("hadoop")
            .withAcls(null)
            .withUnmanagedAM(false)
            .withApplicationTags(Sets.newHashSet("userid=testuser"))
            .build();
    RMApp app = MockRMAppSubmitter.submit(rm, data);
    nm.nodeHeartbeat(true);
    MockRM.launchAndRegisterAM(app, rm, nm); // stuck in SCHEDULED state
  }
{noformat}

As you can see, the mapped queue becomes "root.a.testuser" and it gets created 
but can't run applications:

{noformat}
2020-10-29 11:38:57,334 DEBUG [AsyncDispatcher event handler] 
capacity.ParentQueue (ParentQueue.java:printChildQueues(861)) - 
printChildQueues - queue: root.a child-queues: 
root.a.testuserusedCapacity=(0.0),  label=(*)
2020-10-29 11:38:57,335 DEBUG [AsyncDispatcher event handler] 
capacity.ParentQueue (ParentQueue.java:assignContainersToChildQueues(799)) - 
Trying to assign to queue: root.a.testuser stats: root.a.testuser: 
capacity=0.5, absoluteCapacity=0.45, usedResources=<memory:0, vCores:0>, 
usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0, 
effectiveMinResource=<memory:0, vCores:0> , effectiveMaxResource=<memory:0, 
vCores:0>
2020-10-29 11:38:57,335 DEBUG [AsyncDispatcher event handler] 
capacity.LeafQueue (LeafQueue.java:assignContainers(1129)) - assignContainers: 
partition= #applications=1
2020-10-29 11:38:57,339 DEBUG [AsyncDispatcher event handler] 
capacity.AbstractCSQueue (AbstractCSQueue.java:canAssignToThisQueue(1113)) - 
Failed to assign to queue: root.a.testuser nodePatrition: , usedResources: 
<memory:0, vCores:0>, clusterResources: <memory:32768, vCores:32>, 
reservedResources: <memory:0, vCores:0>, maxLimitCapacity: <memory:0, 
vCores:0>, currTotalUsed:<memory:0, vCores:0>
{noformat}

I'm totally exhausted by this. Maybe the solution is obvious, but I just can't 
see it. The issue seems to be with effectiveMaxResource, which is always 
{{<memory:0, vCores:0>}}, so a comparison fails in 
{{AbstractCSQueue.java:canAssignToThisQueue()}}. 
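
To make the symptom concrete, here is a minimal, self-contained sketch of why a zero maximum limit blocks an otherwise idle queue. It is not the Hadoop implementation: {{Res}}, {{absoluteCapacity()}} and {{canAssign()}} are hypothetical names, and the sketch assumes the limit check rejects a queue once its usage is at or above the limit, whereas the real code goes through a ResourceCalculator. It also reproduces the capacity arithmetic from the log (90% for root.a times the 50% template capacity gives an absolute capacity of 0.45).

```java
public class QueueLimitSketch {

    /** Hypothetical two-dimensional resource: memory in MB and virtual cores. */
    static final class Res {
        final long memoryMb;
        final int vCores;

        Res(long memoryMb, int vCores) {
            this.memoryMb = memoryMb;
            this.vCores = vCores;
        }
    }

    /** Absolute capacity of a child: the parent's absolute share times the child's share. */
    static double absoluteCapacity(double parentAbsolute, double childCapacity) {
        return parentAbsolute * childCapacity;
    }

    /**
     * ASSUMPTION: the queue is rejected once its usage is at or above the
     * limit in any dimension. This is a simplified model, not the real check.
     */
    static boolean canAssign(Res used, Res maxLimit) {
        boolean atOrAboveLimit = used.memoryMb >= maxLimit.memoryMb
            || used.vCores >= maxLimit.vCores;
        return !atOrAboveLimit;
    }

    public static void main(String[] args) {
        Res used = new Res(0, 0);

        // root.a holds 90% of the cluster; the leaf template takes 50% of root.a.
        System.out.println("absoluteCapacity = " + absoluteCapacity(0.90, 0.50));

        // A limit of <memory:0, vCores:0> rejects even a completely idle queue.
        System.out.println("limit <0, 0>      -> " + canAssign(used, new Res(0, 0)));

        // With a non-zero limit the same idle queue is allowed to allocate.
        System.out.println("limit <32768, 32> -> " + canAssign(used, new Res(32768, 32)));
    }
}
```

Under this model, usage of zero already counts as "at the limit" when the limit itself is zero, which would match the "Failed to assign to queue" line in the log above.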


> Hive On Tez queries fails upon submission to dynamically created pools
> ----------------------------------------------------------------------
>
>                 Key: YARN-10458
>                 URL: https://issues.apache.org/jira/browse/YARN-10458
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Anand Srinivasan
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-10458-001.patch, YARN-10458-002.patch
>
>
> While using Dynamic Auto-Creation and Management of Leaf Queues, we saw that 
> queue creation fails because the ACL submit-application check does not 
> succeed.
> We tried setting acl_submit_applications to '*' for managed parent queues. 
> This worked for static queues but failed for dynamic queues. We also tried 
> setting the property below, but it didn't help either:
> yarn.scheduler.capacity.root.parent-queue-name.leaf-queue-template.acl_submit_applications=*
> The RM error log shows the following:
> 2020-09-18 01:08:40,579 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1600399068816_0460 user user1 mapping [default] to 
> [queue1] override false
> 2020-09-18 01:08:40,579 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: User 'user1' from 
> application tag does not have access to  queue 'user1'. The placement is done 
> for user 'hive'
>  
> Checking the code, scheduler#checkAccess() bails out even before checking the 
> ACL permissions for that particular queue because the CSQueue is null.
> {code:java}
> public boolean checkAccess(UserGroupInformation callerUGI,
>     QueueACL acl, String queueName) {
>   CSQueue queue = getQueue(queueName);
>   if (queue == null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("ACL not found for queue access-type " + acl
>           + " for queue " + queueName);
>     }
>     return false;  // <-- the method returns false here
>   }
>   return queue.hasAccess(acl, callerUGI);
> }
> {code}
> As this is an auto-created queue, CSQueue may be null in this case. Maybe 
> scheduler#checkAccess() should distinguish the case where CSQueue is null 
> and queue mapping is involved: if the parent queue exists and is a managed 
> parent, check whether the parent queue has valid ACLs instead of returning 
> false?
> Thanks
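
The suggestion in the description can be sketched roughly as follows. This is a minimal, self-contained illustration, not the attached patch and not the Hadoop API: {{Queue}}, {{DynamicQueueAclSketch}}, {{addQueue()}} and the string-based ACL are hypothetical stand-ins for CSQueue, the scheduler and QueueACL. The sketch also leaves out the "queue mapping is involved" condition and only shows the fallback to a managed parent's ACL.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DynamicQueueAclSketch {

    /** Hypothetical stand-in for a queue; NOT the Hadoop CSQueue interface. */
    static final class Queue {
        final boolean managedParent;
        final Set<String> submitAcl;

        Queue(boolean managedParent, String... submitAcl) {
            this.managedParent = managedParent;
            this.submitAcl = new HashSet<>(Arrays.asList(submitAcl));
        }

        /** "*" grants submit access to everyone, otherwise the user must be listed. */
        boolean hasAccess(String user) {
            return submitAcl.contains("*") || submitAcl.contains(user);
        }
    }

    private final Map<String, Queue> queues = new HashMap<>();

    void addQueue(String path, Queue queue) {
        queues.put(path, queue);
    }

    boolean checkAccess(String user, String queuePath) {
        Queue queue = queues.get(queuePath);
        if (queue != null) {
            // The queue already exists: use its own ACL.
            return queue.hasAccess(user);
        }
        // The queue does not exist yet: look at its parent instead of failing.
        int lastDot = queuePath.lastIndexOf('.');
        if (lastDot < 0) {
            return false;
        }
        Queue parent = queues.get(queuePath.substring(0, lastDot));
        // Only a managed parent can auto-create the leaf, so only then does
        // the parent's ACL decide the outcome.
        return parent != null && parent.managedParent && parent.hasAccess(user);
    }

    public static void main(String[] args) {
        DynamicQueueAclSketch scheduler = new DynamicQueueAclSketch();
        scheduler.addQueue("root.a", new Queue(true, "*"));
        scheduler.addQueue("root.b", new Queue(false, "*"));

        // root.a.testuser does not exist yet, but root.a is a managed parent.
        System.out.println(scheduler.checkAccess("testuser", "root.a.testuser"));
        // root.b is a regular queue, so a missing child is still rejected.
        System.out.println(scheduler.checkAccess("testuser", "root.b.testuser"));
    }
}
```

The design choice here is to fall back to the parent only when the leaf is missing, so existing queues keep their current behaviour and regular (non-managed) parents still reject unknown children.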


