[ 
https://issues.apache.org/jira/browse/YARN-11114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528276#comment-17528276
 ] 

Szilard Nemeth commented on YARN-11114:
---------------------------------------

Just checked how this worked before YARN-9879.
Checked out git commit: 25a03bfeced (the parent of the YARN-9879 commit, 
i.e. the state before YARN-9879)

*testAppsQueryByQueueShortname*
        runningApp1, queue: default
        runningApp2, queue: root.default
        finishedApp, queue: root.default
        query: default

        runningApp1 is in the result list
        runningApp2 is NOT in the result list
        finishedApp is NOT in the result list
  
*testAppsQueryByQueueFullname*
        runningApp1, queue: default
        runningApp2, queue: root.default
        finishedApp, queue: root.default
        query: root.default

        runningApp1 is NOT in the result list
        runningApp2 is in the result list
        finishedApp is in the result list

Conclusion: Only an exact match between the submitted and the queried queue 
name works.
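
To make the two results above easy to reproduce, here is a minimal sketch of 
an exact-match queue filter. It is not the RM code: the App class, the 
filterByQueue helper and the testApps fixture are hypothetical names invented 
for illustration, and only the stored queue name is modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExactQueueMatchSketch {
  /** Hypothetical stand-in for an application; only the stored queue name matters. */
  static final class App {
    final String name;
    final String queue;

    App(String name, String queue) {
      this.name = name;
      this.queue = queue;
    }
  }

  /** Keeps only the apps whose stored queue name equals the queried name exactly. */
  static List<String> filterByQueue(List<App> apps, String queriedQueue) {
    List<String> result = new ArrayList<>();
    for (App app : apps) {
      if (app.queue.equals(queriedQueue)) {
        result.add(app.name);
      }
    }
    return result;
  }

  /** Same three apps as in the two testcases above. */
  static List<App> testApps() {
    return Arrays.asList(
        new App("runningApp1", "default"),
        new App("runningApp2", "root.default"),
        new App("finishedApp", "root.default"));
  }
}
```

With this filter, querying "default" keeps only runningApp1, and querying 
"root.default" keeps runningApp2 and finishedApp, which mirrors the results 
listed above.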

So, Option 1 above just improves on this as it queries running apps based on 
both queue notations.

I think it's okay to keep Option 1 for now.

> RMWebServices returns only apps matching exactly the submitted queue name
> -------------------------------------------------------------------------
>
>                 Key: YARN-11114
>                 URL: https://issues.apache.org/jira/browse/YARN-11114
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, webapp
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've added 2 testcases that demonstrate the issue with [this 
> commit|https://github.com/szilard-nemeth/hadoop/commit/88dcf40f4dab564477542b8efb82f4f20d132eee].
> 1. With 'testAppsQueryByQueueShortname', there's a finishedApp submitted to 
> "root.default" and there's a runningApp that is submitted to "default".
> The testcase queries the apps by queue name "default" and the response only 
> contains the runningApp, which is submitted to "default" so the other app 
> that is submitted to "root.default" is not returned.
> 2. With 'testAppsQueryByQueueFullname', there's a finishedApp submitted to 
> "root.default" and there's a runningApp that is submitted to "default" (same 
> setup as above).
> The testcase queries the apps by queue name "root.default" (which is the full 
> queue path) and the response only contains the finishedApp, which is 
> submitted to "root.default" so the other app that is submitted to "default" 
> is not returned.
> A trivial conclusion of this is that the response includes only those 
> applications whose stored queue name exactly matches the queried name; the 
> stored name is either specified explicitly at submission or resolved by the 
> placement engine.
> Before YARN-9879 was implemented, Capacity Scheduler allowed a given leaf 
> queue name to be defined only once in the whole hierarchy, meaning that leaf 
> queue names were unique.
> For example root.a.testQueue and root.b.testQueue couldn't coexist, as the 
> leaf queue name is the same.
> At this point, I suspected that YARN-9879 was causing this issue: before it 
> was merged, CS didn't allow two leaf queues with the same name, so a query 
> for either "root.default" or "default" could easily work, as it was 
> guaranteed that there was exactly one "default" leaf queue in the hierarchy. 
> I dug a bit further.
> I also noticed that YARN-8659 ([commit 
> link|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797])
>  could have introduced this issue a long time ago, as it removed the iterator 
> logic that queried the applications with method YarnScheduler#getAppsInQueue 
> (see 
> [this|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797#diff-5b432bf3a8eb3e039878300ffb9db1f728226b9e3f63c4eb53be5ed5a833390aL843]).
> Let's follow the implementation of YarnScheduler#getAppsInQueue for CS: 
> 1. First of all, 
> [here|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L2501-L2509]
>  is the method definition.
> [CapacityScheduler#getQueue|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L824-L829]
>  is called from here.
> 2. 
> [CapacityScheduler#getQueue|https://github.com/apache/hadoop/blob/4c05d257ba3f3311b5bbc993f6e5e35637487d88/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java#L824-L829]
>  is then calling 
> [QueueManager#getQueue|https://github.com/apache/hadoop/blob/da09d68056d4e6a9490ddc6d9ae816b65217e117/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerQueueManager.java#L136-L138].
> 3. 
> [QueueManager#getQueue|https://github.com/apache/hadoop/blob/da09d68056d4e6a9490ddc6d9ae816b65217e117/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerQueueManager.java#L136-L138]
>  is then calling [CSQueueStore#get|#get].
> 4. [CSQueueStore#get|#get] calls the 'getMap' field's getOrDefault method 
> [here|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueStore.java#L260].
> 4.1 CSQueueStore#getMap (field) stores the Queue objects mapped to their 
> short and full names (e.g. 'default' and 'root.default').
> [CSQueueStore#add|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueStore.java#L122-L152]
>  is the method that is responsible for adding the CSQueue objects.
> 4.2 The first getMap.put call is invoked 
> [here|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueStore.java#L134]
>  with the full queue name.
> 4.3 The second getMap.put call is invoked via 
> [CSQueueStore#updateGetMapForShortName|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueStore.java#L102-L120]
>  
> [here|https://github.com/apache/hadoop/blob/d336227e5c63a70db06ac26697994c96ed89d230/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueueStore.java#L113].
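
The lookup chain above boils down to a single map that holds each queue under 
two keys. The sketch below illustrates that idea only; it is not the real 
CSQueueStore. The QueueStoreSketch class is a hypothetical name, the map 
values are plain full-path strings instead of CSQueue objects, and the way 
the real class deals with ambiguous short names is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

final class QueueStoreSketch {
  // Simplified stand-in for the 'getMap' field: one map, two keys per queue.
  // Assumption: values are full queue paths rather than CSQueue objects.
  private final Map<String, String> getMap = new HashMap<>();

  /** Registers a queue under its full path and under its short (leaf) name. */
  void add(String fullPath) {
    // First put: keyed by the full queue name.
    getMap.put(fullPath, fullPath);
    // Second put: keyed by the short name, i.e. the part after the last dot.
    String shortName = fullPath.substring(fullPath.lastIndexOf('.') + 1);
    getMap.put(shortName, fullPath);
  }

  /** Resolves either notation to the full path, or returns null if unknown. */
  String get(String name) {
    return getMap.getOrDefault(name, null);
  }
}
```

After add("root.default"), both get("default") and get("root.default") 
resolve to the same queue, which is why a lookup through the scheduler works 
with either notation.
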
> As a conclusion, in 
> [ClientRMService#getApplications|https://github.com/apache/hadoop/blob/d2869940094d330434f3e82d16b1cad3c6023437/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java#L880-L993],
>  the app filtering by queues seems wrong to me. 
> The block that filters by queues is 
> [here|https://github.com/apache/hadoop/blob/d2869940094d330434f3e82d16b1cad3c6023437/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java#L915-L918].
> This should be enhanced by querying the apps from 
> YarnScheduler#getAppsInQueue, as it ultimately handles both the short and 
> the full queue names for CS.
> It's crucial not to simply fall back to the logic that was replaced by 
> YARN-8659 ([commit 
> link|https://github.com/apache/hadoop/commit/7c13872cbbb6f1b0b1c2dde894885b41186b3797]), 
> because the original issue there was that rmContext.getRMApps() returns both 
> running and finished apps, while scheduler.getAppsInQueue only returns 
> running apps.
> h2. NOTES
> *NOTE #1:* 
> As there's no way to get the short queue name + the full queue name from 
> RmApp / RmAppImpl, it's currently not possible to compare the queue filter of 
> the RM client request with both types of queue names of the application.
> *NOTE #2:*
> scheduler.getAppsInQueue(queue) will only return running apps, so for running 
> apps, it's possible to retrieve the apps by queue name, and it will work with 
> both short and full names. However, for non-running apps, only the submitted 
> queue name would work for filtering.
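
The behaviour described in NOTE #2 can be sketched as follows. This is an 
illustration under stated assumptions, not the actual change: 
QueueFilterSketch, App, addQueue and filter are hypothetical names, and the 
scheduler's queue lookup is modeled as a plain map from short or full name to 
full path. Running apps are matched through that lookup, so both notations 
work; non-running apps are matched only against the queue name stored at 
submission.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class QueueFilterSketch {
  /** Hypothetical stand-in for an application. */
  static final class App {
    final String name;
    final String submittedQueue;
    final boolean running;

    App(String name, String submittedQueue, boolean running) {
      this.name = name;
      this.submittedQueue = submittedQueue;
      this.running = running;
    }
  }

  // Assumption: short or full queue name -> full path, standing in for the
  // scheduler-side lookup that understands both notations.
  private final Map<String, String> queueLookup = new HashMap<>();

  void addQueue(String fullPath) {
    queueLookup.put(fullPath, fullPath);
    queueLookup.put(fullPath.substring(fullPath.lastIndexOf('.') + 1), fullPath);
  }

  List<String> filter(List<App> apps, String queriedQueue) {
    String queriedPath = queueLookup.get(queriedQueue);
    List<String> result = new ArrayList<>();
    for (App app : apps) {
      boolean match;
      if (app.running) {
        // Running apps: compare resolved full paths, so either notation matches.
        String appPath = queueLookup.get(app.submittedQueue);
        match = queriedPath != null && queriedPath.equals(appPath);
      } else {
        // Non-running apps: only the exact stored queue name matches.
        match = app.submittedQueue.equals(queriedQueue);
      }
      if (match) {
        result.add(app.name);
      }
    }
    return result;
  }
}
```

With runningApp1 in "default" and runningApp2 plus a finished app in 
"root.default", a query for "default" returns both running apps but not the 
finished one, while a query for "root.default" returns all three.
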
> *NOTE #3 (plan for implementation):*
> It would be completely reasonable to consider both running and non-running 
> apps while querying, however I think it never worked that way.
> Before YARN-8659, only running apps were considered and before YARN-9879, 
> both running + non-running apps were considered but only the stored queue 
> name (in RmAppImpl) was compared to the app filter's queue name, which was 
> either the short or the full queue name.
> All in all, I don't want to change this behavior, and I also think it would 
> make the code more convoluted if RmAppImpl stored both the short and the 
> full queue names.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
