[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021280#comment-17021280 ]
Gergely Pollak commented on YARN-9879: -------------------------------------- Thank you for your feedback Wilfred Spiegelenburg and Wangda Tan. Originally I tried to keep the getQueueName's behavior, but as I started to investigate it's behavior I've realized we MUST change the way it works. First let's start with a simple question: What is the purpose of the queue's name? Why does it have one, what do we want to use it for? (Ok these are actually 3 questions) As I see in the code the queue name's main purpose is to IDENTIFY a queue, and not just some nice display string. This means the name MUST identify uniquely the queue. Queues are looked up by their name, hence it must be unique or all those references can break. So this is the reason I changed it's behavior to return a unique identifier (the queue's path). Obviously I must check if it breaks anything, and fix it, but allowing multiple leaf queues with the same name is inherently a breaking change. I just try to minimize the impact to change the reference internally to full name everywhere (as you both suggested earlier). About the API breaking. If we have an API which provides us with a queue name, and currently it is a short name, then anyone who uses it to reference to the queue by the provided name will fail in the case of name duplicates. If we return the full name of the queue, then it will still work for them, unless they build on the fact it is just a short name. As long as the queue name is used for queue identification, and not for string operations, it shouldn't cause any problem. Other cases must be identified. This is why I ended up with this approach. This way we change the queue naming once and for all to use full names, and we adjust services which would fail on this change. But we cannot keep the short queue name as reference and have multiple queues with the same name, it's just impossible. This patch will already introduce some changes which can cause issues in already working systems and it might be better to do all invasive changes at once. I can use the getQueuePath (almost) everywhere where we currently using getQueueName, but the result would be the same, with some severe inconsistencies: Using short names would result you being able to get the name of a queue, but you wouldn't be able to get your queue by that very same name from the queue manager. This is just confusing, inconsistent, and not maintenable in my opinion. The quemanager.get(queue.getQueueName()) call can result in NULL or error! (when the queue name is not unique) This is not good practice in my opinion. We need the ambiguous queue list, because we provide a remove method, which can result in a previously ambiguous name becoming ambiguous, and it's much faster to get it from a hashmap O(1), and then check the size of the Set O(1), instead of looking through all queues to see if the collision have been resolved O(n). The short name map has been introduced for the very same reason, when we look up a queue, we just look it up in 2 HashMaps 2 x O(1), instead of iterating through all queue names and splicing the last part for short name O(n). So all in all, I've sacrificed some memory space for a drastic speed increase. O(n) vs O(1) might not seem a huge improvement in the case of a few queues, but considering the queue parse method will make a get call for each queue to check if it is already present in the store, we have a complexity of O(n*n), which IS something to think about. Please help me to think this through one more time with taking my reasons into consideration, thank you. > Allow multiple leaf queues with the same name in CS > --------------------------------------------------- > > Key: YARN-9879 > URL: https://issues.apache.org/jira/browse/YARN-9879 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Gergely Pollak > Assignee: Gergely Pollak > Priority: Major > Attachments: DesignDoc_v1.pdf, YARN-9879.POC001.patch > > > Currently the leaf queue's name must be unique regardless of its position in > the queue hierarchy. > Design doc and first proposal is being made, I'll attach it as soon as it's > done. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org