[ 
https://issues.apache.org/jira/browse/YARN-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693266#comment-16693266
 ] 

Szilard Nemeth commented on YARN-8951:
--------------------------------------

Thanks [~wilfreds] for your comment!
I debugged the test code a bit more and found that scheduler.init() 
eventually calls FS.initScheduler(), so in that respect the scheduler is 
properly initialized.
Debugging further together offline, we found the following: 
When {{AllocationFileLoaderService#reloadAllocations}} is called, it creates 
the queue placement policy by calling 
{{getQueuePlacementPolicy(allocationFileParser, queueProperties, conf)}}, which 
in turn calls {{QueuePlacementPolicy.fromXml()}} and eventually creates the 
{{QueuePlacementPolicy}} object. In 
{{AllocationFileLoaderService#getQueuePlacementPolicy}}, the configured queues 
are passed in via {{queueProperties.getConfiguredQueues()}}, which contains 
only the queues defined in the config file.
So the queues come from the config file, regardless of what the 
{{QueueManager}} has. In other words, {{QueuePlacementPolicy}} has a separate 
(and different) set of queues than the {{QueueManager}} has. 
This could cause several issues.
As [~wilfreds] said, the code changes needed to fix this have a lot in common 
with YARN-7769, so this issue is on hold until that one is fixed.
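
To illustrate the mismatch described above, here is a minimal sketch. The 
classes {{PlacementPolicySketch}} and {{QueueManagerSketch}} are hypothetical 
stand-ins, not the real Hadoop classes, and the queue names in {{demo()}} are 
assumed purely for illustration. It only shows how a policy built from a 
config-file snapshot can disagree with a live queue registry.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-ins; these are NOT the real Hadoop classes.
class QueueSetMismatchSketch {

    /** Holds a snapshot of the queue names read from the allocation file. */
    static final class PlacementPolicySketch {
        private final Set<String> configuredQueues;

        PlacementPolicySketch(Set<String> configuredQueues) {
            // Copy: queues added to the manager later are not visible here.
            this.configuredQueues = new TreeSet<>(configuredQueues);
        }

        boolean knows(String queue) {
            return configuredQueues.contains(queue);
        }
    }

    /** Live queue registry, which may also hold queues created at runtime. */
    static final class QueueManagerSketch {
        private final Set<String> liveQueues = new TreeSet<>();

        void addQueue(String queue) {
            liveQueues.add(queue);
        }
    }

    /** Returns the queues the manager has but the policy does not know. */
    static Set<String> missingFromPolicy(PlacementPolicySketch policy,
                                         QueueManagerSketch manager) {
        Set<String> diff = new TreeSet<>();
        for (String queue : manager.liveQueues) {
            if (!policy.knows(queue)) {
                diff.add(queue);
            }
        }
        return diff;
    }

    /** Assumed scenario: the manager holds one queue absent from the file. */
    static Set<String> demo() {
        PlacementPolicySketch policy = new PlacementPolicySketch(
            new TreeSet<>(Arrays.asList("root", "root.parentq")));
        QueueManagerSketch manager = new QueueManagerSketch();
        manager.addQueue("root");
        manager.addQueue("root.parentq");
        manager.addQueue("root.default"); // assumed to exist only at runtime
        return missingFromPolicy(policy, manager);
    }

    public static void main(String[] args) {
        System.out.println("Queues unknown to the policy: " + demo());
    }
}
```

In this sketch the two sets diverge as soon as the manager holds a queue that 
was never in the file, which is the kind of inconsistency the comment points 
at.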

> Defining default queue placement rule in allocations file with create="false" 
> throws an NPE
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-8951
>                 URL: https://issues.apache.org/jira/browse/YARN-8951
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: default-placement-rule-with-create-false.patch
>
>
> If the default queue placement rule is defined with {{create="false"}} and a 
> scheduling request is created for queue {{"root.default"}}, then 
> {{FairScheduler#assignToQueue}} throws an NPE while trying to construct an 
> error message in the catch block of {{IllegalStateException}}. That code 
> assumes that {{rmApp}} is not null, but it is.
> Example of such a config file:
> {code:xml}
> <?xml version="1.0"?>
> <allocations>
>       <queue name="parentq" type="parent">
>               <minResources>1024mb,0vcores</minResources>
>       </queue>
>       <queuePlacementPolicy>
>               <rule name="default" create="false"/>
>       </queuePlacementPolicy>
> </allocations>
> {code}
> This is suspicious, as there are some null checks for {{rmApp}} in the same 
> method.
> Not sure whether this is specific to the tests or reproducible in a 
> cluster; this needs further investigation.
> In any case, it's not good that we try to dereference the {{rmApp}} that is 
> null.
> On the other hand, I'm not sure if the default queue placement rule with 
> {{create="false"}} makes sense at all. Looking at the documentation 
> ([https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html]):
> {quote}default: the app is placed into the queue specified in the ‘queue’ 
> attribute of the default rule. *If ‘queue’ attribute is not specified, the 
> app is placed into ‘root.default’ queue.*
> A queuePlacementPolicy element: which contains a list of rule elements that 
> tell the scheduler how to place incoming apps into queues. Rules are applied 
> in the order that they are listed. Rules may take arguments. *All rules 
> accept the “create” argument, which indicates whether the rule can create a 
> new queue. “Create” defaults to true; if set to false and the rule would 
> place the app in a queue that is not configured in the allocations file, we 
> continue on to the next rule.* The last rule must be one that can never issue 
> a continue....
> {quote}
> In this case, the rule omits the queue attribute, so the apps should be 
> placed into the {{root.default}} queue (which is not defined in the config 
> file), and create is false, meaning that the queue {{root.default}} cannot 
> be created at all.
> *This seems to me to be a case of an invalid queue configuration file.*
> [~jlowe], [~leftnoteasy]: What is your take on this?
>  
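
The reasoning at the end of the description (no queue attribute, so the target 
falls back to {{root.default}}; create is false; the target is not configured) 
could be checked when the allocation file is loaded. The sketch below is only 
an illustration of that check, not existing Hadoop code: the class and method 
names are hypothetical, and it assumes fully qualified queue names.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;

// Hypothetical validation helper; not part of Hadoop.
class DefaultRuleValidationSketch {

    // Per the quoted documentation, this is the target when no queue is given.
    static final String FALLBACK_QUEUE = "root.default";

    /**
     * Returns an error message if the default rule can never place an app,
     * or an empty Optional if the rule looks usable.
     */
    static Optional<String> validate(String queueAttr, boolean create,
                                     Set<String> configuredQueues) {
        String target = (queueAttr == null || queueAttr.isEmpty())
            ? FALLBACK_QUEUE
            : queueAttr;
        if (!create && !configuredQueues.contains(target)) {
            return Optional.of("default rule targets '" + target
                + "' with create=false, but that queue is not configured");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Mirrors the example file: only root.parentq is configured.
        Set<String> configured = Collections.singleton("root.parentq");
        System.out.println(validate(null, false, configured)
            .orElse("rule is usable"));
    }
}
```

Whether such a file should be rejected at load time or handled at placement 
time is exactly the open question raised above; the sketch only shows that the 
condition is cheap to detect.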

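For the NPE itself, the description says the error message is built from 
{{rmApp}} even though it can be null. A null-tolerant message builder might 
look like the sketch below. {{AppSketch}} is a hypothetical stand-in rather 
than the real {{RMApp}} interface, and the message wording is assumed, not 
copied from {{FairScheduler}}.

```java
// Hypothetical sketch; it does not reproduce the FairScheduler code.
class NullSafeMessageSketch {

    /** Minimal stand-in for the application object. */
    interface AppSketch {
        String getName();
    }

    /** Builds the rejection message without dereferencing a null app. */
    static String rejectionMessage(AppSketch rmApp, String appId, String user,
                                   String queue, Exception cause) {
        // Fall back to a placeholder instead of calling a method on null.
        String appName = (rmApp == null) ? "<unknown>" : rmApp.getName();
        return "Application " + appId + " (" + appName + ") submitted by user "
            + user + " could not be placed in queue " + queue + ": "
            + cause.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(rejectionMessage(null, "app_1", "alice",
            "root.default", new IllegalStateException("queue does not exist")));
    }
}
```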


