[ 
https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556449#comment-16556449
 ] 

Robert Kanter commented on YARN-8566:
-------------------------------------

Thanks for the patch.  A few comments:
# In the switch statement, the {{break}} statements should be indented one more level.
# I think we should make the log message and the diagnostic message say the 
same thing for consistency (the only difference would be that the log message 
would also have the App ID and stack trace).
# It looks like {{throwInvalidResourceException}} already has a message with 
details about the problem in it - why not simply push that message to the 
diagnostic message instead of adding {{InvalidResourceType}}?
#- Furthermore, it looks like the exception message is the same, regardless of 
the reason for being invalid, which makes it somewhat unclear (i.e. it says 
"...requested resource type=[X] < 0 or greater than maximum allowed 
allocation." - which doesn't tell you which case).  I'd suggest we make the 
exception message more dynamic based on what the actual problem is, and re-use 
it for the diagnostic message.
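
To illustrate the last two points, here is a minimal sketch of building a reason-specific message once and reusing it for the exception, the log line, and the diagnostic. It is not taken from the patch: the class name, method names, message wording, and the in-memory diagnostics list are all hypothetical stand-ins, and the real scheduler and logging classes are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; none of these names come from the YARN code base. */
public class InvalidResourceMessageSketch {

  /** Stand-in for wherever the application's diagnostic text is stored. */
  public static final List<String> DIAGNOSTICS = new ArrayList<>();

  /**
   * Returns a message naming the specific reason the request is invalid,
   * or an empty Optional if the request is within bounds.
   */
  public static Optional<String> buildMessage(
      String resourceName, long requested, long maxAllowed) {
    if (requested < 0) {
      return Optional.of(String.format(
          "Invalid resource request: requested resource type=[%s] is negative (%d).",
          resourceName, requested));
    }
    if (requested > maxAllowed) {
      return Optional.of(String.format(
          "Invalid resource request: requested resource type=[%s] (%d) is greater "
              + "than the maximum allowed allocation (%d).",
          resourceName, requested, maxAllowed));
    }
    return Optional.empty();
  }

  /**
   * Validates one requested resource. The same message text goes to the
   * exception, the log line, and the diagnostic; the log line only adds
   * the application ID in front.
   */
  public static void validate(
      String appId, String resourceName, long requested, long maxAllowed) {
    Optional<String> message = buildMessage(resourceName, requested, maxAllowed);
    if (message.isPresent()) {
      // Stand-in for a real logger call, which would also carry the stack trace.
      System.err.println("Application " + appId + ": " + message.get());
      DIAGNOSTICS.add(message.get());
      throw new IllegalArgumentException(message.get());
    }
  }
}
```

With the configuration from the description (a maximum of 0 for gpu and a request of 1), {{validate}} would report the "greater than the maximum allowed allocation" case only, so the reader no longer has to guess which condition failed.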

> Add diagnostic message for unschedulable containers
> ---------------------------------------------------
>
>                 Key: YARN-8566
>                 URL: https://issues.apache.org/jira/browse/YARN-8566
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-8566.001.patch, YARN-8566.002.patch, 
> YARN-8566.003.patch, YARN-8566.004.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an 
> application is submitted to that queue that requests that resource, that 
> application will remain pending until it is removed or moved to a different 
> queue. This behavior can be reproduced without extended resources, but it’s 
> unlikely a user will create a queue that allows 0 memory or CPU. As the 
> number of resources in the system increases, this scenario will become more 
> common, and it will become harder to recognize these cases. Therefore, the 
> scheduler should indicate in the diagnostic string for an application if it 
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml):
> {code:xml}
> <allocations>
>   <queueMaxAppsDefault>100000</queueMaxAppsDefault>
>   <queue name="sample_queue">
>     <minResources>10000 mb,2vcores</minResources>
>     <maxResources>90000 mb,4vcores, 0gpu</maxResources>
>     <maxRunningApps>50</maxRunningApps>
>     <maxAMShare>-1.0f</maxAMShare>
>     <weight>2.0</weight>
>     <schedulingPolicy>fair</schedulingPolicy>
>   </queue>
> </allocations>
> {code}
> Command:
> {code:bash}
> yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
> -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is 
> created, the diagnostic message of the AM should be updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

