[ https://issues.apache.org/jira/browse/YARN-8566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553917#comment-16553917 ]
Antal Bálint Steinbach edited comment on YARN-8566 at 7/25/18 6:50 AM:
-----------------------------------------------------------------------

Hi [~snemeth]

+1 LGTM (Non-binding)

Thanks for the fix.

was (Author: bsteinbach):
Hi [~snemeth]

+1

Thanks for the fix.

> Add diagnostic message for unschedulable containers
> ---------------------------------------------------
>
>                 Key: YARN-8566
>                 URL: https://issues.apache.org/jira/browse/YARN-8566
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-8566.001.patch, YARN-8566.002.patch,
>                      YARN-8566.003.patch, YARN-8566.004.patch
>
>
> If a queue is configured with maxResources set to 0 for a resource, and an
> application is submitted to that queue that requests that resource, that
> application will remain pending until it is removed or moved to a different
> queue. This behavior can be realized without extended resources, but it’s
> unlikely a user will create a queue that allows 0 memory or CPU. As the
> number of resources in the system increases, this scenario will become more
> common, and it will become harder to recognize these cases. Therefore, the
> scheduler should indicate in the diagnostic string for an application if it
> was not scheduled because of a 0 maxResources setting.
> Example configuration (fair-scheduler.xml):
> {code:java}
> <allocations>
>   <queueMaxAppsDefault>100000</queueMaxAppsDefault>
>   <queue name="sample_queue">
>     <minResources>10000 mb,2vcores</minResources>
>     <maxResources>90000 mb,4vcores, 0gpu</maxResources>
>     <maxRunningApps>50</maxRunningApps>
>     <maxAMShare>-1.0f</maxAMShare>
>     <weight>2.0</weight>
>     <schedulingPolicy>fair</schedulingPolicy>
>   </queue>
> </allocations>
> {code}
> Command:
> {code:java}
> yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi -Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
> {code}
> The job hangs and the application diagnostic info is empty.
> Given that an exception is thrown before any mapper/reducer container is
> created, the diagnostic message of the AM should be updated.
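
For illustration only, the situation described in the issue (a queue whose maxResources is 0 for a resource that an application requests) can be spotted offline by scanning the allocation file. The sketch below is not the YARN-8566 patch and does not use any Hadoop API; the function names, the `SAMPLE` constant, and the wording of the diagnostic string are all hypothetical. It assumes resources are written in the `value name` form used in the example configuration, and it reports only the queue's own `name` attribute, not a full path for nested queues.

```python
import re
import xml.etree.ElementTree as ET

# One resource entry such as "90000 mb", "4vcores" or " 0gpu".
# Assumption: only the "value name" form from the example config is handled.
_ENTRY = re.compile(r"^\s*(\d+)\s*([A-Za-z][\w.\-/]*)\s*$")

# Hypothetical sample, trimmed from the configuration quoted in the issue.
SAMPLE = """
<allocations>
  <queue name="sample_queue">
    <minResources>10000 mb,2vcores</minResources>
    <maxResources>90000 mb,4vcores, 0gpu</maxResources>
  </queue>
</allocations>
"""


def parse_resources(text):
    """Turn '90000 mb,4vcores, 0gpu' into {'mb': 90000, 'vcores': 4, 'gpu': 0}."""
    resources = {}
    for entry in text.split(","):
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unrecognized resource entry: {entry!r}")
        resources[match.group(2)] = int(match.group(1))
    return resources


def zero_max_resources(allocations_xml):
    """Map each queue name to the resources whose maxResources value is 0."""
    root = ET.fromstring(allocations_xml)
    zeros_by_queue = {}
    for queue in root.iter("queue"):
        max_text = queue.findtext("maxResources")  # direct child only
        if max_text is None:
            continue
        zeros = [name for name, value in parse_resources(max_text).items()
                 if value == 0]
        if zeros:
            zeros_by_queue[queue.get("name")] = zeros
    return zeros_by_queue


def diagnostic_for(queue_name, requested, zeros_by_queue):
    """Return a hypothetical diagnostic string, or None if nothing is blocked."""
    zeros = zeros_by_queue.get(queue_name, [])
    blocked = sorted(name for name, amount in requested.items()
                     if amount > 0 and name in zeros)
    if not blocked:
        return None
    return (f"Queue '{queue_name}' has maxResources set to 0 for "
            f"{', '.join(blocked)}; the request cannot be scheduled there.")


if __name__ == "__main__":
    zeros = zero_max_resources(SAMPLE)
    # Mirrors -Dmapreduce.map.resource.gpu=1 from the command in the issue.
    print(diagnostic_for("sample_queue", {"gpu": 1}, zeros))
```

Run against the sample configuration, `zero_max_resources` flags `gpu` for `sample_queue`, which is the queue and resource combination that leaves the example job pending.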