[jira] [Updated] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4606: - Priority: Critical (was: Major) > CapacityScheduler: applications could get starved because computation of > #activeUsers considers pending apps > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Wangda Tan >Priority: Critical > > Currently, if all applications belonging to the same user in a LeafQueue are pending > (caused by max-am-percent, etc.), ActiveUsersManager still considers that user > an active user. This could lead to starvation of active applications, for > example: > - App1 (belongs to user1)/app2 (belongs to user2) are active, app3 (belongs to > user3)/app4 (belongs to user4) are pending > - ActiveUsersManager returns #active-users=4 > - However, only two users (user1/user2) are able to allocate new > resources, so the computed user-limit-resource could be lower than expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
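For illustration, a minimal, self-contained sketch of the arithmetic in the example above (this is not the actual CapacityScheduler/ActiveUsersManager code; the class and method names are made up): when users whose applications are all pending are counted as active, the per-user share shrinks for the users that can actually run containers. The direction the issue points at is to derive #active-users only from users that have at least one runnable application.
{code}
import java.util.HashMap;
import java.util.Map;

// Toy model of the problem: the per-user limit is derived from the number of
// "active" users, so counting users whose apps are all pending shrinks the
// share of the users that can actually run containers.
public class UserLimitSketch {

    // memory (MB) each user may use = queue resource / #active-users
    static int userLimit(int queueResourceMb, int activeUsers) {
        return activeUsers == 0 ? queueResourceMb : queueResourceMb / activeUsers;
    }

    public static void main(String[] args) {
        Map<String, Boolean> userHasRunnableApp = new HashMap<>();
        userHasRunnableApp.put("user1", true);   // app1 active
        userHasRunnableApp.put("user2", true);   // app2 active
        userHasRunnableApp.put("user3", false);  // app3 pending (e.g. max-am-percent)
        userHasRunnableApp.put("user4", false);  // app4 pending

        int queueMb = 40960;

        // Current behaviour: every user with any app counts as active -> 4 users.
        System.out.println("limit counting pending users: "
            + userLimit(queueMb, userHasRunnableApp.size()) + " MB");

        // Expected behaviour: only users with runnable apps count -> 2 users.
        long runnableUsers = userHasRunnableApp.values().stream().filter(b -> b).count();
        System.out.println("limit counting runnable users: "
            + userLimit(queueMb, (int) runnableUsers) + " MB");
    }
}
{code}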
[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108183#comment-15108183 ] Wangda Tan commented on YARN-4606: -- Updated the description of the JIRA. It was originally found by [~karams] while doing fairness ordering policy tests; pasting the original test case here just for reference: {code} Encountered while studying fairness behaviour with UserLimitPercent and UserLimitFactor during the following test: Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, UserLimitFactor=32, FairOrderingPolicy only. Encountered an application-starvation situation where 33 applications (190 apps completed out of 761 apps; the queue could run 345 containers) were running with a total of 45 containers, and the 12 extra containers all went to only one app (which had around 18000 tasks); all other apps had only their AM running and were not given any other containers. After that app finished, there were 32 AMs that kept running without any task containers being launched. GridMix was run with the following settings: gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver with a users file containing 4 users for RoundRobinUserResolver {code} > CapacityScheduler: applications could get starved because computation of > #activeUsers considers pending apps > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Wangda Tan > > Currently, if all applications belonging to the same user in a LeafQueue are pending > (caused by max-am-percent, etc.), ActiveUsersManager still considers that user > an active user. This could lead to starvation of active applications, for > example: > - App1 (belongs to user1)/app2 (belongs to user2) are active, app3 (belongs to > user3)/app4 (belongs to user4) are pending > - ActiveUsersManager returns #active-users=4 > - However, only two users (user1/user2) are able to allocate new > resources, so the computed user-limit-resource could be lower than expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4606: - Summary: CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps (was: CapacityScheduler: applications could get starved because #activeUsers considers pending apps) > CapacityScheduler: applications could get starved because computation of > #activeUsers considers pending apps > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Wangda Tan > > Currently, if all applications belonging to the same user in a LeafQueue are pending > (caused by max-am-percent, etc.), ActiveUsersManager still considers that user > an active user. This could lead to starvation of active applications, for > example: > - App1 (belongs to user1)/app2 (belongs to user2) are active, app3 (belongs to > user3)/app4 (belongs to user4) are pending > - ActiveUsersManager returns #active-users=4 > - However, only two users (user1/user2) are able to allocate new > resources, so the computed user-limit-resource could be lower than expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4606) CapacityScheduler: applications could get starved because #activeUsers considers pending apps
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4606: - Description: Currently, if all applications belonging to the same user in a LeafQueue are pending (caused by max-am-percent, etc.), ActiveUsersManager still considers that user an active user. This could lead to starvation of active applications, for example: - App1 (belongs to user1)/app2 (belongs to user2) are active, app3 (belongs to user3)/app4 (belongs to user4) are pending - ActiveUsersManager returns #active-users=4 - However, only two users (user1/user2) are able to allocate new resources, so the computed user-limit-resource could be lower than expected. was: Encountered while studying fairness behaviour with UserLimitPercent and UserLimitFactor during the following test: Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, UserLimitFactor=32, FairOrderingPolicy only. Encountered an application-starvation situation where 33 applications (190 apps completed out of 761 apps; the queue could run 345 containers) were running with a total of 45 containers, and the 12 extra containers all went to only one app (which had around 18000 tasks); all other apps had only their AM running and were not given any other containers. After that app finished, there were 32 AMs that kept running without any task containers being launched. GridMix was run with the following settings: gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver with a users file containing 4 users for RoundRobinUserResolver > CapacityScheduler: applications could get starved because #activeUsers > considers pending apps > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Wangda Tan > > Currently, if all applications belonging to the same user in a LeafQueue are pending > (caused by max-am-percent, etc.), ActiveUsersManager still considers that user > an active user. This could lead to starvation of active applications, for > example: > - App1 (belongs to user1)/app2 (belongs to user2) are active, app3 (belongs to > user3)/app4 (belongs to user4) are pending > - ActiveUsersManager returns #active-users=4 > - However, only two users (user1/user2) are able to allocate new > resources, so the computed user-limit-resource could be lower than expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4606) CapacityScheduler: applications could get starved because #activeUsers considers pending apps
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4606: - Summary: CapacityScheduler: applications could get starved because #activeUsers considers pending apps (was: Sometimes Fairness inconjuncttions with UserLimitPercent and UserLimitFactor in queue leads to situation where it appears that applications in queue are getting starved or stuck) > CapacityScheduler: applications could get starved because #activeUsers > considers pending apps > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Wangda Tan > > Encountered while studying fairness behaviour with UserLimitPercent and > UserLimitFactor during the following test: > Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, > UserLimitFactor=32, FairOrderingPolicy only. Encountered an application-starvation > situation where 33 applications (190 apps completed out of 761 apps; the queue > could run 345 containers) were running with a total of 45 containers, and the > 12 extra containers all went to only one app (which had around 18000 tasks); > all other apps had only their AM running and were not given any other containers. > After that app finished, there were 32 AMs that kept running without any task > containers being launched > GridMix was run with the following settings: > gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, > gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, > gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, > mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, > gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, > gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver > with a users file containing 4 users for RoundRobinUserResolver -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4496) Improve HA ResourceManager Failover detection on the client
[ https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4496: -- Attachment: YARN-4496.2.patch > Improve HA ResourceManager Failover detection on the client > --- > > Key: YARN-4496 > URL: https://issues.apache.org/jira/browse/YARN-4496 > Project: Hadoop YARN > Issue Type: Improvement > Components: client, resourcemanager >Reporter: Arun Suresh >Assignee: Jian He > Attachments: YARN-4496.1.patch, YARN-4496.2.patch > > > HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to > improve Namenode failover detection in the client. It does this by > concurrently trying all namenodes and picking the namenode that returns > fastest with a successful response as the active node. > It would be useful to have a similar ProxyProvider for the Yarn RM (it can > possibly be done by converging some of the class hierarchies to use the same > ProxyProvider). > This would especially be useful for large YARN deployments with multiple > standby RMs where clients will be able to pick the active RM without having > to traverse a list of configured RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
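As background, a rough sketch of the request-hedging idea described above, assuming a cheap per-endpoint probe is available. The names below (HedgingSketch, Probe, firstSuccessful) are invented for illustration and are not the HDFS RequestHedgingProxyProvider or any YARN API.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Fire the same probe at every candidate endpoint and keep the first one that
// answers successfully; slower or failing probes are cancelled.
public class HedgingSketch {

    interface Probe<T> {
        T call(String endpoint) throws Exception;
    }

    static <T> T firstSuccessful(List<String> endpoints, Probe<T> probe)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(endpoints.size());
        try {
            List<Callable<T>> probes = new ArrayList<>();
            for (String ep : endpoints) {
                probes.add(() -> probe.call(ep));
            }
            // invokeAny returns the result of the first task that completes
            // without throwing an exception.
            return pool.invokeAny(probes);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pretend rm1 is standby (its probe fails) and rm2 is active.
        Probe<String> probe = ep -> {
            if (ep.startsWith("rm1")) {
                throw new IllegalStateException(ep + " is standby");
            }
            return ep; // a real probe would issue a lightweight RPC here
        };
        String winner = firstSuccessful(Arrays.asList("rm1:8032", "rm2:8032"), probe);
        System.out.println("active RM guess: " + winner);
    }
}
{code}
A real provider would wrap the RM protocol proxy and remember the winning endpoint for subsequent calls instead of probing on every request.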
[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled
[ https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108079#comment-15108079 ] Hadoop QA commented on YARN-4465: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 2 new + 74 unchanged - 3 fixed = 76 total (was 77) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 28s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 17s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 150m 22s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | h
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108048#comment-15108048 ] Sunil G commented on YARN-4479: --- Yes [~leftnoteasy], as mentioned by [~Naganarasimha Garla], this option came up as a possible solution. However, there were a few complexities: for this approach we needed a new {{RecoveryComparator}}, which would also have to be added to {{FifoOrderingPolicy}}. RecoveryComparator was supposed to run with the information of whether the app was running prior to recovery, so a flag had to be added to FicaSchedulerApp and then reset after the first round of activation. Hence this approach needed more complexity in various parts of the scheduler, so a simpler approach is made in LeafQueue. Please share your thoughts if we missed anything in this approach. [~rohithsharma], could you please add any point I missed for this approach. > Retrospect app-priority in pendingOrderingPolicy during recovering > applications > --- > > Key: YARN-4479 > URL: https://issues.apache.org/jira/browse/YARN-4479 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, > 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch, > 0005-YARN-4479.patch, 0006-YARN-4479.patch > > > Currently, the same ordering policy is used for pending applications and active > applications. When priority is configured for applications, during recovery > high-priority applications get activated first. It is possible that a > low-priority job was already submitted and in running state. > This causes the low-priority job to starve after recovery -- This message was sent by Atlassian JIRA (v6.3.4#6332)
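To make the trade-off concrete, here is a small, hypothetical sketch of the comparator-based alternative discussed above: previously-running apps are ordered ahead of pending ones, with priority as the tie-breaker. RecoverableApp and its fields are illustrative stand-ins, not the actual FiCaSchedulerApp/FifoOrderingPolicy classes, and the extra flag plus the per-comparison cost are exactly the complexities the comments point out.
{code}
import java.util.Comparator;
import java.util.TreeSet;

// Order apps so that those already running before recovery are activated
// first, falling back to priority (higher first) and then the id.
public class RecoveryOrderingSketch {

    static final class RecoverableApp {
        final String id;
        final int priority;
        final boolean wasRunningBeforeRecovery;

        RecoverableApp(String id, int priority, boolean wasRunningBeforeRecovery) {
            this.id = id;
            this.priority = priority;
            this.wasRunningBeforeRecovery = wasRunningBeforeRecovery;
        }
    }

    // false sorts before true, so apps that were running come first.
    static final Comparator<RecoverableApp> RECOVERY_COMPARATOR =
        Comparator.comparing((RecoverableApp a) -> !a.wasRunningBeforeRecovery)
                  .thenComparing(a -> -a.priority)
                  .thenComparing(a -> a.id);

    public static void main(String[] args) {
        TreeSet<RecoverableApp> pending = new TreeSet<>(RECOVERY_COMPARATOR);
        pending.add(new RecoverableApp("app1", 1, true));   // low priority but was running
        pending.add(new RecoverableApp("app2", 10, false)); // high priority, was pending
        pending.forEach(a -> System.out.println(a.id));     // prints app1, then app2
    }
}
{code}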
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108033#comment-15108033 ] Naganarasimha G R commented on YARN-4479: - I think you are referring to an approach similar to the one done in 0002-YARN-4479.patch: having additional logic in the comparator which checks the attempt's wasAttemptRunningEarlier flag. After discussion we tried to avoid it, as unnecessary comparisons happen even after recovery when comparing each app. If you have any other approach, maybe we can discuss further. > Retrospect app-priority in pendingOrderingPolicy during recovering > applications > --- > > Key: YARN-4479 > URL: https://issues.apache.org/jira/browse/YARN-4479 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, > 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch, > 0005-YARN-4479.patch, 0006-YARN-4479.patch > > > Currently, the same ordering policy is used for pending applications and active > applications. When priority is configured for applications, during recovery > high-priority applications get activated first. It is possible that a > low-priority job was already submitted and in running state. > This causes the low-priority job to starve after recovery -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable
[ https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108032#comment-15108032 ] Hadoop QA commented on YARN-4428: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 37s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new + 1 unchanged - 1 fixed = 2 total (was 2) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 50s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 46s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 165m 27s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem
[jira] [Commented] (YARN-4496) Improve HA ResourceManager Failover detection on the client
[ https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108014#comment-15108014 ] Hadoop QA commented on YARN-4496: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 57s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 1s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 21s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 42s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 48s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 20s {color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new + 9 unchanged - 0 fixed = 10 total (was 9) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 29s {color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new + 10 unchanged - 0 fixed = 11 total (was 10) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 8s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch 
generated 5 new + 221 unchanged - 0 fixed = 226 total (was 221) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 8s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 3s {color} | {color:green} hadoop-yarn-common in the patch passed
[jira] [Commented] (YARN-4610) Reservations continue looking for one app causes other apps to starve
[ https://issues.apache.org/jira/browse/YARN-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107996#comment-15107996 ] Wangda Tan commented on YARN-4610: -- Thanks [~jlowe], it's a nice finding! +1 to the fix, but could you take a look at failed tests? Not sure if they're related to this fix. > Reservations continue looking for one app causes other apps to starve > - > > Key: YARN-4610 > URL: https://issues.apache.org/jira/browse/YARN-4610 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-4610.001.patch > > > CapacityScheduler's LeafQueue has "reservations continue looking" logic that > allows an application to unreserve elsewhere to fulfil a container request on > a node that has available space. However in 2.7 that logic seems to break > allocations for subsequent apps in the queue. Once a user hits its user > limit, subsequent apps in the queue for other users receive containers at a > significantly reduced rate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4479) Retrospect app-priority in pendingOrderingPolicy during recovering applications
[ https://issues.apache.org/jira/browse/YARN-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107994#comment-15107994 ] Wangda Tan commented on YARN-4479: -- Hi [~rohithsharma], apologies for my very late feedback. Instead of adding a new list of recovery-and-pending apps, could we add this behavior (earlier-submitted & running apps go first) to our existing policy? Maintaining only one ordering policy in LeafQueue is easier. Thoughts? [~jianhe]/[~Naganarasimha]/[~sunilg] > Retrospect app-priority in pendingOrderingPolicy during recovering > applications > --- > > Key: YARN-4479 > URL: https://issues.apache.org/jira/browse/YARN-4479 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Fix For: 2.8.0 > > Attachments: 0001-YARN-4479.patch, 0002-YARN-4479.patch, > 0003-YARN-4479.patch, 0004-YARN-4479.patch, 0004-YARN-4479.patch, > 0005-YARN-4479.patch, 0006-YARN-4479.patch > > > Currently, the same ordering policy is used for pending applications and active > applications. When priority is configured for applications, during recovery > high-priority applications get activated first. It is possible that a > low-priority job was already submitted and in running state. > This causes the low-priority job to starve after recovery -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4610) Reservations continue looking for one app causes other apps to starve
[ https://issues.apache.org/jira/browse/YARN-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107959#comment-15107959 ] Hadoop QA commented on YARN-4610: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 1 new + 55 unchanged - 1 fixed = 56 total (was 56) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 39s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 17s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 188m 9s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.TestClientRMService | | | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | |
[jira] [Commented] (YARN-4612) Fix rumen and scheduler load simulator handle killed tasks properly
[ https://issues.apache.org/jira/browse/YARN-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107949#comment-15107949 ] Hadoop QA commented on YARN-4612: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 32s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 52s {color} | {color:red} hadoop-tools-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new + 151 unchanged - 1 fixed = 153 total (was 152) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 7s {color} | {color:red} hadoop-tools-jdk1.7.0_91 with JDK v1.7.0_91 generated 2 new + 151 unchanged - 1 fixed = 153 total (was 152) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} 
checkstyle {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 16s {color} | {color:green} hadoop-rumen in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s {color} | {color:green} hadoop-sls in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 17s {color} | {color:green} ha
[jira] [Commented] (YARN-4584) RM startup failure when AM attempts greater than max-attempts
[ https://issues.apache.org/jira/browse/YARN-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107897#comment-15107897 ] Bibin A Chundatt commented on YARN-4584: [~jianhe]/[~rohithsharma]/[~hex108] Could you please review the attached patch. For removing AM attempts caused by preemption and disk failures, we can also consider the probability of those cases during the {{attemptFailuresValidityInterval}}: AM containers are chosen last for preemption, and for the disk-failure case we do have AM blacklisting. So attempts without failures will be limited, right? > RM startup failure when AM attempts greater than max-attempts > - > > Key: YARN-4584 > URL: https://issues.apache.org/jira/browse/YARN-4584 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4584.patch, 0002-YARN-4584.patch, > 0003-YARN-4584.patch > > > Configure 3 queues in a cluster with 8 GB > # queue 40% > # queue 50% > # default 10% > * Submit applications to all 3 queues with container size 1024MB (a sleep job > with 50 containers on all queues) > * The AM that gets assigned to the default queue gets preempted immediately; after > 20 preemptions all the applications get killed > Due to the resource limit in the default queue, the AM got preempted about 20 times > On restart, the RM fails to start > {noformat} > 2016-01-12 10:49:04,081 DEBUG org.apache.hadoop.service.AbstractService: > noteFailure java.lang.NullPointerException > 2016-01-12 10:49:04,081 INFO org.apache.hadoop.service.AbstractService: > Service RMActiveServices failed in state STARTED; cause: > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.recover(RMAppAttemptImpl.java:887) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recover(RMAppImpl.java:826) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:953) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:946) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:786) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:328) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:464) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1232) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:594) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1022) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1062) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1058) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1705) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1058) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:323) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:127) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:877) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > 2016-01-12 10:49:04,082 DEBUG org.apache.hadoop.service
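For readers following the stack trace, a purely illustrative sketch (not the actual RMAppImpl/RMAppAttemptImpl code, and not necessarily what the attached patches do) of the kind of defensive recovery loop the NPE points at: if the state store no longer holds data for an attempt, skip it rather than dereference a null attempt state.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical recovery loop: attempts whose saved state is missing (for
// example because older attempts were trimmed) are skipped instead of
// triggering a NullPointerException during RM startup.
public class AttemptRecoverySketch {

    static final class AttemptState {
        final String diagnostics;
        AttemptState(String diagnostics) { this.diagnostics = diagnostics; }
    }

    static int recoverAttempts(Map<String, AttemptState> storedAttempts) {
        int recovered = 0;
        for (Map.Entry<String, AttemptState> e : storedAttempts.entrySet()) {
            AttemptState state = e.getValue();
            if (state == null) {
                System.out.println("skipping attempt with no saved state: " + e.getKey());
                continue;
            }
            System.out.println("recovered " + e.getKey() + ": " + state.diagnostics);
            recovered++;
        }
        return recovered;
    }

    public static void main(String[] args) {
        Map<String, AttemptState> stored = new LinkedHashMap<>();
        stored.put("appattempt_1452592000000_0001_000020", new AttemptState("preempted"));
        stored.put("appattempt_1452592000000_0001_000021", null); // never fully persisted
        System.out.println("recovered attempts: " + recoverAttempts(stored));
    }
}
{code}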
[jira] [Updated] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled
[ https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4465: --- Attachment: 0003-YARN-4465.patch > SchedulerUtils#validateRequest for Label check should happen only when > nodelabel enabled > > > Key: YARN-4465 > URL: https://issues.apache.org/jira/browse/YARN-4465 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, > 0003-YARN-4465.patch > > > Disable labels from the RM side with yarn.nodelabel.enable=false > Capacity scheduler label configuration for the queue is as below: > default label for queue b1 is 3 and accessible labels are 1,3 > Submit an application to queue A. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): > Invalid resource request, queue=b1 doesn't have permission to access all > labels in resource request. labelExpression of resource request=3. Queue > labels=1,3 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247) > {noformat} > # Ignore the default label expression when labels are disabled *or* > # In NormalizeResourceRequest we can set the label expression to > when node labels are not enabled *or* > # Improve the message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
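A toy sketch of options 1 and 2 from the list above, with invented names rather than the real SchedulerUtils/RMAppManager API: clear the label expression during normalization and skip the queue-vs-label check entirely when node labels are disabled.
{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical validation path: with node labels disabled, the queue's default
// label expression is ignored and the permission check is never reached.
public class LabelValidationSketch {

    static String normalizeLabelExpression(String requested, boolean nodeLabelsEnabled) {
        // Option 2: treat every request as label-less when labels are disabled.
        return nodeLabelsEnabled ? requested : null;
    }

    static void validate(String labelExpression, Set<String> queueLabels,
                         boolean nodeLabelsEnabled) throws IOException {
        // Option 1: only check labels when the feature is enabled.
        if (!nodeLabelsEnabled || labelExpression == null) {
            return;
        }
        if (!queueLabels.contains(labelExpression)) {
            throw new IOException("queue doesn't have permission to access label "
                + labelExpression + ", queue labels=" + queueLabels);
        }
    }

    public static void main(String[] args) throws IOException {
        Set<String> queueLabels = new HashSet<>(Arrays.asList("1", "3"));
        // The default expression "3" is dropped because labels are disabled, so the
        // submission is accepted instead of failing as in the exception above.
        validate(normalizeLabelExpression("3", false), queueLabels, false);
        System.out.println("request accepted with node labels disabled");
    }
}
{code}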
[jira] [Updated] (YARN-4605) Spelling mistake in the help message of "yarn applicationattempt" command
[ https://issues.apache.org/jira/browse/YARN-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-4605: -- Attachment: YARN-4605.002.patch Triggering a new Jenkins job. Those failures were not caused by this patch; most of them are timeouts. > Spelling mistake in the help message of "yarn applicationattempt" command > - > > Key: YARN-4605 > URL: https://issues.apache.org/jira/browse/YARN-4605 > Project: Hadoop YARN > Issue Type: Bug > Components: client, yarn >Affects Versions: 2.4.0 >Reporter: Manjunath Ballur >Assignee: Weiwei Yang >Priority: Trivial > Fix For: 2.8.0 > > Attachments: YARN-4605.001.patch, YARN-4605.002.patch > > > Using the YARN CLI, when the user types "yarn applicationattempt", the help > message for the "applicationattempt" command is shown. > Here, the following line has a spelling mistake: "application" is misspelled > as "aplication": > -list List application attempts for aplication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4612) Fix rumen and scheduler load simulator handle killed tasks properly
[ https://issues.apache.org/jira/browse/YARN-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated YARN-4612: -- Attachment: YARN-4612.patch Here is the draft patch. Also tested it with actual data. > Fix rumen and scheduler load simulator handle killed tasks properly > --- > > Key: YARN-4612 > URL: https://issues.apache.org/jira/browse/YARN-4612 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ming Ma > Attachments: YARN-4612.patch > > > Killed tasks might not have any attempts. Rumen and SLS throw exceptions when > processing such data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4612) Fix rumen and scheduler load simulator handle killed tasks properly
Ming Ma created YARN-4612: - Summary: Fix rumen and scheduler load simulator handle killed tasks properly Key: YARN-4612 URL: https://issues.apache.org/jira/browse/YARN-4612 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Killed tasks might not have any attempts. Rumen and SLS throw exceptions when processing such data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
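A small sketch of the guard this issue calls for, using illustrative names rather than the actual rumen/SLS classes: a killed task may carry an empty attempt list, so the last attempt should be looked up only when one exists.
{code}
import java.util.Collections;
import java.util.List;

// Hypothetical trace-processing helper: return the last attempt of a task, or
// null when the task was killed before any attempt started.
public class KilledTaskSketch {

    static final class LoggedTask {
        final String taskId;
        final List<String> attempts;
        LoggedTask(String taskId, List<String> attempts) {
            this.taskId = taskId;
            this.attempts = attempts;
        }
    }

    static String lastAttemptOrNull(LoggedTask task) {
        if (task.attempts == null || task.attempts.isEmpty()) {
            return null; // don't index into an empty attempt list
        }
        return task.attempts.get(task.attempts.size() - 1);
    }

    public static void main(String[] args) {
        LoggedTask killed = new LoggedTask("task_0001_m_000042", Collections.<String>emptyList());
        System.out.println("attempt for killed task: " + lastAttemptOrNull(killed));
    }
}
{code}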
[jira] [Updated] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not avaialable
[ https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4428: --- Attachment: YARN-4428.5.patch The .5 patch addresses the checkstyle issue. > Redirect RM page to AHS page when AHS turned on and RM page is not avaialable > - > > Key: YARN-4428 > URL: https://issues.apache.org/jira/browse/YARN-4428 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4428.1.2.patch, YARN-4428.1.patch, > YARN-4428.2.2.patch, YARN-4428.2.patch, YARN-4428.3.patch, YARN-4428.3.patch, > YARN-4428.4.patch, YARN-4428.5.patch > > > When AHS is turned on, if we can't view an application on the RM page, the RM page > should redirect us to the AHS page. For example, when you go to > cluster/app/application_1, if the RM no longer remembers the application, we will > simply get "Failed to read the application application_1", but it would be > good for the RM UI to smartly try to redirect to the AHS UI > /applicationhistory/app/application_1 to see if it's there. The redirect > usage already exists for logs in the NodeManager UI. > Also, when AHS is enabled, WebAppProxyServlet should redirect to the AHS page as a > fallback when the RM does not remember the app. YARN-3975 tried to do this only when > the original tracking url is not set. But in many cases, such as when the app > failed at launch, the original tracking url will be set to point to the RM page, so > the redirect to the AHS page won't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
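A compact sketch of the fallback described above, with made-up names (not the actual WebAppProxyServlet or RM web app code): when the RM no longer remembers an application and the Application History Server is enabled, send the browser to the AHS app page instead of rendering the "Failed to read the application" error.
{code}
// Hypothetical URL chooser: prefer the RM page while the app is known, fall
// back to the AHS page once the RM has forgotten the app.
public class AhsRedirectSketch {

    static String pageForApp(String appId, boolean rmKnowsApp, boolean ahsEnabled,
                             String rmBase, String ahsBase) {
        if (rmKnowsApp) {
            return rmBase + "/cluster/app/" + appId;
        }
        if (ahsEnabled) {
            // Redirect instead of showing "Failed to read the application ...".
            return ahsBase + "/applicationhistory/app/" + appId;
        }
        return rmBase + "/cluster/app/" + appId; // nothing better to offer
    }

    public static void main(String[] args) {
        System.out.println(pageForApp("application_1453000000000_0001",
            false, true, "http://rm:8088", "http://ahs:8188"));
    }
}
{code}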
[jira] [Commented] (YARN-4611) Fix scheduler load simulator to support multi-layer network location
[ https://issues.apache.org/jira/browse/YARN-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107840#comment-15107840 ] Hadoop QA commented on YARN-4611: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 11s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 48s {color} | {color:red} hadoop-sls in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 51s {color} | {color:red} hadoop-sls in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 25m 41s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.sls.nodemanager.TestNMSimulator | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.sls.nodemanager.TestNMSimulator | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12783239/YARN-4611.patch | | JIRA Issue | YARN-4611 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 5fc93dc65ffa 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/
[jira] [Updated] (YARN-4611) Fix scheduler load simulator to support multi-layer network location
[ https://issues.apache.org/jira/browse/YARN-4611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated YARN-4611: -- Attachment: YARN-4611.patch Here is the draft patch. Also tested it with actual rumen trace. > Fix scheduler load simulator to support multi-layer network location > > > Key: YARN-4611 > URL: https://issues.apache.org/jira/browse/YARN-4611 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ming Ma > Attachments: YARN-4611.patch > > > SLS assumes the host name's network path has one level, e.g., > /default-rack/hostFoo. It won't work if the rumen trace comes from clusters > with more than one network layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
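As a rough illustration of the multi-layer support the issue asks for (a sketch under assumptions, not the attached patch), the network location can be taken as everything before the last path separator instead of assuming a single "/rack/host" level:
{code}
// Sketch of the general idea only, not the attached patch: instead of
// assuming a single-level "/rack/host" path, treat everything before the
// last separator as the network location and the final component as the host.
public class NetworkPathSketch {

  /** Returns {networkLocation, hostName} for a path such as "/dc1/rack1/hostFoo". */
  static String[] splitLocationAndHost(String fullPath) {
    int lastSlash = fullPath.lastIndexOf('/');
    String location = lastSlash <= 0 ? "/default-rack" : fullPath.substring(0, lastSlash);
    String host = fullPath.substring(lastSlash + 1);
    return new String[] {location, host};
  }

  public static void main(String[] args) {
    // Single-layer and multi-layer locations should both resolve.
    for (String p : new String[] {"/default-rack/hostFoo", "/dc1/rack1/hostBar"}) {
      String[] parts = splitLocationAndHost(p);
      System.out.println(parts[0] + " -> " + parts[1]);
    }
  }
}
{code}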
[jira] [Created] (YARN-4611) Fix scheduler load simulator to support multi-layer network location
Ming Ma created YARN-4611: - Summary: Fix scheduler load simulator to support multi-layer network location Key: YARN-4611 URL: https://issues.apache.org/jira/browse/YARN-4611 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma SLS assumes the host name's network path has one level, e.g., /default-rack/hostFoo. It won't work if the rumen trace comes from clusters with more than one network layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107775#comment-15107775 ] Hadoop QA commented on YARN-4238: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 10 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 18s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 38s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 16s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 19s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 23s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 27s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 19s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 27s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 50s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 53s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 22s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 14s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 13s {color} | {color:red} root: patch generated 25 new + 522 unchanged - 21 fixed = 547 total (was 543) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | 
{color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 46s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 39s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 40s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 17s {color} | {color:red} hadoop-mapreduce-client-app i
[jira] [Commented] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not available
[ https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107721#comment-15107721 ] Hadoop QA commented on YARN-4428: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 41s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new + 1 unchanged - 1 fixed = 2 total (was 2) {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 6 new + 128 unchanged - 0 fixed = 134 total (was 128) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | 
{color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 24s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 51s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 53s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 148m 26s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoo
[jira] [Updated] (YARN-2575) Consider creating separate ACLs for Reservation create/update/delete/list ops
[ https://issues.apache.org/jira/browse/YARN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-2575: -- Attachment: YARN-2575.v5.patch Subru, thanks for the comments. I implemented the behavior for: "When Reservation ACLs are enabled but not defined". I also made it such that users can always list their own reservations, but users must have either the list-reservations ACL or the admin ACL to list everyone's reservations. The admin user can already update and delete any reservations. > Consider creating separate ACLs for Reservation create/update/delete/list ops > - > > Key: YARN-2575 > URL: https://issues.apache.org/jira/browse/YARN-2575 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Subru Krishnan >Assignee: Sean Po > Attachments: YARN-2575.v1.patch, YARN-2575.v2.1.patch, > YARN-2575.v2.patch, YARN-2575.v3.patch, YARN-2575.v4.patch, YARN-2575.v5.patch > > > YARN-1051 introduces the ReservationSystem and in the current implementation > anyone who can submit applications can also submit reservations. This JIRA is > to evaluate creating separate ACLs for Reservation create/update/delete ops. > Depends on YARN-4340 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
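For illustration only, the listing rule described in the comment above could look roughly like the following sketch; the class and field names are hypothetical and are not taken from the patch.
{code}
import java.util.Set;

// Hypothetical sketch of the listing rule described above: a user may always
// list their own reservations, while listing everyone's reservations requires
// either the list-reservations ACL or the admin ACL. Names are illustrative.
public class ReservationListAclSketch {

  private final Set<String> listAclUsers;
  private final Set<String> adminAclUsers;

  public ReservationListAclSketch(Set<String> listAclUsers, Set<String> adminAclUsers) {
    this.listAclUsers = listAclUsers;
    this.adminAclUsers = adminAclUsers;
  }

  public boolean canList(String caller, String reservationOwner) {
    if (caller.equals(reservationOwner)) {
      return true; // own reservations are always visible
    }
    return listAclUsers.contains(caller) || adminAclUsers.contains(caller);
  }
}
{code}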
[jira] [Updated] (YARN-4610) Reservations continue looking for one app causes other apps to starve
[ https://issues.apache.org/jira/browse/YARN-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-4610: - Attachment: YARN-4610.001.patch Patch that resets the amount needed to unreserve at the beginning of canAssignToUser. That way subsequent users in the loop will not accidentally inherit a previous user's amount. A potential workaround until this appears in a release is to set yarn.scheduler.capacity.reservations-continue-look-all-nodes to false in capacity-scheduler.xml. Note that this property is refreshable via yarn rmadmin -refreshQueues, so changing it does not require a restart. After this fix the property should be restored to true to avoid the original issue fixed in YARN-3434. > Reservations continue looking for one app causes other apps to starve > - > > Key: YARN-4610 > URL: https://issues.apache.org/jira/browse/YARN-4610 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-4610.001.patch > > > CapacityScheduler's LeafQueue has "reservations continue looking" logic that > allows an application to unreserve elsewhere to fulfil a container request on > a node that has available space. However in 2.7 that logic seems to break > allocations for subsequent apps in the queue. Once a user hits its user > limit, subsequent apps in the queue for other users receive containers at a > significantly reduced rate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
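The mechanics of the fix can be pictured with a small hypothetical sketch; the field and method names below are invented for illustration, while the real change lives in LeafQueue.
{code}
// Rough illustration only; field and method names are hypothetical, not the
// actual LeafQueue code. The point is that the "amount needed to unreserve"
// is reset at the start of the per-user check so a stale value from a
// previous user cannot leak into the next loop iteration.
class CanAssignToUserSketch {

  private long amountNeededToUnreserveMB;

  boolean canAssignToUser(String user, long requestedMB, long userLimitMB,
      long userUsedMB) {
    // The fix: reset per call instead of letting the old value survive.
    amountNeededToUnreserveMB = 0;

    if (userUsedMB + requestedMB > userLimitMB) {
      // Over the user limit: remember how much would have to be unreserved.
      amountNeededToUnreserveMB = (userUsedMB + requestedMB) - userLimitMB;
      return false;
    }
    return true;
  }

  long getAmountNeededToUnreserveMB() {
    return amountNeededToUnreserveMB;
  }
}
{code}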
[jira] [Commented] (YARN-4496) Improve HA ResourceManager Failover detection on the client
[ https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107685#comment-15107685 ] Jian He commented on YARN-4496: --- Uploaded a patch: - added a new RequestHedgingRMFailoverProxyProvider. When the client tries to fail over, it uses a separate proxy object to talk to each RM simultaneously; each proxy retries its RM until the first one receives a response from the active RM. All the other requests are then cancelled. - changed the default rm-retry-interval to 5 seconds; I think a 30-second interval is too long. > Improve HA ResourceManager Failover detection on the client > --- > > Key: YARN-4496 > URL: https://issues.apache.org/jira/browse/YARN-4496 > Project: Hadoop YARN > Issue Type: Improvement > Components: client, resourcemanager >Reporter: Arun Suresh >Assignee: Jian He > Attachments: YARN-4496.1.patch > > > HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to > improve NameNode failover detection in the client. It does this by > concurrently trying all NameNodes and picking, as the active node, the NameNode that > returns a successful response the fastest. > It would be useful to have a similar ProxyProvider for the YARN RM (it can > possibly be done by converging some of the class hierarchies to use the same > ProxyProvider). > This would especially be useful for large YARN deployments with multiple > standby RMs, where clients will be able to pick the active RM without having > to traverse a list of configured RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
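To make the hedging behavior described in the comment concrete, here is a toy sketch (not the actual RequestHedgingRMFailoverProxyProvider implementation): the same request is submitted to every RM at once, the first successful response wins, and the remaining calls are cancelled.
{code}
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy illustration of request hedging, not the actual provider: issue the same
// request to every RM at once, take the first successful response, and cancel
// the rest.
public class HedgedRmCallSketch {

  static String hedgedCall(List<Callable<String>> perRmCalls) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(perRmCalls.size());
    try {
      ExecutorCompletionService<String> ecs = new ExecutorCompletionService<>(pool);
      for (Callable<String> call : perRmCalls) {
        ecs.submit(call);
      }
      Exception last = null;
      for (int i = 0; i < perRmCalls.size(); i++) {
        try {
          return ecs.take().get(); // first RM to answer successfully wins
        } catch (Exception e) {
          last = e; // a standby RM rejected the call; keep waiting for others
        }
      }
      throw last != null ? last : new Exception("no RMs were called");
    } finally {
      pool.shutdownNow(); // cancel the requests still pending on other RMs
    }
  }
}
{code}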
[jira] [Updated] (YARN-4496) Improve HA ResourceManager Failover detection on the client
[ https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4496: -- Attachment: YARN-4496.1.patch > Improve HA ResourceManager Failover detection on the client > --- > > Key: YARN-4496 > URL: https://issues.apache.org/jira/browse/YARN-4496 > Project: Hadoop YARN > Issue Type: Improvement > Components: client, resourcemanager >Reporter: Arun Suresh >Assignee: Jian He > Attachments: YARN-4496.1.patch > > > HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to > improve NameNode failover detection in the client. It does this by > concurrently trying all NameNodes and picking, as the active node, the NameNode that > returns a successful response the fastest. > It would be useful to have a similar ProxyProvider for the YARN RM (it can > possibly be done by converging some of the class hierarchies to use the same > ProxyProvider). > This would especially be useful for large YARN deployments with multiple > standby RMs, where clients will be able to pick the active RM without having > to traverse a list of configured RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4496) Improve HA ResourceManager Failover detection on the client
[ https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4496: -- Attachment: (was: YARN-4496.1.patch) > Improve HA ResourceManager Failover detection on the client > --- > > Key: YARN-4496 > URL: https://issues.apache.org/jira/browse/YARN-4496 > Project: Hadoop YARN > Issue Type: Improvement > Components: client, resourcemanager >Reporter: Arun Suresh >Assignee: Jian He > > HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to > improve NameNode failover detection in the client. It does this by > concurrently trying all NameNodes and picking, as the active node, the NameNode that > returns a successful response the fastest. > It would be useful to have a similar ProxyProvider for the YARN RM (it can > possibly be done by converging some of the class hierarchies to use the same > ProxyProvider). > This would especially be useful for large YARN deployments with multiple > standby RMs, where clients will be able to pick the active RM without having > to traverse a list of configured RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4496) Improve HA ResourceManager Failover detection on the client
[ https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-4496: -- Attachment: YARN-4496.1.patch > Improve HA ResourceManager Failover detection on the client > --- > > Key: YARN-4496 > URL: https://issues.apache.org/jira/browse/YARN-4496 > Project: Hadoop YARN > Issue Type: Improvement > Components: client, resourcemanager >Reporter: Arun Suresh >Assignee: Jian He > Attachments: YARN-4496.1.patch > > > HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to > improve NameNode failover detection in the client. It does this by > concurrently trying all NameNodes and picking, as the active node, the NameNode that > returns a successful response the fastest. > It would be useful to have a similar ProxyProvider for the YARN RM (it can > possibly be done by converging some of the class hierarchies to use the same > ProxyProvider). > This would especially be useful for large YARN deployments with multiple > standby RMs, where clients will be able to pick the active RM without having > to traverse a list of configured RMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107585#comment-15107585 ] Hadoop QA commented on YARN-4224: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 42s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 55s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 43s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 26s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 14s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 8 new + 48 unchanged - 14 fixed = 56 total (was 62) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 22s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not gene
[jira] [Commented] (YARN-4610) Reservations continue looking for one app causes other apps to starve
[ https://issues.apache.org/jira/browse/YARN-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107542#comment-15107542 ] Jason Lowe commented on YARN-4610: -- I believe the issue is in LeafQueue#assignToUser. That method will modify the amount needed to unreserve for a particular user when they hit the resource limit. However the amount needed to unreserve never gets reset to zero for the next iteration of the loop, so subsequent apps for different users can end up not receiving containers because it accidentally thinks it needs to unreserve based on that stale value. > Reservations continue looking for one app causes other apps to starve > - > > Key: YARN-4610 > URL: https://issues.apache.org/jira/browse/YARN-4610 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Blocker > > CapacityScheduler's LeafQueue has "reservations continue looking" logic that > allows an application to unreserve elsewhere to fulfil a container request on > a node that has available space. However in 2.7 that logic seems to break > allocations for subsequent apps in the queue. Once a user hits its user > limit, subsequent apps in the queue for other users receive containers at a > significantly reduced rate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4610) Reservations continue looking for one app causes other apps to starve
Jason Lowe created YARN-4610: Summary: Reservations continue looking for one app causes other apps to starve Key: YARN-4610 URL: https://issues.apache.org/jira/browse/YARN-4610 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.1 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Blocker CapacityScheduler's LeafQueue has "reservations continue looking" logic that allows an application to unreserve elsewhere to fulfil a container request on a node that has available space. However in 2.7 that logic seems to break allocations for subsequent apps in the queue. Once a user hits its user limit, subsequent apps in the queue for other users receive containers at a significantly reduced rate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4224: --- Attachment: YARN-4224-feature-YARN-2928.05.patch Uploading patch again to invoke Jenkins. > Support fetching entities by UID and change the REST interface to conform to > current REST APIs' in YARN > --- > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.04.patch, YARN-4224-feature-YARN-2928.05.patch, > YARN-4224-feature-YARN-2928.wip.02.patch, > YARN-4224-feature-YARN-2928.wip.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4224: --- Attachment: (was: YARN-4224-feature-YARN-2928.05.patch) > Support fetching entities by UID and change the REST interface to conform to > current REST APIs' in YARN > --- > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.04.patch, > YARN-4224-feature-YARN-2928.wip.02.patch, > YARN-4224-feature-YARN-2928.wip.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4238: --- Attachment: YARN-4238-feature-YARN-2928.04.patch > createdTime and modifiedTime is not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch, > YARN-4238-feature-YARN-2928.04.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4360) Improve GreedyReservationAgent to support "early" allocations, and performance improvements
[ https://issues.apache.org/jira/browse/YARN-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107220#comment-15107220 ] Arun Suresh commented on YARN-4360: --- Thanks for the patch [~curino], a couple of minor comments from my first pass: In GreedyReservationAgent, # Change GREEDY_ALLOCATION_DIRECTION to be a boolean instead of a string (I am assuming it will always be either left or right). # Use yarnConfiguration.getBoolean() and pass in a default value instead of using if..then..else. In IterativePlanner # Line 174..200, an if block (the {{if(jobtype == ReservationRequestInterpreter.R_ORDER_NO_GAP && ..)}}) exists in both the if(allocateLeft).. else{} branches. You can probably pull that up. # I also noticed that you replaced all the "return null"s with throwing a PlanningException, is that OK? I'll provide more comments on the {{StageAllocatorGreedyRLE}} after I go through it in a bit more detail... > Improve GreedyReservationAgent to support "early" allocations, and > performance improvements > > > Key: YARN-4360 > URL: https://issues.apache.org/jira/browse/YARN-4360 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Affects Versions: 2.8.0 >Reporter: Carlo Curino >Assignee: Carlo Curino > Attachments: YARN-4360.2.patch, YARN-4360.3.patch, YARN-4360.5.patch, > YARN-4360.patch > > > The GreedyReservationAgent allocates "as late as possible". Per various > conversations, it seems useful to have a mirror behavior that allocates as > early as possible. Also in the process we leverage improvements from > YARN-4358, and implement an RLE-aware StageAllocatorGreedy(RLE), which > significantly speeds up allocation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
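For readers unfamiliar with the configuration API mentioned in the review comment, a tiny sketch of the suggestion follows; the property name and default below are made up for the example.
{code}
import org.apache.hadoop.conf.Configuration;

// Tiny illustration of the review suggestion above; the property name and
// default are made up for the example.
public class GreedyDirectionSketch {

  // Hypothetical property controlling the allocation direction.
  static final String ALLOCATE_LEFT_KEY =
      "example.reservation-agent.favor-early-allocation";
  static final boolean ALLOCATE_LEFT_DEFAULT = false;

  static boolean isAllocateLeft(Configuration conf) {
    // One call with a default value, instead of an if..then..else on a String.
    return conf.getBoolean(ALLOCATE_LEFT_KEY, ALLOCATE_LEFT_DEFAULT);
  }
}
{code}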
[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient
[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107105#comment-15107105 ] Naganarasimha G R commented on YARN-3367: - Thanks for the comments [~sjlee0]. Working on the patch; will upload by tomorrow! > Replace starting a separate thread for post entity with event loop in > TimelineClient > > > Key: YARN-3367 > URL: https://issues.apache.org/jira/browse/YARN-3367 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Naganarasimha G R > Labels: yarn-2928-1st-milestone > Attachments: YARN-3367-feature-YARN-2928.003.patch, > YARN-3367-feature-YARN-2928.v1.002.patch, > YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch > > > Since YARN-3039, we added a loop in TimelineClient to wait for > the collectorServiceAddress to be ready before posting any entity. In consumers of > TimelineClient (like the AM), we are starting a new thread for each call to get > rid of a potential deadlock in the main thread. This approach has at least 3 major > defects: > 1. The consumer needs some additional code to wrap a thread before calling > putEntities() in TimelineClient. > 2. It costs many thread resources, which is unnecessary. > 3. The sequence of events could be out of order because each posting > operation thread gets out of the waiting loop in random order. > We should have something like an event loop on the TimelineClient side: > putEntities() only puts the related entities into a queue, and a > separate thread delivers the queued entities to the collector via REST > calls. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4492) Add documentation for preemption supported in Capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107098#comment-15107098 ] Naganarasimha G R commented on YARN-4492: - Hi [~jlowe], are any other updates required for the patch? > Add documentation for preemption supported in Capacity scheduler > > > Key: YARN-4492 > URL: https://issues.apache.org/jira/browse/YARN-4492 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: CapacityScheduler.html, YARN-4492-branch-2.7.001.patch, > YARN-4492.v1.001.patch, YARN-4492.v1.002.patch, YARN-4492.v1.003.patch, > YARN-4492.v2.001.patch, YARN-4492.v2.002.patch, YARN-4492.v2.003.patch > > > As part of YARN-2056, support has been added to disable preemption for a > specific queue. This is a useful feature in a multi-load cluster but is currently > missing documentation. > Preemption is not documented completely, hence update all the configurations for > capacity scheduler preemption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4589) Diagnostics for localization timeouts is lacking
[ https://issues.apache.org/jira/browse/YARN-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107096#comment-15107096 ] Chang Li commented on YARN-4589: [~jlowe] please help review the latest patch. The latest implementation adds a new container external state, localizing; in each node heartbeat to the RM, RMNode maintains and updates the states of its containers. When an RMAppAttempt times out, it queries the RMNode for its container's state. The implementation also considers backward compatibility. > Diagnostics for localization timeouts is lacking > > > Key: YARN-4589 > URL: https://issues.apache.org/jira/browse/YARN-4589 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4589.2.patch, YARN-4589.3.patch, YARN-4589.patch > > > When a container takes too long to localize it manifests as a timeout, and > there's no indication that localization was the issue. We need diagnostics > for timeouts to indicate the container was still localizing when the timeout > occurred. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
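A conceptual sketch of the mechanism described in the comment follows; the enum values, class and method names are hypothetical and are not taken from the patch.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual sketch only; the enum values, class and method names are
// hypothetical and not taken from the patch. The RM-side node record keeps the
// last reported state per container, and the timeout path uses it to produce a
// more specific diagnostic.
public class LocalizationDiagnosticsSketch {

  enum ContainerExternalState { LOCALIZING, RUNNING, COMPLETED }

  // Updated from each node heartbeat: containerId -> last reported state.
  private final Map<String, ContainerExternalState> containerStates =
      new ConcurrentHashMap<>();

  void onNodeHeartbeat(String containerId, ContainerExternalState state) {
    containerStates.put(containerId, state);
  }

  /** Called when an application attempt expires waiting for its AM container. */
  String diagnosticsForTimeout(String amContainerId) {
    if (containerStates.get(amContainerId) == ContainerExternalState.LOCALIZING) {
      return "AM container " + amContainerId
          + " timed out while still localizing resources";
    }
    return "AM container " + amContainerId + " timed out";
  }
}
{code}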
[jira] [Updated] (YARN-4428) Redirect RM page to AHS page when AHS turned on and RM page is not available
[ https://issues.apache.org/jira/browse/YARN-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated YARN-4428: --- Attachment: YARN-4428.4.patch Thanks [~jlowe] for the review and the good suggestions! Updated the .4 patch to also support the redirect for appattempt and container pages. Have successfully tested them manually. > Redirect RM page to AHS page when AHS turned on and RM page is not available > - > > Key: YARN-4428 > URL: https://issues.apache.org/jira/browse/YARN-4428 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-4428.1.2.patch, YARN-4428.1.patch, > YARN-4428.2.2.patch, YARN-4428.2.patch, YARN-4428.3.patch, YARN-4428.3.patch, > YARN-4428.4.patch > > > When AHS is turned on, if we can't view an application in the RM page, the RM page > should redirect us to the AHS page. For example, when you go to > cluster/app/application_1, if the RM no longer remembers the application, we will > simply get "Failed to read the application application_1", but it would be > good for the RM UI to smartly try to redirect to the AHS UI > /applicationhistory/app/application_1 to see if it's there. This kind of redirect > already exists for logs in the NodeManager UI. > Also, when AHS is enabled, WebAppProxyServlet should redirect to the AHS page as a > fallback when the RM does not remember the app. YARN-3975 tried to do this only when > the original tracking URL is not set. But in many cases, such as when an app > fails at launch, the original tracking URL will be set to point to the RM page, so > the redirect to the AHS page won't work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107012#comment-15107012 ] Sunil G commented on YARN-4108: --- Thank you [~leftnoteasy] for the updated patch. I applied the patch and ran a few local tests. Looks fine; I will now go through the patch and share my thoughts. > CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4108-design-doc-V3.pdf, > YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, > YARN-4108.poc.1.patch, YARN-4108.poc.2-WIP.patch, YARN-4108.poc.3-WIP.patch > > > This is a sibling JIRA of YARN-2154. We should make sure container preemption > is more effective. > *Requirements:* > 1) Can handle the case of user-limit preemption > 2) Can handle the case of resource placement requirements, such as: hard-locality > (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross-user preemption (YARN-2113), > cross-application preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1011) [Umbrella] Schedule containers based on utilization of currently allocated containers
[ https://issues.apache.org/jira/browse/YARN-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107009#comment-15107009 ] Nathan Roberts commented on YARN-1011: -- bq. Welcome any thoughts/suggestions on handling promotion if we allow applications to ask for only guaranteed containers. I ll continue brain-storming. We want to have a simple mechanism, if possible; complex protocols seem to find a way to hoard bugs. I agree that we want something simple and this probably doesn’t qualify, but below are some thoughts anyway. This seems like a difficult problem. Maybe a webex would make sense at some point to go over the design and work through some of these issues Maybe we need to run two schedulers, conceptually anyway. One of them is exactly what we have today, call it the “GUARANTEED” scheduler. The second one is responsible for the “OPPORTUNISTIC” space. What I like about this sort of approach is that we aren’t changing the way the GUARANTEED scheduler would do things. The GUARANTEED scheduler assigns containers in the same order as it always has, regardless of whether or not opportunistic containers are being allocated in the background. By having separate schedulers, we’re not perturbing the way user_limits, capacity limits, reservations, preemption, and other scheduler-specific fairness algorithms deal with opportunistic capacity (I’m concerned we’ll have lots of bugs in this area). The only difference is that the OPPORTUNISTIC side might already be running a container when the GUARANTEED scheduler gets around to the same piece of work (the promotion problem). What I don't like is that it's obviously not simple. - The OPPORTUNISTIC scheduler could behave very differently from the GUARANTEED scheduler (e.g. it could only consider applications in certain queues, it could heavily favor applications with quick running containers, it could randomly select applications to fairly use OPPORTUNISTIC space, it could ignore reservations, it could ignore user limits, it could work extra hard to get good container locality, etc.) - When the OPPORTUNISTIC scheduler launches a container, it modifies the ask to indicate this portion has been launched opportunistically, the size of the ask does not change (this means the application needs to be aware that it is launching an OPPORTUNISTIC container) - Like Bikas already mentioned, we have to promote opportunistic containers, even if it means shooting an opportunistic one and launching a guaranteed one somewhere else. - If the GUARANTEED scheduler decides to assign a container y to a portion of an ask that has already been opportunistically launched with container x, the AM is asked to migrate container x to container y. If x and y are on the same host, great, the AM asks the NM to convert x to y (mostly bookkeeping); if not the AM kills x and launches y. Probably need a new state to track the migration. - Maybe locality would make the killing of opportunistic containers a rare event? If both schedulers are working hard to get locality (e.g. YARN-80 gets us to about 80% node local), then it seems like the GUARANTEED scheduler is going to usually pick the same nodes as the OPPORTUNISTIC scheduler, resulting in very simple container conversions with no lost work. - I don’t see how we can get away from occasionally shooting an opportunistic container so that a guaranteed one can run somewhere else. 
Given that we want opportunistic space to be used for both SLA and non-SLA work, we can’t wait around for a low priority opportunistic container on a busy node. Ideally the OPPORTUNISTIC scheduler would be good at picking containers that almost never get shot. - When the GUARANTEED scheduler assigns a container to a node, the over-allocate thresholds could be violated, in this case OPPORTUNISTIC containers on the node need to be shot. It would be good if this didn’t happen if a simple conversion was going to occur anyway. Given the complexities of this problem, we're going to experiment with a simpler approach of over-allocating up-to 2-3X on memory with the NM shooting containers (preemptable containers first) when resources are dangerously low. The over-allocate will be dynamic based on current node usage (when node is idle, no over-allocate; basically there has to be some evidence that over-allocating will be successful before we actually over-allocate). This type of approach might not satisfy all use cases but it might turn out to be very simple and mostly effective. We'll report back on how this type of approach works out. > [Umbrella] Schedule containers based on utilization of currently allocated > containers > - > > Key: YARN-1011 > URL: https://issues.apache.org/jira/browse/YARN-1011 > Project
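The simpler NM-side policy sketched at the end of the comment above could look roughly like the following; the thresholds and names are invented for illustration and are not a committed design.
{code}
// Back-of-the-envelope sketch of the simpler over-allocation approach
// described above; the thresholds and names are invented for illustration and
// are not a committed design.
class OverAllocationPolicySketch {

  // Allow opportunistic containers only when there is evidence they will fit.
  static final double OVER_ALLOCATE_WHEN_UTIL_BELOW = 0.75;
  // Start shooting opportunistic containers (preemptable first) above this.
  static final double PREEMPT_WHEN_UTIL_ABOVE = 0.95;
  // Never hand out more than roughly 2-3x the node's memory in total.
  static final double MAX_OVERCOMMIT_RATIO = 2.5;

  boolean mayStartOpportunistic(double memUtilization, long allocatedMB, long nodeMB) {
    return memUtilization < OVER_ALLOCATE_WHEN_UTIL_BELOW
        && allocatedMB < nodeMB * MAX_OVERCOMMIT_RATIO;
  }

  boolean mustPreemptOpportunistic(double memUtilization) {
    return memUtilization > PREEMPT_WHEN_UTIL_ABOVE;
  }
}
{code}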
[jira] [Updated] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3215: Attachment: YARN-3215.v2.002.patch Hi [~wangda], I have corrected the approach as per the discussion we had, and also fixed the reported test case, checkstyle, and findbugs issues. Can you please review the latest patch? > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-3215.v1.001.patch, YARN-3215.v2.001.patch, > YARN-3215.v2.002.patch > > > In the existing CapacityScheduler, when computing the headroom of an application, it > will only consider the "non-labeled" nodes of this application. > But it is possible the application is asking for labeled resources, so > headroom-by-label (like 5G of resources available under node-label=red) is > required to get better resource allocation and avoid deadlocks such as > MAPREDUCE-5928. > This JIRA could involve both API changes (such as adding a > label-to-available-resource map in AllocateResponse) and also internal > changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
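As a conceptual sketch of the headroom-by-label idea in the issue description (not the attached patch; the method and map shapes below are assumptions), headroom would be computed per node label rather than once for the unlabeled partition:
{code}
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch only, not the patch: headroom is computed per node label
// rather than once for the unlabeled partition, so an app asking for
// node-label=red sees the "red" availability.
public class LabelHeadroomSketch {

  /** headroom(label) = max(0, limit(label) - used(label)), in MB. */
  static Map<String, Long> headroomByLabel(Map<String, Long> limitMB,
      Map<String, Long> usedMB) {
    Map<String, Long> headroom = new HashMap<>();
    for (Map.Entry<String, Long> e : limitMB.entrySet()) {
      long used = usedMB.getOrDefault(e.getKey(), 0L);
      headroom.put(e.getKey(), Math.max(0L, e.getValue() - used));
    }
    return headroom;
  }

  public static void main(String[] args) {
    Map<String, Long> limit = new HashMap<>();
    limit.put("", 8192L);     // default (no label) partition
    limit.put("red", 5120L);  // e.g. 5G available under node-label=red
    Map<String, Long> used = new HashMap<>();
    used.put("", 6144L);
    System.out.println(headroomByLabel(limit, used)); // per-label headroom map
  }
}
{code}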
[jira] [Updated] (YARN-4577) Enable aux services to have their own custom classpath/jar file
[ https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-4577: Attachment: YARN-4577.20160119.1.patch > Enable aux services to have their own custom classpath/jar file > --- > > Key: YARN-4577 > URL: https://issues.apache.org/jira/browse/YARN-4577 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4577.1.patch, YARN-4577.2.patch, > YARN-4577.20160119.1.patch, YARN-4577.3.patch, YARN-4577.3.rebase.patch, > YARN-4577.4.patch > > > Right now, users have to add their jars to the NM classpath directly, thus > putting them on the system classloader. But if multiple versions of the plugin > are present on the classpath, there is no control over which version actually > gets loaded. Or if there are any conflicts between the dependencies > introduced by the auxiliary service and the NM itself, they can break the NM, > the auxiliary service, or both. > The solution could be to instantiate aux services using a classloader that > is different from the system classloader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
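A minimal sketch of the proposed approach follows (assuming a hypothetical helper; this is not the attached patch): the auxiliary service class is loaded through its own URLClassLoader whose parent is the NM's classloader, so the service's jars stay off the system classpath.
{code}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Minimal sketch of the proposal, not the attached patch: the auxiliary
// service is instantiated through its own URLClassLoader pointed at a
// per-service jar, so its dependencies stay off the NM's system classpath.
public class AuxServiceLoaderSketch {

  static Object loadAuxService(File serviceJar, String serviceClassName)
      throws Exception {
    URL[] urls = new URL[] {serviceJar.toURI().toURL()};
    // Parent is the NM's classloader so shared YARN APIs still resolve there.
    ClassLoader parent = AuxServiceLoaderSketch.class.getClassLoader();
    URLClassLoader serviceLoader = new URLClassLoader(urls, parent);
    Class<?> clazz = Class.forName(serviceClassName, true, serviceLoader);
    return clazz.getDeclaredConstructor().newInstance();
  }
}
{code}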
[jira] [Updated] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4238: --- Attachment: (was: YARN-4238-feature-YARN-2928.02.patch) > createdTime and modifiedTime is not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.002.patch, YARN-4238-feature-YARN-2928.003.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4224: --- Attachment: YARN-4224-feature-YARN-2928.05.patch > Support fetching entities by UID and change the REST interface to conform to > current REST APIs' in YARN > --- > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.04.patch, YARN-4224-feature-YARN-2928.05.patch, > YARN-4224-feature-YARN-2928.wip.02.patch, > YARN-4224-feature-YARN-2928.wip.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4224) Support fetching entities by UID and change the REST interface to conform to current REST APIs' in YARN
[ https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106785#comment-15106785 ] Varun Saxena commented on YARN-4224: Fixed one of the checkstyle issues. The others can't be fixed, as they are related to the parameter count and to imports required for javadoc. > Support fetching entities by UID and change the REST interface to conform to > current REST APIs' in YARN > --- > > Key: YARN-4224 > URL: https://issues.apache.org/jira/browse/YARN-4224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4224-YARN-2928.01.patch, > YARN-4224-feature-YARN-2928.04.patch, YARN-4224-feature-YARN-2928.05.patch, > YARN-4224-feature-YARN-2928.wip.02.patch, > YARN-4224-feature-YARN-2928.wip.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4605) Spelling mistake in the help message of "yarn applicationattempt" command
[ https://issues.apache.org/jira/browse/YARN-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106751#comment-15106751 ] Hadoop QA commented on YARN-4605: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 55s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 49s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 2s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 56s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 5s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 25s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 17s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 19s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 29s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 27s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 29s {color} | {color:red} hadoop-yarn-client in the patch failed wi
[jira] [Updated] (YARN-4609) RM Nodes list page takes too much time to load
[ https://issues.apache.org/jira/browse/YARN-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4609: --- Description: Configure SLS with 1 NM Nodes. Check the time taken to load the Nodes page. For loading 10k nodes it takes *30 sec*. /cluster/nodes Chrome: Version 47.0.2526.106 m was: Configure SLS with 1 NM Nodes. Check the time taken to load the Nodes page. For loading 10k nodes it takes *30 sec*. Chrome: Version 47.0.2526.106 m > RM Nodes list page takes too much time to load > -- > > Key: YARN-4609 > URL: https://issues.apache.org/jira/browse/YARN-4609 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Configure SLS with 1 NM Nodes. > Check the time taken to load the Nodes page. > For loading 10k nodes it takes *30 sec*. > /cluster/nodes > Chrome: Version 47.0.2526.106 m -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4609) RM Nodes list page takes too much time to load
Bibin A Chundatt created YARN-4609: -- Summary: RM Nodes list page takes too much time to load Key: YARN-4609 URL: https://issues.apache.org/jira/browse/YARN-4609 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Configure SLS with 1 NM Nodes. Check the time taken to load the Nodes page. For loading 10k nodes it takes *30 sec*. Chrome: Version 47.0.2526.106 m -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4608) Redundant code statement in WritingYarnApplications
[ https://issues.apache.org/jira/browse/YARN-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106675#comment-15106675 ] Kai Sasaki commented on YARN-4608: -- I removed the redundant statement in the code example and fixed some typos. > Redundant code statement in WritingYarnApplications > --- > > Key: YARN-4608 > URL: https://issues.apache.org/jira/browse/YARN-4608 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Minor > Labels: documentation > Attachments: YARN-4608.01.patch > > > There is a redundant statement in the application master section of > {{WritingYarnApplications}}. > {code} > List<Container> previousAMRunningContainers = > response.getContainersFromPreviousAttempts(); > List<Container> previousAMRunningContainers = > response.getContainersFromPreviousAttempts(); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4608) Redundant code statement in WritingYarnApplications
[ https://issues.apache.org/jira/browse/YARN-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Sasaki updated YARN-4608: - Attachment: YARN-4608.01.patch > Redundant code statement in WritingYarnApplications > --- > > Key: YARN-4608 > URL: https://issues.apache.org/jira/browse/YARN-4608 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Reporter: Kai Sasaki >Assignee: Kai Sasaki >Priority: Minor > Labels: documentation > Attachments: YARN-4608.01.patch > > > There is a redundant statement in the application master section of > {{WritingYarnApplications}}. > {code} > List<Container> previousAMRunningContainers = > response.getContainersFromPreviousAttempts(); > List<Container> previousAMRunningContainers = > response.getContainersFromPreviousAttempts(); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4608) Redundant code statement in WritingYarnApplications
Kai Sasaki created YARN-4608: Summary: Redundant code statement in WritingYarnApplications Key: YARN-4608 URL: https://issues.apache.org/jira/browse/YARN-4608 Project: Hadoop YARN Issue Type: Bug Components: documentation Reporter: Kai Sasaki Assignee: Kai Sasaki Priority: Minor There is a redundant statement in the application master section of {{WritingYarnApplications}}. {code} List<Container> previousAMRunningContainers = response.getContainersFromPreviousAttempts(); List<Container> previousAMRunningContainers = response.getContainersFromPreviousAttempts(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
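For context, the fix simply keeps a single copy of that statement. The snippet below is a minimal sketch of the de-duplicated line, assuming the usual AM registration flow where {{response}} is the {{RegisterApplicationMasterResponse}}; the exact surrounding text in the documentation may differ.
{code}
// De-duplicated form: one assignment, fetching containers that survived a
// previous application attempt right after the ApplicationMaster registers.
List<Container> previousAMRunningContainers =
    response.getContainersFromPreviousAttempts();
{code}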
[jira] [Created] (YARN-4607) AppAttempt page TotalOutstandingResource Requests table support pagination
Bibin A Chundatt created YARN-4607: -- Summary: AppAttempt page TotalOutstandingResource Requests table support pagination Key: YARN-4607 URL: https://issues.apache.org/jira/browse/YARN-4607 Project: Hadoop YARN Issue Type: Improvement Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Simulate a cluster with 10 racks of 100 nodes each using SLS; if we check the Total Outstanding Resource Requests table, it consumes the complete page. It would be good to support pagination for this table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4363) In TestFairScheduler, testcase should not create FairScheduler redundantly
[ https://issues.apache.org/jira/browse/YARN-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Jie reassigned YARN-4363: - Assignee: Tao Jie > In TestFairScheduler, testcase should not create FairScheduler redundantly > -- > > Key: YARN-4363 > URL: https://issues.apache.org/jira/browse/YARN-4363 > Project: Hadoop YARN > Issue Type: Test > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Tao Jie >Assignee: Tao Jie >Priority: Trivial > Attachments: YARN-4363.001.patch > > > I am trying to make some improvements to the FairScheduler, but I get test > failures in TestFairScheduler due to redundant FairScheduler creation: > in TestFairScheduler, a FairScheduler and an RM are created, and the RMContext of > the RM is set on the scheduler. > {code} > @Before > public void setUp() throws IOException { > scheduler = new FairScheduler(); > conf = createConfiguration(); > resourceManager = new MockRM(conf); > scheduler.setRMContext(resourceManager.getRMContext()); > } > {code} > However, in several cases the scheduler is re-created; as a result, the RMContext in the > scheduler is null. > {code} > @Test > public void testMinZeroResourcesSettings() throws IOException { > scheduler = new FairScheduler(); > YarnConfiguration conf = new YarnConfiguration(); > ... > scheduler.init(conf); > {code} > Then, when scheduler.init(conf) runs, I get an NPE (I try to read something from the > RMContext during scheduler initialization). > So the FairScheduler should not be re-created inside a test block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
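To illustrate the fix the description implies, here is a minimal sketch (not the attached patch): either the test should reuse the scheduler wired up in {{setUp()}}, or, if it truly needs a fresh instance, it should re-attach the RMContext before calling {{init()}}.
{code}
// Sketch only: if a test must create a fresh FairScheduler, wire the RMContext
// from the MockRM created in setUp() again, so init() does not hit the NPE.
scheduler = new FairScheduler();
scheduler.setRMContext(resourceManager.getRMContext());
scheduler.init(conf);
{code}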
[jira] [Commented] (YARN-4606) Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor in queue leads to a situation where it appears that applications in the queue are getting starved or stuck
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106519#comment-15106519 ] Wangda Tan commented on YARN-4606: -- Thanks [~karams], assigned it to me. > Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor > in queue leads to a situation where it appears that applications in the queue are > getting starved or stuck > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Wangda Tan > > Encountered while studying the behaviour of fairness with UserLimitPercent and > UserLimitFactor during the following test: > Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, > UserLimitFactor=32, FairOrderingPolicy only. Encountered an application > starvation situation where 33 applications were running (190 apps completed out of 761 apps, > and the queue can run 345 containers) with a total of 45 containers running; the 12 extra > containers all went to only one app (that app had around 18000 tasks), while all > other apps had only their AM running and were given no other containers. > After that app finished, there were 32 AMs that kept running without > any containers for tasks being launched. > GridMix was run with the following settings: > gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, > gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, > gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, > mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, > gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, > gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver > With a users file containing 4 users for RoundRobinUserResolver -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4559) Make leader elector and zk store share the same curator client
[ https://issues.apache.org/jira/browse/YARN-4559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106516#comment-15106516 ] Hadoop QA commented on YARN-4559: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 40s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 45s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped branch modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 53s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 27s {color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 5 new + 105 unchanged - 1 fixed = 110 total (was 106) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | 
{color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patch modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 5s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 50s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 17s {color} | {color:red} hadoop-yarn in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 16s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {
[jira] [Assigned] (YARN-4606) Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor in queue leads to a situation where it appears that applications in the queue are getting starved or stuck
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-4606: Assignee: Wangda Tan > Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor > in queue leads to a situation where it appears that applications in the queue are > getting starved or stuck > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh >Assignee: Wangda Tan > > Encountered while studying the behaviour of fairness with UserLimitPercent and > UserLimitFactor during the following test: > Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, > UserLimitFactor=32, FairOrderingPolicy only. Encountered an application > starvation situation where 33 applications were running (190 apps completed out of 761 apps, > and the queue can run 345 containers) with a total of 45 containers running; the 12 extra > containers all went to only one app (that app had around 18000 tasks), while all > other apps had only their AM running and were given no other containers. > After that app finished, there were 32 AMs that kept running without > any containers for tasks being launched. > GridMix was run with the following settings: > gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, > gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, > gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, > mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, > gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, > gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver > With a users file containing 4 users for RoundRobinUserResolver -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4606) Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor in queue leads to a situation where it appears that applications in the queue are getting starved or stuck
[ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106507#comment-15106507 ] Karam Singh commented on YARN-4606: --- From offline discussion with [~wangda]: After looking at the log & code, I think I understand what happened. The root cause is that we shouldn't activate an application while it is still in pending state. This is not a new issue; at least branch-2.6 contains it as well. It leads to the #active-users in a queue being increased, but the newly added active user cannot get resources (because its application is in pending state) while the existing users hit their user-limit (the newly added user lowers the user-limit). > Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor > in queue leads to a situation where it appears that applications in the queue are > getting starved or stuck > - > > Key: YARN-4606 > URL: https://issues.apache.org/jira/browse/YARN-4606 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Affects Versions: 2.8.0, 2.7.1 >Reporter: Karam Singh > > Encountered while studying the behaviour of fairness with UserLimitPercent and > UserLimitFactor during the following test: > Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, > UserLimitFactor=32, FairOrderingPolicy only. Encountered an application > starvation situation where 33 applications were running (190 apps completed out of 761 apps, > and the queue can run 345 containers) with a total of 45 containers running; the 12 extra > containers all went to only one app (that app had around 18000 tasks), while all > other apps had only their AM running and were given no other containers. > After that app finished, there were 32 AMs that kept running without > any containers for tasks being launched. > GridMix was run with the following settings: > gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, > gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, > gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, > mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, > gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, > gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver > With a users file containing 4 users for RoundRobinUserResolver -- This message was sent by Atlassian JIRA (v6.3.4#6332)
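To make the arithmetic behind that comment concrete, here is a rough, simplified sketch (editorial illustration only; the real computation lives in the LeafQueue user-limit code and also factors in minimum-user-limit-percent, user-limit-factor and consumed resources). The numbers are assumptions loosely based on the test above.
{code}
// Assumed figures: a queue that can hold ~345 containers of 1 GB each,
// with 2 users able to run apps and 2 more counted while still pending.
int queueCapacityMB = 345 * 1024;
int activeUsersCounted = 4;          // pending users wrongly counted as active
int usersAbleToRun = 2;

int perUserLimitOverCounted = queueCapacityMB / activeUsersCounted; // ~88 GB
int perUserLimitExpected    = queueCapacityMB / usersAbleToRun;     // ~176 GB
// With the over-counted limit, the two runnable users are capped early while
// the remaining share is held for users whose apps cannot run at all.
{code}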
[jira] [Created] (YARN-4606) Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor in queue leads to a situation where it appears that applications in the queue are getting starved or stuck
Karam Singh created YARN-4606: - Summary: Sometimes Fairness in conjunction with UserLimitPercent and UserLimitFactor in queue leads to a situation where it appears that applications in the queue are getting starved or stuck Key: YARN-4606 URL: https://issues.apache.org/jira/browse/YARN-4606 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler, capacityscheduler Affects Versions: 2.7.1, 2.8.0 Reporter: Karam Singh Encountered while studying the behaviour of fairness with UserLimitPercent and UserLimitFactor during the following test: Ran GridMix with queue settings: Capacity=10, MaxCap=80, UserLimit=25, UserLimitFactor=32, FairOrderingPolicy only. Encountered an application starvation situation where 33 applications were running (190 apps completed out of 761 apps, and the queue can run 345 containers) with a total of 45 containers running; the 12 extra containers all went to only one app (that app had around 18000 tasks), while all other apps had only their AM running and were given no other containers. After that app finished, there were 32 AMs that kept running without any containers for tasks being launched. GridMix was run with the following settings: gridmix.client.pending.queue.depth=10, gridmix.job-submission.policy=REPLAY, gridmix.client.submit.threads=5, gridmix.submit.multiplier=0.0001, gridmix.job.type=SLEEPJOB, mapreduce.framework.name=yarn, mapreduce.job.queuename=hive1, mapred.job.queue.name=hive1, gridmix.sleep.max-map-time=5000, gridmix.sleep.max-reduce-time=5000, gridmix.user.resolve.class=org.apache.hadoop.mapred.gridmix.RoundRobinUserResolver With a users file containing 4 users for RoundRobinUserResolver -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106490#comment-15106490 ] Naganarasimha G R commented on YARN-3215: - The {{TestCapacityScheduler.testApplicationHeadRoom}} test case failure is related to the patch; looking into it! > Respect labels in CapacityScheduler when computing headroom > --- > > Key: YARN-3215 > URL: https://issues.apache.org/jira/browse/YARN-3215 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-3215.v1.001.patch, YARN-3215.v2.001.patch > > > In the existing CapacityScheduler, when computing the headroom of an application, it > only considers the "non-labeled" nodes for this application. > But it is possible the application is asking for labeled resources, so > headroom-by-label (like 5G of resource available under node-label=red) is > required to get better resource allocation and avoid deadlocks such as > MAPREDUCE-5928. > This JIRA could involve both API changes (such as adding a > label-to-available-resource map to AllocateResponse) and also internal > changes in CapacityScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
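As an editorial illustration of the label-to-available-resource map the description mentions, the sketch below shows one possible shape of per-label headroom bookkeeping, assuming the usual {{Resource}}/{{Resources}} utility classes; the parameter names ({{queueAccessibleLabels}}, {{labelCapacity}}, {{labelUsed}}) are assumptions, not the actual patch.
{code}
// Hypothetical helper: compute what is still available under each label
// accessible to the queue, clamped at zero.
private Map<String, Resource> computeHeadroomByLabel(
    Set<String> queueAccessibleLabels,
    Map<String, Resource> labelCapacity,
    Map<String, Resource> labelUsed) {
  Map<String, Resource> headroomByLabel = new HashMap<>();
  for (String label : queueAccessibleLabels) {
    Resource available =
        Resources.subtract(labelCapacity.get(label), labelUsed.get(label));
    headroomByLabel.put(label,
        Resources.componentwiseMax(available, Resources.none()));
  }
  return headroomByLabel;
}
{code}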
[jira] [Commented] (YARN-4557) Improper Queues sorting in PartitionedQueueComparator when accessible node labels is configured as ANY
[ https://issues.apache.org/jira/browse/YARN-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106472#comment-15106472 ] Hadoop QA commented on YARN-4557: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 41s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 9s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 164m 9s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12783012/YARN-4557.v3.002.patch | | JIRA Issue | YARN-4557 | | Opti
[jira] [Commented] (YARN-3940) Application moveToQueue should check NodeLabel permission
[ https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106437#comment-15106437 ] Bibin A Chundatt commented on YARN-3940: Hi [~leftnoteasy], could you please review the attached patch? > Application moveToQueue should check NodeLabel permission > -- > > Key: YARN-3940 > URL: https://issues.apache.org/jira/browse/YARN-3940 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-3940.patch, 0002-YARN-3940.patch, > 0003-YARN-3940.patch, 0004-YARN-3940.patch, 0005-YARN-3940.patch, > 0006-YARN-3940.patch > > > Configure the capacity scheduler > Configure node labels and submit an application with {{queue=A Label=X}} > Move the application to queue {{B}}, which does not have access to label x > {code} > 2015-07-20 19:46:19,626 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1437385548409_0005_01 released container container_e08_1437385548409_0005_01_02 on node: host: host-10-19-92-117:64318 #containers=1 available= used= with event: KILL > 2015-07-20 19:46:20,970 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1437385548409_0005_01 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, queue=b1 doesn't have permission to access all labels in resource request. labelExpression of resource request=x. Queue labels=y > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250) > at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106) > at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:515) > at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) > at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2174) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2170) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2168) > {code} > The same exception will be thrown till the *heartbeat timeout*, > then the application state will be updated to *FAILED* -- This message was sent by Atlassian JIRA (v6.3.4#6332)
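As an editorial sketch of the check this JIRA asks for (not the attached patches; names like {{targetQueue}}, {{appRequestedLabels}} and {{applicationId}} are assumptions), the move could be rejected up front when the target queue cannot access a label the application is using:
{code}
// Hypothetical pre-check before accepting moveToQueue: fail fast instead of
// letting every subsequent allocate() heartbeat throw until the AM times out.
Set<String> accessible = targetQueue.getAccessibleNodeLabels();
for (String label : appRequestedLabels) {
  if (!accessible.contains(RMNodeLabelsManager.ANY)
      && !accessible.contains(label)) {
    throw new YarnException("Cannot move " + applicationId + " to queue "
        + targetQueue.getQueueName() + ": node label " + label
        + " is not accessible from that queue");
  }
}
{code}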
[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled
[ https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106435#comment-15106435 ] Bibin A Chundatt commented on YARN-4465: [~leftnoteasy] Thank you for reviewing the patch. Uploaded the latest patch after correction. > SchedulerUtils#validateRequest for Label check should happen only when > nodelabel enabled > > > Key: YARN-4465 > URL: https://issues.apache.org/jira/browse/YARN-4465 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch > > > Disable labels from the RM side: yarn.nodelabel.enable=false > The capacity scheduler label configuration for the queue is as below: > the default label for queue b1 is 3 and its accessible labels are 1,3 > Submit an application to queue A. > {noformat} > Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): Invalid resource request, queue=b1 doesn't have permission to access all labels in resource request. labelExpression of resource request=3. Queue labels=1,3 > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283) > at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602) > at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247) > {noformat} > # Ignore the default label expression when labels are disabled *or* > # In NormalizeResourceRequest we can set the label expression to > when node labels are not enabled *or* > # Improve the message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled
[ https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4465: --- Attachment: 0002-YARN-4465.patch > SchedulerUtils#validateRequest for Label check should happen only when > nodelabel enabled > > > Key: YARN-4465 > URL: https://issues.apache.org/jira/browse/YARN-4465 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch > > > Disable labels from the RM side: yarn.nodelabel.enable=false > The capacity scheduler label configuration for the queue is as below: > the default label for queue b1 is 3 and its accessible labels are 1,3 > Submit an application to queue A. > {noformat} > Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): Invalid resource request, queue=b1 doesn't have permission to access all labels in resource request. labelExpression of resource request=3. Queue labels=1,3 > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283) > at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602) > at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247) > {noformat} > # Ignore the default label expression when labels are disabled *or* > # In NormalizeResourceRequest we can set the label expression to > when node labels are not enabled *or* > # Improve the message -- This message was sent by Atlassian JIRA (v6.3.4#6332)
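To illustrate the first two options listed above, here is a minimal editorial sketch of guarding the label validation behind the node-labels switch; the configuration constants are the standard YarnConfiguration ones, while {{validateLabelAgainstQueue}}, {{resReq}} and {{queueInfo}} are assumed placeholders, not names from the actual patch.
{code}
// Hypothetical guard during request normalization: when node labels are
// disabled cluster-wide, blank out the label expression instead of failing
// the submission against the queue's configured default label.
boolean nodeLabelsEnabled = conf.getBoolean(
    YarnConfiguration.NODE_LABELS_ENABLED,
    YarnConfiguration.DEFAULT_NODE_LABELS_ENABLED);
if (!nodeLabelsEnabled) {
  resReq.setNodeLabelExpression(RMNodeLabelsManager.NO_LABEL);
} else {
  validateLabelAgainstQueue(resReq, queueInfo);  // placeholder for existing check
}
{code}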