[jira] [Commented] (YARN-2712) Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
[ https://issues.apache.org/jira/browse/YARN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189720#comment-14189720 ] Tsuyoshi OZAWA commented on YARN-2712: -- [~adhoot] [~kkambatl] [~jianhe] do you have additional comments? > Adding tests about FSQueue and headroom of FairScheduler to > TestWorkPreservingRMRestart > --- > > Key: YARN-2712 > URL: https://issues.apache.org/jira/browse/YARN-2712 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2712.1.patch, YARN-2712.2.patch > > > TestWorkPreservingRMRestart#testSchedulerRecovery doesn't fully cover the > FairScheduler cases. We should add them. > {code} >// Until YARN-1959 is resolved >if (scheduler.getClass() != FairScheduler.class) { > assertEquals(availableResources, schedulerAttempt.getHeadroom()); >} > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189707#comment-14189707 ] Hadoop QA commented on YARN-2753: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678125/YARN-2753.005.patch against trunk revision 0126cf1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5637//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5637//console This message is automatically generated. > Fix potential issues and code clean up for *NodeLabelsManager > - > > Key: YARN-2753 > URL: https://issues.apache.org/jira/browse/YARN-2753 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2753.000.patch, YARN-2753.001.patch, > YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, > YARN-2753.005.patch > > > Issues include: > * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value > in labelCollections if the key already exists otherwise the Label.resource > will be changed(reset). > * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of > CommonNodeLabelsManager. > ** because when a Node is created, Node.labels can be null. > ** In this case, nm.labels; may be null. So we need check originalLabels not > null before use it(originalLabels.containsAll). > * addToCluserNodeLabels should be protected by writeLock in > RMNodeLabelsManager.java. because we should protect labelCollections in > RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
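The null guard described in the second bullet of YARN-2753 can be sketched roughly as below; the method shape and names are assumptions for illustration, not the actual contents of the patch.
{code}
import java.io.IOException;
import java.util.Set;

// Sketch only: guard against a null label set before calling containsAll,
// since Node.labels can be null right after a Node object is created.
final class RemoveLabelsCheckSketch {
  static void checkRemoveLabelsFromNode(String nodeId, Set<String> originalLabels,
      Set<String> labelsToRemove) throws IOException {
    if (originalLabels == null || !originalLabels.containsAll(labelsToRemove)) {
      throw new IOException("Cannot remove labels " + labelsToRemove
          + " from node " + nodeId + ": some labels are not assigned to it");
    }
  }
}
{code}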
[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.
[ https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189698#comment-14189698 ] Karthik Kambatla commented on YARN-2588: Agree, we will need to move things around a little to get it right. > Standby RM does not transitionToActive if previous transitionToActive is > failed with ZK exception. > -- > > Key: YARN-2588 > URL: https://issues.apache.org/jira/browse/YARN-2588 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0, 2.6.0, 2.5.1 >Reporter: Rohith >Assignee: Rohith > Fix For: 2.6.0 > > Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch > > > Consider scenario where, StandBy RM is failed to transition to Active because > of ZK exception(connectionLoss or SessionExpired). Then any further > transition to Active for same RM does not move RM to Active state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM
[ https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189657#comment-14189657 ] Hadoop QA commented on YARN-2770: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678042/YARN-2770.1.patch against trunk revision 0126cf1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5635//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5635//console This message is automatically generated. > Timeline delegation tokens need to be automatically renewed by the RM > - > > Key: YARN-2770 > URL: https://issues.apache.org/jira/browse/YARN-2770 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.5.0 >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Critical > Attachments: YARN-2770.1.patch > > > YarnClient will automatically grab a timeline DT for the application and pass > it to the app AM. Now the timeline DT renew is still dummy. If an app is > running for more than 24h (default DT expiry time), the app AM is no longer > able to use the expired DT to communicate with the timeline server. Since RM > will cache the credentials of each app, and renew the DTs for the running > app. We should provider renew hooks similar to what HDFS DT has for RM, and > set RM user as the renewer when grabbing the timeline DT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Attachment: YARN-2753.005.patch > Fix potential issues and code clean up for *NodeLabelsManager > - > > Key: YARN-2753 > URL: https://issues.apache.org/jira/browse/YARN-2753 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2753.000.patch, YARN-2753.001.patch, > YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, > YARN-2753.005.patch > > > Issues include: > * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value > in labelCollections if the key already exists otherwise the Label.resource > will be changed(reset). > * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of > CommonNodeLabelsManager. > ** because when a Node is created, Node.labels can be null. > ** In this case, nm.labels; may be null. So we need check originalLabels not > null before use it(originalLabels.containsAll). > * addToCluserNodeLabels should be protected by writeLock in > RMNodeLabelsManager.java. because we should protect labelCollections in > RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Attachment: (was: YARN-2753.005.patch) > Fix potential issues and code clean up for *NodeLabelsManager > - > > Key: YARN-2753 > URL: https://issues.apache.org/jira/browse/YARN-2753 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2753.000.patch, YARN-2753.001.patch, > YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, > YARN-2753.005.patch > > > Issues include: > * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value > in labelCollections if the key already exists otherwise the Label.resource > will be changed(reset). > * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of > CommonNodeLabelsManager. > ** because when a Node is created, Node.labels can be null. > ** In this case, nm.labels; may be null. So we need check originalLabels not > null before use it(originalLabels.containsAll). > * addToCluserNodeLabels should be protected by writeLock in > RMNodeLabelsManager.java. because we should protect labelCollections in > RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189652#comment-14189652 ] Hadoop QA commented on YARN-2772: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678074/YARN-2772.1.patch against trunk revision 0126cf1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5636//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5636//console This message is automatically generated. > DistributedShell's timeline related options are not clear > - > > Key: YARN-2772 > URL: https://issues.apache.org/jira/browse/YARN-2772 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-2772.1.patch > > > The new options "domain" and "create" options - they are not descriptive at > all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.
[ https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189600#comment-14189600 ] Rohith commented on YARN-2588: -- bq. but I would like for us to call transitionToStandby in the catch-block instead of explicitly calling the contents of transitionToStandby As I understand the comment, is the expected change like the one below? Correct me if I am wrong. If so, transitionToStandby returns at its initial state check itself, and we end up without recreating the active services or resetting the dispatcher. {code} try { startActiveServices(); return null; } catch (Exception e) { transitionToStandby(true); throw e; } {code} > Standby RM does not transitionToActive if previous transitionToActive is > failed with ZK exception. > -- > > Key: YARN-2588 > URL: https://issues.apache.org/jira/browse/YARN-2588 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0, 2.6.0, 2.5.1 >Reporter: Rohith >Assignee: Rohith > Fix For: 2.6.0 > > Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch > > > Consider scenario where, StandBy RM is failed to transition to Active because > of ZK exception(connectionLoss or SessionExpired). Then any further > transition to Active for same RM does not move RM to Active state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189583#comment-14189583 ] Rohith commented on YARN-2579: -- Thanks, Karthik! bq. (Service)Dispatcher.stop() wait for draining out RMFatalEventDispatcher event I meant that the drained event, i.e. RMFatalEvent, is waited on to finish at {{rmDispatcher.stop()}} in {{eventHandlerThread.join}}. bq. {{dispatch(event)}} in AsyncDispatcher#createThread doesn't have a try-catch block The {{dispatch(event)}} method catches Throwable and exits the JVM. But I see that if handlers are not registered, we must have a try-catch block; did you mean that scenario? bq. {{eventHandlerThread.join}} in serviceStop should take a timeout as well +1 for this approach too; it also fixes the hang problem. The attached patch likewise does not leave the RM hanging in a kind of deadlock. bq. With the current patch, I wonder if there are any unexpected side-effects I have verified many switching scenarios, as mentioned in my previous comment, and more on a real deployed cluster. It works fine with work-preserving restart too. > Both RM's state is Active , but 1 RM is not really active. > -- > > Key: YARN-2579 > URL: https://issues.apache.org/jira/browse/YARN-2579 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Rohith >Assignee: Rohith >Priority: Blocker > Attachments: YARN-2579.patch, YARN-2579.patch > > > I encountered a situation where both RMs' web pages were accessible and > their state displayed as Active, but one RM's ActiveServices were stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189488#comment-14189488 ] Hadoop QA commented on YARN-2698: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677981/YARN-2698-20141029-2.patch against trunk revision 6f5f604. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.mapreduce.v2.TestMRAMWithNonNormalizedCapabilities org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapreduce.TestLargeSort org.apache.hadoop.mapred.TestClusterMRNotification The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5634//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5634//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5634//console This message is automatically generated. > Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of > RMAdminCLI > --- > > Key: YARN-2698 > URL: https://issues.apache.org/jira/browse/YARN-2698 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, > YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, > YARN-2698-20141029-2.patch > > > YARN RMAdminCLI and AdminService should have write API only, for other read > APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter reassigned YARN-2604: --- Assignee: Robert Kanter (was: Karthik Kambatla) > Scheduler should consider max-allocation-* in conjunction with the largest > node > --- > > Key: YARN-2604 > URL: https://issues.apache.org/jira/browse/YARN-2604 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Affects Versions: 2.5.1 >Reporter: Karthik Kambatla >Assignee: Robert Kanter > > If the scheduler max-allocation-* values are larger than the resources > available on the largest node in the cluster, an application requesting > resources between the two values will be accepted by the scheduler but the > requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2774) shared cache uploader service should authorize notify calls properly
[ https://issues.apache.org/jira/browse/YARN-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2774: -- Issue Type: Sub-task (was: Task) Parent: YARN-1492 > shared cache uploader service should authorize notify calls properly > > > Key: YARN-2774 > URL: https://issues.apache.org/jira/browse/YARN-2774 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sangjin Lee > > The shared cache manager (SCM) uploader service (done in YARN-2186) currently > does not authorize calls to notify the SCM on newly uploaded resource. Proper > security/authorization needs to be done in this RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2774) shared cache uploader service should authorize notify calls properly
Sangjin Lee created YARN-2774: - Summary: shared cache uploader service should authorize notify calls properly Key: YARN-2774 URL: https://issues.apache.org/jira/browse/YARN-2774 Project: Hadoop YARN Issue Type: Task Reporter: Sangjin Lee The shared cache manager (SCM) uploader service (done in YARN-2186) currently does not authorize calls to notify the SCM on newly uploaded resource. Proper security/authorization needs to be done in this RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2772: -- Attachment: YARN-2772.1.patch > DistributedShell's timeline related options are not clear > - > > Key: YARN-2772 > URL: https://issues.apache.org/jira/browse/YARN-2772 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-2772.1.patch > > > The new options "domain" and "create" options - they are not descriptive at > all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189384#comment-14189384 ] Zhijie Shen commented on YARN-2772: --- [~vinodkv], thanks for your proposal. 1. I prefer "create_timeline_domain" over "should_create_timeline_domain", as it is an option without an argument, so there will not be a true/false value for it. 2. I'd like to enforce the validation logic (see the existing code comment). However, since we're lacking timeline client query APIs, it would involve extra steps to send HTTP requests and parse the JSON response; I prefer to do it after YARN-2423. {code} try { //TODO: we need to check and combine the existing timeline domain ACLs, //but let's do it once we have client java library to query domains. TimelineDomain domain = new TimelineDomain(); {code} Otherwise, I've addressed the other comments and made a patch. > DistributedShell's timeline related options are not clear > - > > Key: YARN-2772 > URL: https://issues.apache.org/jira/browse/YARN-2772 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > The new options "domain" and "create" options - they are not descriptive at > all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.
[ https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189344#comment-14189344 ] Karthik Kambatla commented on YARN-2588: Thanks Jian for pointing me to this. Patch fixes an important issue, but I would like for us to call transitionToStandby in the catch-block instead of explicitly calling the contents of transitionToStandby. I ll fix this up in YARN-2010. > Standby RM does not transitionToActive if previous transitionToActive is > failed with ZK exception. > -- > > Key: YARN-2588 > URL: https://issues.apache.org/jira/browse/YARN-2588 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 3.0.0, 2.6.0, 2.5.1 >Reporter: Rohith >Assignee: Rohith > Fix For: 2.6.0 > > Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch > > > Consider scenario where, StandBy RM is failed to transition to Active because > of ZK exception(connectionLoss or SessionExpired). Then any further > transition to Active for same RM does not move RM to Active state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2186) Node Manager uploader service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189307#comment-14189307 ] Karthik Kambatla commented on YARN-2186: Thanks Sangjin. Looks mostly good, but for some minor comments: # How about renaming NMUploaderSerivceSCMProtocol to SharedCacheUploader (after ResourceTracker) or SharedCacheUploaderProtocol? Accordingly, rename all other related classes and proto files? # Instead of {{yarn.sharedcache.nodemanager.}}, we should probably call it {{yarn.sharedcache.uploader}} to avoid confusion? # As per our offline discussions, it would be nice to add a way for the NM to ask the SCM whether it should upload a resource to the shared-cache or not. For now, this could be always yes. In the future, we can add a pluggable policy that the SCM would consult to answer the NM. # NMCacheUploaderSCMProtocolPBClientImpl#close should set {{this.proxy}} to null after calling stopProxy. # NMCacheUploaderSCMProtocolService: ## TODOs should have an associated follow-up JIRA and reference in the code so we don't forget ## serviceStop should set {{this.server}} to null after calling {{this.server.stop()}} > Node Manager uploader service for cache manager > --- > > Key: YARN-2186 > URL: https://issues.apache.org/jira/browse/YARN-2186 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2186-trunk-v1.patch, YARN-2186-trunk-v2.patch, > YARN-2186-trunk-v3.patch, YARN-2186-trunk-v4.patch > > > Implement the node manager uploader service for the cache manager. This > service is responsible for communicating with the node manager when it > uploads resources to the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
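For review items 4 and 5.2 above, the intended cleanup pattern looks roughly like the sketch below; the class and field names are placeholders, not the real YARN-2186 classes.
{code}
import org.apache.hadoop.ipc.RPC;

// Sketch of the suggested cleanup: stop the RPC proxy and drop the reference
// so a later call cannot touch an already-stopped proxy.
class UploaderClientCloseSketch {
  private Object proxy; // stands in for the generated protocol proxy type

  public synchronized void close() {
    if (proxy != null) {
      RPC.stopProxy(proxy);
      proxy = null;
    }
  }
}
{code}
The same idea applies to the server reference in serviceStop: null it out right after calling stop().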
[jira] [Commented] (YARN-2771) DistributedShell's DSConstants are badly named
[ https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189300#comment-14189300 ] Hadoop QA commented on YARN-2771: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678058/YARN-2771.1.patch against trunk revision 6f5f604. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5633//console This message is automatically generated. > DistributedShell's DSConstants are badly named > -- > > Key: YARN-2771 > URL: https://issues.apache.org/jira/browse/YARN-2771 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-2771.1.patch > > > I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of > DISTRIBUTEDSHELLTIMELINEDOMAIN). > DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to > be DISTRIBUTED_SHELL_TIMELINE_DOMAIN? > For the old envs, we can just add new envs that point to the old-one and > deprecate the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2771) DistributedShell's DSConstants are badly named
[ https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2771: -- Attachment: YARN-2771.1.patch While I was aware of the bad naming, I decided to follow the pattern of the existing constants in DSConstants to be consistent. Anyway, I've uploaded a patch to fix all these constants. Since DS is not a serious computation framework and the env var name change is transparent to the CLI user, it should not break anything. > DistributedShell's DSConstants are badly named > -- > > Key: YARN-2771 > URL: https://issues.apache.org/jira/browse/YARN-2771 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > Attachments: YARN-2771.1.patch > > > I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of > DISTRIBUTEDSHELLTIMELINEDOMAIN). > DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to > be DISTRIBUTED_SHELL_TIMELINE_DOMAIN? > For the old envs, we can just add new envs that point to the old-one and > deprecate the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
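The rename-plus-deprecate approach suggested in the description would look roughly like the sketch below; the constant names and values are assumptions for illustration, not the literal patch content.
{code}
// Sketch: introduce the readable constant and keep the old one as a deprecated
// alias so existing users are not broken. Values are assumptions only.
public class DSConstantsSketch {
  /** New, underscore-separated name for the timeline domain env variable. */
  public static final String DISTRIBUTED_SHELL_TIMELINE_DOMAIN =
      "DISTRIBUTED_SHELL_TIMELINE_DOMAIN";

  /** Old name kept as a deprecated alias. */
  @Deprecated
  public static final String DISTRIBUTEDSHELLTIMELINEDOMAIN =
      DISTRIBUTED_SHELL_TIMELINE_DOMAIN;
}
{code}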
[jira] [Commented] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189247#comment-14189247 ] Hadoop QA commented on YARN-2766: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678034/YARN-2766.patch against trunk revision 3ae84e1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5632//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5632//console This message is automatically generated. > ApplicationHistoryManager is expected to return a sorted list of > apps/attempts/containers > -- > > Key: YARN-2766 > URL: https://issues.apache.org/jira/browse/YARN-2766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch > > > {{TestApplicationHistoryClientService.testContainers}} and > {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail > because the test assertions are assuming a returned Collection is in a > certain order. The collection comes from a HashMap, so the order is not > guaranteed, plus, according to [this > page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], > there are situations where the iteration order of a HashMap will be > different between Java 7 and 8. > We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2766: -- Summary: ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers (was: [JDK 8] TestApplicationHistoryClientService fails) > ApplicationHistoryManager is expected to return a sorted list of > apps/attempts/containers > -- > > Key: YARN-2766 > URL: https://issues.apache.org/jira/browse/YARN-2766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch > > > {{TestApplicationHistoryClientService.testContainers}} and > {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail > because the test assertions are assuming a returned Collection is in a > certain order. The collection comes from a HashMap, so the order is not > guaranteed, plus, according to [this > page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], > there are situations where the iteration order of a HashMap will be > different between Java 7 and 8. > We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2579: --- Priority: Blocker (was: Major) Target Version/s: 2.6.0 > Both RM's state is Active , but 1 RM is not really active. > -- > > Key: YARN-2579 > URL: https://issues.apache.org/jira/browse/YARN-2579 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Rohith >Assignee: Rohith >Priority: Blocker > Attachments: YARN-2579.patch, YARN-2579.patch > > > I encountered a situation where both RMs' web pages were accessible and > their state displayed as Active, but one RM's ActiveServices were stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2766: -- Issue Type: Sub-task (was: Bug) Parent: YARN-321 > [JDK 8] TestApplicationHistoryClientService fails > - > > Key: YARN-2766 > URL: https://issues.apache.org/jira/browse/YARN-2766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch > > > {{TestApplicationHistoryClientService.testContainers}} and > {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail > because the test assertions are assuming a returned Collection is in a > certain order. The collection comes from a HashMap, so the order is not > guaranteed, plus, according to [this > page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], > there are situations where the iteration order of a HashMap will be > different between Java 7 and 8. > We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2766: -- Issue Type: Bug (was: Sub-task) Parent: (was: YARN-1530) > [JDK 8] TestApplicationHistoryClientService fails > - > > Key: YARN-2766 > URL: https://issues.apache.org/jira/browse/YARN-2766 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch > > > {{TestApplicationHistoryClientService.testContainers}} and > {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail > because the test assertions are assuming a returned Collection is in a > certain order. The collection comes from a HashMap, so the order is not > guaranteed, plus, according to [this > page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], > there are situations where the iteration order of a HashMap will be > different between Java 7 and 8. > We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189195#comment-14189195 ] Karthik Kambatla commented on YARN-2579: Thanks, [~rohithsharma]. Looking at the tests and your explanation, I think I see what you are saying. However, looking into the code, I am not convinced it is the draining that is causing this issue. {{rmDispatcher}} is an {{AsyncDispatcher}}, with {{drainEventsOnStop}} always false. So, {{rmDispatcher.stop()}} shouldn't lead to any draining of events. I noticed a couple of other issues in the AsyncDispatcher code: # {{eventHandlerThread.join}} in serviceStop should take a timeout as well # {{dispatch(event)}} in AsyncDispatcher#createThread doesn't have a try-catch block With the current patch, I wonder if there are any unexpected side-effects. > Both RM's state is Active , but 1 RM is not really active. > -- > > Key: YARN-2579 > URL: https://issues.apache.org/jira/browse/YARN-2579 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.1 >Reporter: Rohith >Assignee: Rohith > Attachments: YARN-2579.patch, YARN-2579.patch > > > I encountered a situation where both RMs' web pages were accessible and > their state displayed as Active, but one RM's ActiveServices were stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
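Point 1 above (bounding the join in AsyncDispatcher#serviceStop) can be sketched as follows; the 10-second timeout is an assumed value, not a decided one.
{code}
// Sketch: bound the join so a stuck event-handler thread cannot hang serviceStop()
// forever. Names mirror AsyncDispatcher; the timeout value is an assumption.
class DispatcherStopSketch {
  static void stopEventHandler(Thread eventHandlerThread) throws InterruptedException {
    if (eventHandlerThread != null) {
      eventHandlerThread.interrupt();
      eventHandlerThread.join(10 * 1000L); // wait at most 10 seconds
      if (eventHandlerThread.isAlive()) {
        System.err.println("Event handler thread did not stop within 10s; giving up");
      }
    }
  }
}
{code}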
[jira] [Commented] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189194#comment-14189194 ] Zhijie Shen commented on YARN-2766: --- I think we need to change ApplicationContext -> ApplicationHistoryManager -> ApplicationHistoryManagerOnTimelineStore. Modifying the protobuf message will not help the web services. > [JDK 8] TestApplicationHistoryClientService fails > - > > Key: YARN-2766 > URL: https://issues.apache.org/jira/browse/YARN-2766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch > > > {{TestApplicationHistoryClientService.testContainers}} and > {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail > because the test assertions are assuming a returned Collection is in a > certain order. The collection comes from a HashMap, so the order is not > guaranteed, plus, according to [this > page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], > there are situations where the iteration order of a HashMap will be > different between Java 7 and 8. > We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM
[ https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2770: -- Attachment: YARN-2770.1.patch Created a patch: * Add two timeline client APIs - renew/cancel delegation token * Make TimelineDelegationTokenIdentifier.Renewer extend TokenRenewer and implement renew and cancel logic by using timeline client APIs * Change YarnClientImpl to set the renewer of the timeline DT to the user of RM daemon. * Add the test cases to validate renew/cancel APIs * Have done end-to-end test to verify that the automatic DT renew works in a secure cluster. > Timeline delegation tokens need to be automatically renewed by the RM > - > > Key: YARN-2770 > URL: https://issues.apache.org/jira/browse/YARN-2770 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.5.0 >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Critical > Attachments: YARN-2770.1.patch > > > YarnClient will automatically grab a timeline DT for the application and pass > it to the app AM. Now the timeline DT renew is still dummy. If an app is > running for more than 24h (default DT expiry time), the app AM is no longer > able to use the expired DT to communicate with the timeline server. Since RM > will cache the credentials of each app, and renew the DTs for the running > app. We should provider renew hooks similar to what HDFS DT has for RM, and > set RM user as the renewer when grabbing the timeline DT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
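The TokenRenewer hook described in the first two bullets follows the standard Hadoop token-renewer pattern; below is a rough sketch. The token kind string is assumed, and the renew/cancel bodies are placeholders for the timeline client calls added by the patch.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenRenewer;

// Rough sketch only; see the actual YARN-2770 patch for the real implementation.
public class TimelineTokenRenewerSketch extends TokenRenewer {
  private static final Text KIND = new Text("TIMELINE_DELEGATION_TOKEN"); // assumed kind

  @Override
  public boolean handleKind(Text kind) {
    return KIND.equals(kind);
  }

  @Override
  public boolean isManaged(Token<?> token) throws IOException {
    return true; // the RM is expected to renew/cancel this token kind
  }

  @Override
  public long renew(Token<?> token, Configuration conf)
      throws IOException, InterruptedException {
    // Placeholder: the real patch delegates to the new timeline client renew API.
    throw new UnsupportedOperationException("sketch only");
  }

  @Override
  public void cancel(Token<?> token, Configuration conf)
      throws IOException, InterruptedException {
    // Placeholder: the real patch delegates to the new timeline client cancel API.
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}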
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2766: Attachment: YARN-2766.patch New patch fixes findbugs warnings > [JDK 8] TestApplicationHistoryClientService fails > - > > Key: YARN-2766 > URL: https://issues.apache.org/jira/browse/YARN-2766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch > > > {{TestApplicationHistoryClientService.testContainers}} and > {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail > because the test assertions are assuming a returned Collection is in a > certain order. The collection comes from a HashMap, so the order is not > guaranteed, plus, according to [this > page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], > there are situations where the iteration order of a HashMap will be > different between Java 7 and 8. > We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189143#comment-14189143 ] Hadoop QA commented on YARN-2556: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678020/yarn2556.patch against trunk revision d33e07d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5631//console This message is automatically generated. > Tool to measure the performance of the timeline server > -- > > Key: YARN-2556 > URL: https://issues.apache.org/jira/browse/YARN-2556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: chang li > Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, > yarn2556.patch, yarn2556_wip.patch > > > We need to be able to understand the capacity model for the timeline server > to give users the tools they need to deploy a timeline server with the > correct capacity. > I propose we create a mapreduce job that can measure timeline server write > and read performance. Transactions per second, I/O for both read and write > would be a good start. > This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189117#comment-14189117 ] Allen Wittenauer commented on YARN-2701: OK, this compiled without incident, so I'm +1 now. Thanks! > Potential race condition in startLocalizer when using LinuxContainerExecutor > -- > > Key: YARN-2701 > URL: https://issues.apache.org/jira/browse/YARN-2701 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, > YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, > YARN-2701.addendum.1.patch, YARN-2701.addendum.2.patch, > YARN-2701.addendum.3.patch, YARN-2701.addendum.4.patch > > > When using LinuxContainerExecutor do startLocalizer, we are using native code > container-executor.c. > {code} > if (stat(npath, &sb) != 0) { >if (mkdir(npath, perm) != 0) { > {code} > We are using check and create method to create the appDir under /usercache. > But if there are two containers trying to do this at the same time, race > condition may happen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2556: --- Attachment: yarn2556.patch Cleaned up my patch; reviews are welcome. I used this application to test the timeline server throughput in local mode by launching 4 mappers, each of which puts an entity larger than 100 KB and iterates 1000 times. My measured result: on my local machine, the timeline server can provide an I/O rate of about 10Mbs for writes. There is some deviation from the raw leveldb write throughput. People are welcome to try this tool and comment on it. > Tool to measure the performance of the timeline server > -- > > Key: YARN-2556 > URL: https://issues.apache.org/jira/browse/YARN-2556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: chang li > Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, > yarn2556.patch, yarn2556_wip.patch > > > We need to be able to understand the capacity model for the timeline server > to give users the tools they need to deploy a timeline server with the > correct capacity. > I propose we create a mapreduce job that can measure timeline server write > and read performance. Transactions per second, I/O for both read and write > would be a good start. > This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_ dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189101#comment-14189101 ] Hadoop QA commented on YARN-2755: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678008/YARN-2755.v4.patch against trunk revision d33e07d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5630//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5630//console This message is automatically generated. > NM fails to clean up usercache_DEL_ dirs after YARN-661 > -- > > Key: YARN-2755 > URL: https://issues.apache.org/jira/browse/YARN-2755 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, > YARN-2755.v3.patch, YARN-2755.v4.patch > > > When NM restarts frequently due to some reason, a large number of directories > like these left in /data/disk$num/yarn/local/: > /data/disk1/yarn/local/usercache_DEL_1414372756105 > /data/disk1/yarn/local/usercache_DEL_1413557901696 > /data/disk1/yarn/local/usercache_DEL_1413657004894 > /data/disk1/yarn/local/usercache_DEL_1413675321860 > /data/disk1/yarn/local/usercache_DEL_1414093167936 > /data/disk1/yarn/local/usercache_DEL_1413565841271 > These directories are empty, but take up 100M+ due to the number of them. > There were 38714 on the machine I looked at per data disk. > It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189088#comment-14189088 ] Hadoop QA commented on YARN-2766: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678001/YARN-2766.patch against trunk revision d33e07d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5629//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5629//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5629//console This message is automatically generated. > [JDK 8] TestApplicationHistoryClientService fails > - > > Key: YARN-2766 > URL: https://issues.apache.org/jira/browse/YARN-2766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-2766.patch, YARN-2766.patch > > > {{TestApplicationHistoryClientService.testContainers}} and > {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail > because the test assertions are assuming a returned Collection is in a > certain order. The collection comes from a HashMap, so the order is not > guaranteed, plus, according to [this > page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], > there are situations where the iteration order of a HashMap will be > different between Java 7 and 8. > We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2773) ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem
Anubhav Dhoot created YARN-2773: --- Summary: ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem Key: YARN-2773 URL: https://issues.apache.org/jira/browse/YARN-2773 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Priority: Minor The reservation system requires the ReservationDefinition to carry a queue name that selects which reservation queue is used. CapacityScheduler does not allow duplicate leaf queue names, so a unique leaf queue can be referred to simply by its name rather than its full path (which includes parentName + "."). FairScheduler allows duplicate leaf queue names, so the full queue name is needed to identify a queue uniquely. This makes the AbstractReservationSystem implementations inconsistent: getQueuePath in CapacityReservationSystem performs a conversion, while FairReservationSystem returns the same value back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
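The inconsistency can be illustrated with the rough sketch below; the method shapes are assumptions, not the actual AbstractReservationSystem API.
{code}
// Sketch of the two behaviors described above.
class QueuePathSketch {
  // CapacityScheduler-style: leaf names are unique, so a short name is expanded
  // into a full path such as "root.parent.reservationQueue".
  static String capacityStyleQueuePath(String parentPath, String planQueueName) {
    return parentPath + "." + planQueueName;
  }

  // FairScheduler-style: queue names are already full paths, so the value is
  // returned unchanged.
  static String fairStyleQueuePath(String planQueueName) {
    return planQueueName;
  }
}
{code}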
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_ dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189026#comment-14189026 ] Siqi Li commented on YARN-2755: --- Thanks for your feedback, [~jlowe]. I have updated the patch with the proper fix. > NM fails to clean up usercache_DEL_ dirs after YARN-661 > -- > > Key: YARN-2755 > URL: https://issues.apache.org/jira/browse/YARN-2755 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, > YARN-2755.v3.patch, YARN-2755.v4.patch > > > When NM restarts frequently due to some reason, a large number of directories > like these left in /data/disk$num/yarn/local/: > /data/disk1/yarn/local/usercache_DEL_1414372756105 > /data/disk1/yarn/local/usercache_DEL_1413557901696 > /data/disk1/yarn/local/usercache_DEL_1413657004894 > /data/disk1/yarn/local/usercache_DEL_1413675321860 > /data/disk1/yarn/local/usercache_DEL_1414093167936 > /data/disk1/yarn/local/usercache_DEL_1413565841271 > These directories are empty, but take up 100M+ due to the number of them. > There were 38714 on the machine I looked at per data disk. > It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_ dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2755: -- Attachment: YARN-2755.v4.patch > NM fails to clean up usercache_DEL_ dirs after YARN-661 > -- > > Key: YARN-2755 > URL: https://issues.apache.org/jira/browse/YARN-2755 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, > YARN-2755.v3.patch, YARN-2755.v4.patch > > > When NM restarts frequently due to some reason, a large number of directories > like these left in /data/disk$num/yarn/local/: > /data/disk1/yarn/local/usercache_DEL_1414372756105 > /data/disk1/yarn/local/usercache_DEL_1413557901696 > /data/disk1/yarn/local/usercache_DEL_1413657004894 > /data/disk1/yarn/local/usercache_DEL_1413675321860 > /data/disk1/yarn/local/usercache_DEL_1414093167936 > /data/disk1/yarn/local/usercache_DEL_1413565841271 > These directories are empty, but take up 100M+ due to the number of them. > There were 38714 on the machine I looked at per data disk. > It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_ dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189015#comment-14189015 ] Jason Lowe commented on YARN-2755: -- Thanks for the patch, Siqi. userDirStatus can be null if userDirPath is not a directory, so we should avoid the potential NPE and check for {{userDirStatus != null && userDirStatus.hasNext()}} > NM fails to clean up usercache_DEL_ dirs after YARN-661 > -- > > Key: YARN-2755 > URL: https://issues.apache.org/jira/browse/YARN-2755 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, > YARN-2755.v3.patch > > > When NM restarts frequently due to some reason, a large number of directories > like these left in /data/disk$num/yarn/local/: > /data/disk1/yarn/local/usercache_DEL_1414372756105 > /data/disk1/yarn/local/usercache_DEL_1413557901696 > /data/disk1/yarn/local/usercache_DEL_1413657004894 > /data/disk1/yarn/local/usercache_DEL_1413675321860 > /data/disk1/yarn/local/usercache_DEL_1414093167936 > /data/disk1/yarn/local/usercache_DEL_1413565841271 > These directories are empty, but take up 100M+ due to the number of them. > There were 38714 on the machine I looked at per data disk. > It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
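For readers following along, a minimal sketch of the suggested guard, assuming a hypothetical listStatusOrNull helper; this is not the code from the attached patch:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

class UsercacheCleanupSketch {
  // Deletes the children of userDirPath, guarding against a null listing as
  // suggested above. listStatusOrNull is a stand-in for however the patch
  // obtains the listing; it returns null when the path is not a directory.
  void deleteChildren(FileContext lfs, Path userDirPath) throws IOException {
    RemoteIterator<FileStatus> userDirStatus = listStatusOrNull(lfs, userDirPath);
    if (userDirStatus != null && userDirStatus.hasNext()) {
      while (userDirStatus.hasNext()) {
        FileStatus entry = userDirStatus.next();
        lfs.delete(entry.getPath(), true); // recursively delete each entry
      }
    }
  }

  private RemoteIterator<FileStatus> listStatusOrNull(FileContext lfs, Path p) {
    try {
      return lfs.listStatus(p);
    } catch (IOException e) {
      return null; // not a directory, missing, or inaccessible
    }
  }
}
{code}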
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2766: Attachment: YARN-2766.patch That makes sense. I wasn't able to trace the code back to ApplicationHistoryManager, but I did find where the lists are created, so I put the sorting calls there. > [JDK 8] TestApplicationHistoryClientService fails > - > > Key: YARN-2766 > URL: https://issues.apache.org/jira/browse/YARN-2766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-2766.patch, YARN-2766.patch > > > {{TestApplicationHistoryClientService.testContainers}} and > {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail > because the test assertions are assuming a returned Collection is in a > certain order. The collection comes from a HashMap, so the order is not > guaranteed, plus, according to [this > page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], > there are situations where the iteration order of a HashMap will be > different between Java 7 and 8. > We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188969#comment-14188969 ] Vinod Kumar Vavilapalli commented on YARN-2772: --- I propose the following: - Rename the "domain" and "create" options to be "timeline_domain_id" and "should_create_timeline_domain" respectively. - Modify option description of view_acls and modify_acls to say that they are only needed if should_create_timeline_domain is true - Modify description of {{timeline_domain_id}} to say that it is optional and that it will use the "DEFAULT" timeline-domain by default - If {{should_create_timeline_domain}} is off, we should validate on the client to see if the domain really exists, and fail the submission if it does not with a message saying "The passed timeline-domain doesn't exist. Either pass an existing timeline-domain_id or set should_create_timeline_domain to true". - If {{should_create_timeline_domain}} is on, and the user passes an existing timeline-domain-id, we should fail the submission and say "The passed timeline-domain already exists. Either pass a new timeline-domain_id or set should_create_timeline_domain to false" > DistributedShell's timeline related options are not clear > - > > Key: YARN-2772 > URL: https://issues.apache.org/jira/browse/YARN-2772 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > The new "domain" and "create" options are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
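A rough sketch of the proposed client-side validation, using the option names proposed above; domainExists(...) is a hypothetical helper, and this is not the actual DistributedShell client code:
{code}
// Hypothetical validation sketch for the proposal above; domainExists(...)
// stands in for whatever mechanism the client would use to look up a domain.
void validateTimelineDomainOptions(String timelineDomainId,
    boolean shouldCreateTimelineDomain) {
  if (timelineDomainId == null) {
    return; // optional: the "DEFAULT" timeline domain is used
  }
  boolean exists = domainExists(timelineDomainId);
  if (!shouldCreateTimelineDomain && !exists) {
    throw new IllegalArgumentException(
        "The passed timeline-domain doesn't exist. Either pass an existing "
        + "timeline_domain_id or set should_create_timeline_domain to true");
  }
  if (shouldCreateTimelineDomain && exists) {
    throw new IllegalArgumentException(
        "The passed timeline-domain already exists. Either pass a new "
        + "timeline_domain_id or set should_create_timeline_domain to false");
  }
}
{code}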
[jira] [Created] (YARN-2772) DistributedShell's timeline related options are not clear
Vinod Kumar Vavilapalli created YARN-2772: - Summary: DistributedShell's timeline related options are not clear Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen The new "domain" and "create" options are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2771) DistributedShell's DSConstants are badly named
[ https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2771: -- Component/s: applications/distributed-shell > DistributedShell's DSConstants are badly named > -- > > Key: YARN-2771 > URL: https://issues.apache.org/jira/browse/YARN-2771 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen > > I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of > DISTRIBUTEDSHELLTIMELINEDOMAIN). > DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to > be DISTRIBUTED_SHELL_TIMELINE_DOMAIN? > For the old envs, we can just add new envs that point to the old-one and > deprecate the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188925#comment-14188925 ] Wangda Tan commented on YARN-2698: -- Hi [~vinodkv], bq. YarnClient usually has simpler APIs (like returning a map) instead of directly exposing the response objects, let’s do that. Addressed bq. bin/yarn needs to be updated to use the new CLI Addressed bq. Overall, I didn’t realize we already have a node CLI already: Let’s just move the node to labels mappings to that CLI. We could keep the all-nodes mapping though. The node CLI mainly gets labels from NodeReport, which only covers running NMs. I suggest keeping the node-to-labels mapping in the node-labels CLI (as its name suggests); in the future we can add a "labels" field to NodeReport and the node CLI. bq. “will return all labels in the cluster” -> “will return all accessible labels in the cluster” I changed it to ".. return all node labels" to make it consistent with the Java API names; please let me know if you disagree. bq. CLI for "node-labels -list” should drop the prefix “Node-labels=“ Addressed bq. CLI for “node-labels -list -nodeId all”: Say Node instead of Host? And then simply make it “Node:nm:5432 -> label1, label2” Addressed bq. Move the node-cli tests into their own TestNodeLabelsCLI Addressed bq. Validate the help message for the new CLI. Addressed > Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of > RMAdminCLI > --- > > Key: YARN-2698 > URL: https://issues.apache.org/jira/browse/YARN-2698 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, > YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, > YARN-2698-20141029-2.patch > > > YARN RMAdminCLI and AdminService should have write API only, for other read > APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2495: Attachment: YARN-2495.20141030-1.patch Hi [~wangda], I am uploading a patch with all the review comments fixed and with test cases, but I need to rebase it against the latest trunk, which I will do tomorrow morning. You can review this patch in the meantime, and if it looks fine I will submit it after rebasing tomorrow. > Allow admin specify labels from each NM (Distributed configuration) > --- > > Key: YARN-2495 > URL: https://issues.apache.org/jira/browse/YARN-2495 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Naganarasimha G R > Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, > YARN-2495.20141030-1.patch, YARN-2495_20141022.1.patch > > > Target of this JIRA is to allow admin specify labels in each NM, this covers > - User can set labels in each NM (by setting yarn-site.xml or using script > suggested by [~aw]) > - NM will send labels to RM via ResourceTracker API > - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2698: - Attachment: YARN-2698-20141029-2.patch > Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of > RMAdminCLI > --- > > Key: YARN-2698 > URL: https://issues.apache.org/jira/browse/YARN-2698 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, > YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, > YARN-2698-20141029-2.patch > > > YARN RMAdminCLI and AdminService should have write API only, for other read > APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188891#comment-14188891 ] Hadoop QA commented on YARN-2698: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677927/YARN-2698-20141029-1.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestMRTimelineEventHandling {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5628//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5628//console This message is automatically generated. > Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of > RMAdminCLI > --- > > Key: YARN-2698 > URL: https://issues.apache.org/jira/browse/YARN-2698 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, > YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch > > > YARN RMAdminCLI and AdminService should have write API only, for other read > APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188830#comment-14188830 ] Hudson commented on YARN-2769: -- FAILURE: Integrated in Hadoop-trunk-Commit #6385 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6385/]) YARN-2769. Fixed the problem that timeline domain is not set in distributed shell AM when using shell_command on Windows. Contributed by Varun Vasudev. (zjshen: rev a8c120222047280234c3411ce1c1c9b17f08c851) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/CHANGES.txt > Timeline server domain not set correctly when using shell_command on Windows > > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.6.0 > > Attachments: apache-yarn-2769.0.patch > > > The bug is caught by one of the unit tests which fails. > {noformat} > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188796#comment-14188796 ] Zhijie Shen commented on YARN-2769: --- +1. The fix makes sense, and we have the test to cover the code path on windows. Will commit the patch. > Timeline server domain not set correctly when using shell_command on Windows > > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > The bug is caught by one of the unit tests which fails. > {noformat} > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188787#comment-14188787 ] Karthik Kambatla commented on YARN-2738: Do we want to make it configurable per-queue from the beginning? How about just starting with global settings for all queues, and adding per-queue configs depending on usecases and user feedback? Comments on the patch itself: # FairReservationSystem: The TODO is not clear to me. IAC, we should avoid orphan TODOs - can we file a follow-up JIRA and add a reference at the TODO. # Spurious import changes in a couple of files. > Add FairReservationSystem for FairScheduler > --- > > Key: YARN-2738 > URL: https://issues.apache.org/jira/browse/YARN-2738 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2738.001.patch > > > Need to create a FairReservationSystem that will implement ReservationSystem > for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2771) DistributedShell's DSConstants are badly named
Vinod Kumar Vavilapalli created YARN-2771: - Summary: DistributedShell's DSConstants are badly named Key: YARN-2771 URL: https://issues.apache.org/jira/browse/YARN-2771 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of DISTRIBUTEDSHELLTIMELINEDOMAIN). DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to be DISTRIBUTED_SHELL_TIMELINE_DOMAIN? For the old envs, we can just add new envs that point to the old-one and deprecate the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188755#comment-14188755 ] Jason Lowe commented on YARN-2765: -- I agree that the timeline server seems like a worthy candidate for rocksdb. IIUC rocksdb's main use-case over leveldb is better performance when the database is larger than the node's RAM, which is likely in the case of the timeline server. bq. And one other merit I've heard about rocksdb is that it can ride on HDFS. This is news to me. I knew rocksdb could be used as a cache of data that came from HDFS or could be backed-up to HDFS, but I didn't think it could read/write directly to it as part of normal operations. bq. There's a rocksdb jni which seems to have windows support: https://github.com/fusesource/rocksdbjni Awesome, thanks for finding that. I was looking at the standard org.rocksdb package. Only concern with the fusesource option would be if it starts to diverge significantly from the standard one. The API is already slightly different between the two, and the fusesource one hasn't been touched in a year while the org.rocksdb package was updated just last week. Probably best to continue this conversation in a separate JIRA proposing we consider rocksdb for the timeline server. If it works well there it should be very straightforward to provide store backends for the RM, NM, and JHS if it makes sense for them as well. > Add leveldb-based implementation for RMStateStore > - > > Key: YARN-2765 > URL: https://issues.apache.org/jira/browse/YARN-2765 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2765.patch, YARN-2765v2.patch > > > It would be nice to have a leveldb option to the resourcemanager recovery > store. Leveldb would provide some benefits over the existing filesystem store > such as better support for atomic operations, fewer I/O ops per state update, > and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188739#comment-14188739 ] Karthik Kambatla commented on YARN-2690: Looks mostly good. Can we look into the javadoc warnings? A few minor comments: # Rename ReservationSchedulerConfiguration to ReservationConfiguration? Not sure the Scheduler in there is adding much information. # Make ReservationConfiguration an abstract class that extends Configuration instead of an interface, so it can implement some of the getters, at least those for which it carries defaults. # Nit: The time defaults should be expressed as a product of numbers instead of the precomputed result, e.g. {{24 * 60 * 60 * 1000}} instead of 86400000L. > Make ReservationSystem and its dependent classes independent of Scheduler > type > > > Key: YARN-2690 > URL: https://issues.apache.org/jira/browse/YARN-2690 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-2690.001.patch, YARN-2690.002.patch, > YARN-2690.002.patch, YARN-2690.003.patch > > > A lot of common reservation classes depend on CapacityScheduler and > specifically its configuration. This jira is to make them ready for other > Schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
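A tiny illustration of the third nit; the constant name below is invented for the example, not taken from the patch:
{code}
// Spell time-based defaults out as a product so the unit breakdown is obvious.
public static final long DEFAULT_RESERVATION_WINDOW_MS =
    24L * 60 * 60 * 1000; // one day in milliseconds, rather than 86400000L
{code}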
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_ dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188728#comment-14188728 ] Siqi Li commented on YARN-2755: --- Hi [~jlowe] can you take a look at this? > NM fails to clean up usercache_DEL_ dirs after YARN-661 > -- > > Key: YARN-2755 > URL: https://issues.apache.org/jira/browse/YARN-2755 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, > YARN-2755.v3.patch > > > When NM restarts frequently due to some reason, a large number of directories > like these left in /data/disk$num/yarn/local/: > /data/disk1/yarn/local/usercache_DEL_1414372756105 > /data/disk1/yarn/local/usercache_DEL_1413557901696 > /data/disk1/yarn/local/usercache_DEL_1413657004894 > /data/disk1/yarn/local/usercache_DEL_1413675321860 > /data/disk1/yarn/local/usercache_DEL_1414093167936 > /data/disk1/yarn/local/usercache_DEL_1413565841271 > These directories are empty, but take up 100M+ due to the number of them. > There were 38714 on the machine I looked at per data disk. > It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188656#comment-14188656 ] Zhijie Shen commented on YARN-2766: --- [~rkanter], thanks for reporting the test failure. I can reproduce the same failure with JDK 8, but thinking about the problem again: it seems useless to return the map collection from ApplicationHistoryManager. It also creates the problem that we simply call .values() to get all report objects, making the order of the report objects unpredictable in CLI or web services output. IMHO, ApplicationHistoryManager should return a sorted list directly. > [JDK 8] TestApplicationHistoryClientService fails > - > > Key: YARN-2766 > URL: https://issues.apache.org/jira/browse/YARN-2766 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: YARN-2766.patch > > > {{TestApplicationHistoryClientService.testContainers}} and > {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail > because the test assertions are assuming a returned Collection is in a > certain order. The collection comes from a HashMap, so the order is not > guaranteed, plus, according to [this > page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], > there are situations where the iteration order of a HashMap will be > different between Java 7 and 8. > We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
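A minimal sketch of that direction, sorting the reports before returning them so the output order no longer depends on HashMap iteration; the map shape and sort key are assumptions, not the actual ApplicationHistoryManager code:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerReport;

class SortedReportsSketch {
  // Returns the container reports as a list in a deterministic order instead
  // of exposing a HashMap-backed collection.
  static List<ContainerReport> sortedByContainerId(
      Map<ContainerId, ContainerReport> reportMap) {
    List<ContainerReport> reports =
        new ArrayList<ContainerReport>(reportMap.values());
    Collections.sort(reports, new Comparator<ContainerReport>() {
      @Override
      public int compare(ContainerReport a, ContainerReport b) {
        return a.getContainerId().compareTo(b.getContainerId());
      }
    });
    return reports;
  }
}
{code}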
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration should allow extra spaces between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188639#comment-14188639 ] Hudson commented on YARN-2742: -- FAILURE: Integrated in Hadoop-trunk-Commit #6382 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6382/]) YARN-2742. FairSchedulerConfiguration should allow extra spaces between value and unit. (Wei Yan via kasha) (kasha: rev 782971ae7a0247bcf5920e10b434b7e0954dd868) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java > FairSchedulerConfiguration should allow extra spaces between value and unit > --- > > Key: YARN-2742 > URL: https://issues.apache.org/jira/browse/YARN-2742 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.4.0 >Reporter: Sangjin Lee >Assignee: Wei Yan >Priority: Minor > Fix For: 2.7.0 > > Attachments: YARN-2742-1.patch, YARN-2742-2.patch > > > FairSchedulerConfiguration is very strict about the number of space > characters between the value and the unit: 0 or 1 space. > For example, for values like the following: > {noformat} > 4096 mb, 2 vcores > {noformat} > (note 2 spaces) > This above line fails to parse: > {noformat} > 2014-10-24 22:56:40,802 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: > Failed to reload fair scheduler config file - will use existing allocations. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: > Missing resource: mb > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2742) FairSchedulerConfiguration should allow extra spaces between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2742: --- Summary: FairSchedulerConfiguration should allow extra spaces between value and unit (was: FairSchedulerConfiguration fails to parse if there is extra space between value and unit) > FairSchedulerConfiguration should allow extra spaces between value and unit > --- > > Key: YARN-2742 > URL: https://issues.apache.org/jira/browse/YARN-2742 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.4.0 >Reporter: Sangjin Lee >Assignee: Wei Yan >Priority: Minor > Attachments: YARN-2742-1.patch, YARN-2742-2.patch > > > FairSchedulerConfiguration is very strict about the number of space > characters between the value and the unit: 0 or 1 space. > For example, for values like the following: > {noformat} > 4096 mb, 2 vcores > {noformat} > (note 2 spaces) > This above line fails to parse: > {noformat} > 2014-10-24 22:56:40,802 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: > Failed to reload fair scheduler config file - will use existing allocations. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: > Missing resource: mb > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188614#comment-14188614 ] Karthik Kambatla commented on YARN-2742: +1 > FairSchedulerConfiguration fails to parse if there is extra space between > value and unit > > > Key: YARN-2742 > URL: https://issues.apache.org/jira/browse/YARN-2742 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.4.0 >Reporter: Sangjin Lee >Assignee: Wei Yan >Priority: Minor > Attachments: YARN-2742-1.patch, YARN-2742-2.patch > > > FairSchedulerConfiguration is very strict about the number of space > characters between the value and the unit: 0 or 1 space. > For example, for values like the following: > {noformat} > 4096 mb, 2 vcores > {noformat} > (note 2 spaces) > This above line fails to parse: > {noformat} > 2014-10-24 22:56:40,802 ERROR > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: > Failed to reload fair scheduler config file - will use existing allocations. > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: > Missing resource: mb > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188595#comment-14188595 ] Zhijie Shen edited comment on YARN-2765 at 10/29/14 5:22 PM: - bq. This should work if the leveldb database is on a network store like a filer. Thanks for sharing. This is an interesting use case that I'm not aware of before. bq. I briefly considered using rocksdb for this but decided against it for a couple of reasons: It's not particularly related to this Jira, but I just want to think it out loudly. It seems that rocksdb claims to have better performance in terms of I/O than leveldb, while their APIs are very similar to each other. After we have the leveldb impl, it shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough to serve as the state store for RM/NM/JHS, but the timeline server may want a stronger one. Rocksdb may be a compromise before migrating to fully distributed storage solution based on HBase. And one other merit I've heard about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it seems that rocksdb can also help to scale out the storage problem as well as support RM HA deployment in a shared nothing environment (e.g. without a network storage). I'm not saying we should go with rocksdb now instead of leveldb, as we know it has been used for other components already. I'm trying to propose if we can think of rocksdb, which looks stronger but still reasonably simple alternate. There's a rocksdb jni which seems to have windows support: https://github.com/fusesource/rocksdbjni It should be the same org whose leveldbjni is currently used by us. was (Author: zjshen): bq. This should work if the leveldb database is on a network store like a filer. Thanks for sharing. This is an interesting use case that I'm not aware of before. bq. I briefly considered using rocksdb for this but decided against it for a couple of reasons: It's not particularly related to this Jira, but I just want to think it out loudly. It seems that rocksdb claims to have better performance in terms of I/O than leveldb, while their APIs are very similar to each other. After we have the leveldb impl, it shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough to serve as the state store for RM/NM/JHS, but the timeline server may want a stronger one. Rocksdb may be a compromise before migrating to fully distributed storage solution based on HBase. And one other merit I've heard about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it seems that rocksdb can also help to scale out the storage problem as well as support RM HA deployment in a shared nothing environment (e.g. without a network storage). I'm not saying we should go with rocksdb now instead of leveldb, as we know it has been used for other components already. I'm trying to propose if we can think of rocksdb, which looks stronger but still reasonably simple alternate. > Add leveldb-based implementation for RMStateStore > - > > Key: YARN-2765 > URL: https://issues.apache.org/jira/browse/YARN-2765 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2765.patch, YARN-2765v2.patch > > > It would be nice to have a leveldb option to the resourcemanager recovery > store. 
> Leveldb would provide some benefits over the existing filesystem store > such as better support for atomic operations, fewer I/O ops per state update, > and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM
Zhijie Shen created YARN-2770: - Summary: Timeline delegation tokens need to be automatically renewed by the RM Key: YARN-2770 URL: https://issues.apache.org/jira/browse/YARN-2770 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.5.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical YarnClient will automatically grab a timeline DT for the application and pass it to the app AM. Currently the timeline DT renewal is still a dummy operation. If an app runs for more than 24h (the default DT expiry time), the app AM is no longer able to use the expired DT to communicate with the timeline server. Since the RM caches the credentials of each app and renews the DTs for the running app, we should provide renew hooks for the RM similar to what the HDFS DT has, and set the RM user as the renewer when grabbing the timeline DT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188595#comment-14188595 ] Zhijie Shen commented on YARN-2765: --- bq. This should work if the leveldb database is on a network store like a filer. Thanks for sharing. This is an interesting use case that I'm not aware of before. bq. I briefly considered using rocksdb for this but decided against it for a couple of reasons: It's not particularly related to this Jira, but I just want to think it out loudly. It seems that rocksdb claims to have better performance in terms of I/O than leveldb, while their APIs are very similar to each other. After we have the leveldb impl, it shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough to serve as the state store for RM/NM/JHS, but the timeline server may want a stronger one. Rocksdb may be a compromise before migrating to fully distributed storage solution based on HBase. And one other merit I've heard about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it seems that rocksdb can also help to scale out the storage problem as well as support RM HA deployment in a shared nothing environment (e.g. without a network storage). I'm not saying we should go with rocksdb now instead of leveldb, as we know it has been used for other components already. I'm trying to propose if we can think of rocksdb, which looks stronger but still reasonably simple alternate. > Add leveldb-based implementation for RMStateStore > - > > Key: YARN-2765 > URL: https://issues.apache.org/jira/browse/YARN-2765 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2765.patch, YARN-2765v2.patch > > > It would be nice to have a leveldb option to the resourcemanager recovery > store. Leveldb would provide some benefits over the existing filesystem store > such as better support for atomic operations, fewer I/O ops per state update, > and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Description: The bug is caught by one of the unit tests which fails. {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec <<< FAILURE! org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} was: {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec <<< FAILURE! org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} > Timeline server domain not set correctly when using shell_command on Windows > > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > The bug is caught by one of the unit tests which fails. > {noformat} > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188589#comment-14188589 ] Hadoop QA commented on YARN-2765: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677911/YARN-2765v2.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5626//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5626//console This message is automatically generated. > Add leveldb-based implementation for RMStateStore > - > > Key: YARN-2765 > URL: https://issues.apache.org/jira/browse/YARN-2765 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2765.patch, YARN-2765v2.patch > > > It would be nice to have a leveldb option to the resourcemanager recovery > store. Leveldb would provide some benefits over the existing filesystem store > such as better support for atomic operations, fewer I/O ops per state update, > and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Summary: Timeline server domain not set correctly when using shell_command on Windows (was: TestDistributedShell#testDSShell fails on Windows) > Timeline server domain not set correctly when using shell_command on Windows > > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Test > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > {noformat} > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Issue Type: Bug (was: Test) > Timeline server domain not set correctly when using shell_command on Windows > > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > {noformat} > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2698: - Attachment: YARN-2698-20141029-1.patch > Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of > RMAdminCLI > --- > > Key: YARN-2698 > URL: https://issues.apache.org/jira/browse/YARN-2698 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, > YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch > > > YARN RMAdminCLI and AdminService should have write API only, for other read > APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188530#comment-14188530 ] Varun Vasudev commented on YARN-2769: - I haven't included any test since this is a fix for a test failing on Windows. > TestDistributedShell#testDSShell fails on Windows > - > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Test > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > {noformat} > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188526#comment-14188526 ] Hadoop QA commented on YARN-2769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677908/apache-yarn-2769.0.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5627//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5627//console This message is automatically generated. > TestDistributedShell#testDSShell fails on Windows > - > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Test > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > {noformat} > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188516#comment-14188516 ] Bikas Saha commented on YARN-1902: -- bq. Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called and at least one of the z allocated containers is started, then if another addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers will be allocated, where 1 container is expected. Firstly, I am not sure if the same ContainerRequest object can be passed multiple times in addContainerRequest. It should be different objects each time (even if they point to the same resource). This might have something to do with the internal book-keeping done for matching requests. Secondly, after z requests are made and 1 allocation is received then z-1 requests remain. If you are using AMRMClientImpl then it is your (the user's) responsibility to call removeContainerRequest() for the request that was matched to this container. The AMRMClient does not know which of your z requests could be assigned to this container. So in the general case, it cannot automatically remove a request from the internal table because it does not know which request to remove. If the javadocs don't clarify these semantics then we can improve the javadocs. Thirdly, the protocol between the AMRMClient and the RM has an inherent race. So if the client had earlier asked for z containers and in the next heartbeat reduces that to z-1, the RM may actually return z containers to it because it had already allocated them to this client before the client updated the RM with the new value. > Allocation of too many containers when a second request is done with the same > resource capability > - > > Key: YARN-1902 > URL: https://issues.apache.org/jira/browse/YARN-1902 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0, 2.3.0, 2.4.0 >Reporter: Sietse T. Au > Labels: client > Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch > > > Regarding AMRMClientImpl > Scenario 1: > Given a ContainerRequest x with Resource y, when addContainerRequest is > called z times with x, allocate is called and at least one of the z allocated > containers is started, then if another addContainerRequest call is done and > subsequently an allocate call to the RM, (z+1) containers will be allocated, > where 1 container is expected. > Scenario 2: > No containers are started between the allocate calls. > Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) > containers are requested in both scenarios, but that only in the second scenario, the > correct behavior is observed. > Looking at the implementation I have found that this (z+1) request is caused > by the structure of the remoteRequestsTable. The consequence of Map<Resource, > ResourceRequestInfo> is that ResourceRequestInfo does not hold any > information about whether a request has been sent to the RM yet or not. > There are workarounds for this, such as releasing the excess containers > received. > The solution implemented is to initialize a new ResourceRequest in > ResourceRequestInfo when a request has been successfully sent to the RM. > The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
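To make the expected user-side bookkeeping concrete, here is a minimal sketch (not the YARN-1902 patch itself) using the public AMRMClient API: a separate ContainerRequest object per ask, and one removeContainerRequest per satisfied request:
{code}
import java.util.Collection;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

class AmRequestBookkeepingSketch {

  // Ask for z containers: a fresh ContainerRequest object per ask, even though
  // they all describe the same capability and priority.
  static void askFor(AMRMClient<ContainerRequest> client, Resource capability,
      Priority priority, int z) {
    for (int i = 0; i < z; i++) {
      client.addContainerRequest(
          new ContainerRequest(capability, null, null, priority));
    }
  }

  // When containers arrive, remove one matching outstanding request per
  // container so the client's internal table reflects what is still needed.
  static void onAllocated(AMRMClient<ContainerRequest> client,
      List<Container> allocated, Priority priority, Resource capability) {
    for (Container c : allocated) {
      List<? extends Collection<ContainerRequest>> matches =
          client.getMatchingRequests(priority, ResourceRequest.ANY, capability);
      if (!matches.isEmpty() && !matches.get(0).isEmpty()) {
        ContainerRequest satisfied = matches.get(0).iterator().next();
        client.removeContainerRequest(satisfied);
      }
    }
  }
}
{code}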
[jira] [Updated] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2765: - Attachment: YARN-2765v2.patch Thanks for the review, Tsuyoshi! bq. How about adding helper methods like getKeyPrefix/getNodePath for getting key prefix and node path? Sure, added some helper methods to compute leveldb keys for various things. bq. I found that the patch includes lots hard-coded "/". I think it's better to have private field SEPARATOR = "/". IMHO this makes the code less readable, similar to a code style like {{final int ONE = 1}}. But I don't care too strongly about it and changed all occurrences to SEPARATOR. For Zhijie's comments: bq. One drawback I can think of is that while LeveldbRMStateStore is lightweight for single RM restarting, multiple RMs of HA are not able to share this single-host database. This should work if the leveldb database is on a network store like a filer. Leveldb uses locks to prevent multiple processes from trying to access the database simultaneously, so there's a little bit of help for the fencing scenarios. However the fencing script actions would have to do some extra work to force a poorly-behaving resourcemanager to let go of the locks so a standby RM can open the store and become active. bq. Did you have a chance to think of an enhanced k/v db: rocksdb? I briefly considered using rocksdb for this but decided against it for a couple of reasons: * leveldb is already used by the timeline server and nodemanager, and I would rather avoid adding yet another new dependency for this * leveldb supports win32/win64, but it doesn't appear that the standard rocksdbjni distribution has support for Windows. > Add leveldb-based implementation for RMStateStore > - > > Key: YARN-2765 > URL: https://issues.apache.org/jira/browse/YARN-2765 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2765.patch, YARN-2765v2.patch > > > It would be nice to have a leveldb option to the resourcemanager recovery > store. Leveldb would provide some benefits over the existing filesystem store > such as better support for atomic operations, fewer I/O ops per state update, > and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
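As an illustration of the key-prefix helpers mentioned above, a hedged sketch of how a SEPARATOR constant and small helper methods can centralize leveldb key construction; the root names and exact layout are assumptions for illustration, not the contents of YARN-2765v2.patch.
{code}
// Illustrative only: key-layout helpers for a leveldb-backed RMStateStore.
public final class LeveldbKeyLayout {
  private static final String SEPARATOR = "/";
  private static final String RM_APP_ROOT = "RMAppRoot";

  private LeveldbKeyLayout() {
  }

  /** Prefix shared by an application entry and all of its attempt entries. */
  static String applicationKeyPrefix(String appId) {
    return RM_APP_ROOT + SEPARATOR + appId;
  }

  /** Key for the application's own state entry. */
  static String applicationKey(String appId) {
    return applicationKeyPrefix(appId) + SEPARATOR + appId;
  }

  /** Key for a single attempt stored under its application's prefix. */
  static String attemptKey(String appId, String attemptId) {
    return applicationKeyPrefix(appId) + SEPARATOR + attemptId;
  }
}
{code}
Grouping all attempt keys under the application prefix keeps removal of an application's state to a single range scan over that prefix.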
[jira] [Commented] (YARN-2711) TestDefaultContainerExecutor#testContainerLaunchError fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188472#comment-14188472 ] Junping Du commented on YARN-2711: -- Thanks [~vvasudev] for the patch and [~cwelch] for review! Patch looks good to me. Will commit it shortly. > TestDefaultContainerExecutor#testContainerLaunchError fails on Windows > -- > > Key: YARN-2711 > URL: https://issues.apache.org/jira/browse/YARN-2711 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2711.0.patch > > > The testContainerLaunchError test fails on Windows with the following error - > {noformat} > java.io.FileNotFoundException: File file:/bin/echo does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) > at org.apache.hadoop.fs.FilterFs.getFileStatus(FilterFs.java:120) > at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1117) > at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1113) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1113) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2019) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1978) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:145) > at > org.apache.hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor.testContainerLaunchError(TestDefaultContainerExecutor.java:289) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Issue Type: Test (was: Bug) > TestDistributedShell#testDSShell fails on Windows > - > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Test > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > {noformat} > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Description: {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec <<< FAILURE! org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} was: Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec <<< FAILURE! org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > TestDistributedShell#testDSShell fails on Windows > - > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > {noformat} > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188464#comment-14188464 ] Varun Vasudev commented on YARN-2769: - Since we use shell_command in the test, {noformat} if (envs.containsKey(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION)) { {noformat} is false on Windows (but true on Linux). Just moving the domain id setting out of this if-condition fixes the bug. > TestDistributedShell#testDSShell fails on Windows > - > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
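A hedged sketch of the restructuring described in the comment; the surrounding ApplicationMaster field and constant names (domainId, DSConstants.DISTRIBUTEDSHELLTIMELINEDOMAIN) are assumptions for illustration and may not match apache-yarn-2769.0.patch exactly.
{code}
// Before: the domain id was only read inside the script-location branch,
// which the test never enters on Windows because it uses shell_command.
if (envs.containsKey(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION)) {
  // ... existing shell-script localization handling stays here ...
}

// After (sketch): read the timeline domain id outside that branch, so the
// Windows code path also publishes entities into TEST_DOMAIN rather than DEFAULT.
if (envs.containsKey(DSConstants.DISTRIBUTEDSHELLTIMELINEDOMAIN)) {
  domainId = envs.get(DSConstants.DISTRIBUTEDSHELLTIMELINEDOMAIN);
}
{code}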
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188465#comment-14188465 ] Varun Vasudev commented on YARN-2769: - Attached fix. > TestDistributedShell#testDSShell fails on Windows > - > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Attachment: apache-yarn-2769.0.patch > TestDistributedShell#testDSShell fails on Windows > - > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2769.0.patch > > > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Description: Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec <<< FAILURE! org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) > TestDistributedShell#testDSShell fails on Windows > - > > Key: YARN-2769 > URL: https://issues.apache.org/jira/browse/YARN-2769 > Project: Hadoop YARN > Issue Type: Bug > Components: applications/distributed-shell >Reporter: Varun Vasudev >Assignee: Varun Vasudev > > Running > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec > <<< FAILURE! - in org.apache.hadoop.yarn.applications.distribut > testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) > Time elapsed: 37.366 sec <<< FAILURE! > org.junit.ComparisonFailure: expected:<[TEST_DOMAIN]> but was:<[DEFAULT]> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) > at > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
Varun Vasudev created YARN-2769: --- Summary: TestDistributedShell#testDSShell fails on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188375#comment-14188375 ] Hudson commented on YARN-2758: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2758. Update TestApplicationHistoryClientService to use the new generic history store. Contributed by Zhijie Shen (xgong: rev 69f79bee8b3da07bf42e22e35e58c7719782e31f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java > Update TestApplicationHistoryClientService to use the new generic history > store > --- > > Key: YARN-2758 > URL: https://issues.apache.org/jira/browse/YARN-2758 > Project: Hadoop YARN > Issue Type: Test > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2758.1.patch > > > TestApplicationHistoryClientService is still testing against the mock data in > the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188384#comment-14188384 ] Hudson commented on YARN-2741: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. Contributed by Craig Welch. (zjshen: rev 8984e9b1774033e379b57da1bd30a5c81888c7a3) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java > Windows: Node manager cannot serve up log files via the web user interface > when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the > drive that nodemanager is running on) > -- > > Key: YARN-2741 > URL: https://issues.apache.org/jira/browse/YARN-2741 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 > Environment: Windows >Reporter: Craig Welch >Assignee: Craig Welch > Fix For: 2.6.0 > > Attachments: YARN-2741.1.patch, YARN-2741.6.patch > > > PROBLEM: User is getting "No Logs available for Container Container_" > when setting the yarn.nodemanager.log-dirs to any drive letter other than C: > STEPS TO REPRODUCE: > On Windows > 1) Run NodeManager on C: > 2) Create two local drive partitions D: and E: > 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs > 4) Run a MR job that will last at least 5 minutes > 5) While the job is in flight, log into the Yarn web ui , > /cluster > 6) Click on the application_id > 7) Click on the logs link, you will get "No Logs available for Container > Container_" > ACTUAL BEHAVIOR: Getting an error message when viewing the container logs > EXPECTED BEHAVIOR: Able to use different drive letters in > yarn.nodemanager.log-dirs and not get error > NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able > to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188379#comment-14188379 ] Hudson commented on YARN-2747: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is enabled. Contributed by Xuan Gong. (zjshen: rev ec63a3ffbd9413e7434594682fdbbd36eef7413c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt > TestAggregatedLogFormat fails in trunk > -- > > Key: YARN-2747 > URL: https://issues.apache.org/jira/browse/YARN-2747 > Project: Hadoop YARN > Issue Type: Test >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.6.0 > > Attachments: YARN-2747.1.patch > > > Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat > Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec <<< > FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat > testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat) > Time elapsed: 0.047 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188374#comment-14188374 ] Hudson commented on YARN-2503: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java Missing CHANGES.txt for YARN-2503. (jianhe: rev 0782f602881272392381486bcc749850f96acd22) * hadoop-yarn-project/CHANGES.txt > Changes in RM Web UI to better show labels to end users > --- > > Key: YARN-2503 > URL: https://issues.apache.org/jira/browse/YARN-2503 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.6.0 > > Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, > YARN-2503.patch > > > Include but not limited to: > - Show labels of nodes in RM/nodes page > - Show labels of queue in RM/scheduler page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188386#comment-14188386 ] Hudson commented on YARN-2760: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) (kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt > Completely remove word 'experimental' from FairScheduler docs > - > > Key: YARN-2760 > URL: https://issues.apache.org/jira/browse/YARN-2760 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 2.1.0-beta >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 2.6.0 > > Attachments: YARN-2760.patch, YARN-2760.patch > > > After YARN-1034, FairScheduler has not been 'experimental' in any aspect of > use, but the doc change done in that did not entirely cover removal of that > word, leaving a remnant in the preemption sub-point. This needs to be removed > as well, as the feature has been good to use for a long time now, and is not > experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188343#comment-14188343 ] Hudson commented on YARN-2760: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) (kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt > Completely remove word 'experimental' from FairScheduler docs > - > > Key: YARN-2760 > URL: https://issues.apache.org/jira/browse/YARN-2760 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 2.1.0-beta >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 2.6.0 > > Attachments: YARN-2760.patch, YARN-2760.patch > > > After YARN-1034, FairScheduler has not been 'experimental' in any aspect of > use, but the doc change done in that did not entirely cover removal of that > word, leaving a remnant in the preemption sub-point. This needs to be removed > as well, as the feature has been good to use for a long time now, and is not > experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188332#comment-14188332 ] Hudson commented on YARN-2758: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2758. Update TestApplicationHistoryClientService to use the new generic history store. Contributed by Zhijie Shen (xgong: rev 69f79bee8b3da07bf42e22e35e58c7719782e31f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt > Update TestApplicationHistoryClientService to use the new generic history > store > --- > > Key: YARN-2758 > URL: https://issues.apache.org/jira/browse/YARN-2758 > Project: Hadoop YARN > Issue Type: Test > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2758.1.patch > > > TestApplicationHistoryClientService is still testing against the mock data in > the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188331#comment-14188331 ] Hudson commented on YARN-2503: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java Missing CHANGES.txt for YARN-2503. (jianhe: rev 0782f602881272392381486bcc749850f96acd22) * hadoop-yarn-project/CHANGES.txt > Changes in RM Web UI to better show labels to end users > --- > > Key: YARN-2503 > URL: https://issues.apache.org/jira/browse/YARN-2503 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.6.0 > > Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, > YARN-2503.patch > > > Include but not limited to: > - Show labels of nodes in RM/nodes page > - Show labels of queue in RM/scheduler page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188341#comment-14188341 ] Hudson commented on YARN-2741: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. Contributed by Craig Welch. (zjshen: rev 8984e9b1774033e379b57da1bd30a5c81888c7a3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/CHANGES.txt > Windows: Node manager cannot serve up log files via the web user interface > when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the > drive that nodemanager is running on) > -- > > Key: YARN-2741 > URL: https://issues.apache.org/jira/browse/YARN-2741 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 > Environment: Windows >Reporter: Craig Welch >Assignee: Craig Welch > Fix For: 2.6.0 > > Attachments: YARN-2741.1.patch, YARN-2741.6.patch > > > PROBLEM: User is getting "No Logs available for Container Container_" > when setting the yarn.nodemanager.log-dirs to any drive letter other than C: > STEPS TO REPRODUCE: > On Windows > 1) Run NodeManager on C: > 2) Create two local drive partitions D: and E: > 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs > 4) Run a MR job that will last at least 5 minutes > 5) While the job is in flight, log into the Yarn web ui , > /cluster > 6) Click on the application_id > 7) Click on the logs link, you will get "No Logs available for Container > Container_" > ACTUAL BEHAVIOR: Getting an error message when viewing the container logs > EXPECTED BEHAVIOR: Able to use different drive letters in > yarn.nodemanager.log-dirs and not get error > NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able > to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188336#comment-14188336 ] Hudson commented on YARN-2747: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is enabled. Contributed by Xuan Gong. (zjshen: rev ec63a3ffbd9413e7434594682fdbbd36eef7413c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java > TestAggregatedLogFormat fails in trunk > -- > > Key: YARN-2747 > URL: https://issues.apache.org/jira/browse/YARN-2747 > Project: Hadoop YARN > Issue Type: Test >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.6.0 > > Attachments: YARN-2747.1.patch > > > Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat > Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec <<< > FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat > testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat) > Time elapsed: 0.047 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188265#comment-14188265 ] Hudson commented on YARN-2503: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java Missing CHANGES.txt for YARN-2503. (jianhe: rev 0782f602881272392381486bcc749850f96acd22) * hadoop-yarn-project/CHANGES.txt > Changes in RM Web UI to better show labels to end users > --- > > Key: YARN-2503 > URL: https://issues.apache.org/jira/browse/YARN-2503 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.6.0 > > Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, > YARN-2503.patch > > > Include but not limited to: > - Show labels of nodes in RM/nodes page > - Show labels of queue in RM/scheduler page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188275#comment-14188275 ] Hudson commented on YARN-2741: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. Contributed by Craig Welch. (zjshen: rev 8984e9b1774033e379b57da1bd30a5c81888c7a3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java > Windows: Node manager cannot serve up log files via the web user interface > when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the > drive that nodemanager is running on) > -- > > Key: YARN-2741 > URL: https://issues.apache.org/jira/browse/YARN-2741 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 > Environment: Windows >Reporter: Craig Welch >Assignee: Craig Welch > Fix For: 2.6.0 > > Attachments: YARN-2741.1.patch, YARN-2741.6.patch > > > PROBLEM: User is getting "No Logs available for Container Container_" > when setting the yarn.nodemanager.log-dirs to any drive letter other than C: > STEPS TO REPRODUCE: > On Windows > 1) Run NodeManager on C: > 2) Create two local drive partitions D: and E: > 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs > 4) Run a MR job that will last at least 5 minutes > 5) While the job is in flight, log into the Yarn web ui , > /cluster > 6) Click on the application_id > 7) Click on the logs link, you will get "No Logs available for Container > Container_" > ACTUAL BEHAVIOR: Getting an error message when viewing the container logs > EXPECTED BEHAVIOR: Able to use different drive letters in > yarn.nodemanager.log-dirs and not get error > NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able > to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188277#comment-14188277 ] Hudson commented on YARN-2760: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) (kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt > Completely remove word 'experimental' from FairScheduler docs > - > > Key: YARN-2760 > URL: https://issues.apache.org/jira/browse/YARN-2760 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation >Affects Versions: 2.1.0-beta >Reporter: Harsh J >Assignee: Harsh J >Priority: Trivial > Fix For: 2.6.0 > > Attachments: YARN-2760.patch, YARN-2760.patch > > > After YARN-1034, FairScheduler has not been 'experimental' in any aspect of > use, but the doc change done in that did not entirely cover removal of that > word, leaving a remnant in the preemption sub-point. This needs to be removed > as well, as the feature has been good to use for a long time now, and is not > experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188266#comment-14188266 ] Hudson commented on YARN-2758: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2758. Update TestApplicationHistoryClientService to use the new generic history store. Contributed by Zhijie Shen (xgong: rev 69f79bee8b3da07bf42e22e35e58c7719782e31f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java > Update TestApplicationHistoryClientService to use the new generic history > store > --- > > Key: YARN-2758 > URL: https://issues.apache.org/jira/browse/YARN-2758 > Project: Hadoop YARN > Issue Type: Test > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2758.1.patch > > > TestApplicationHistoryClientService is still testing against the mock data in > the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188270#comment-14188270 ] Hudson commented on YARN-2747: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is enabled. Contributed by Xuan Gong. (zjshen: rev ec63a3ffbd9413e7434594682fdbbd36eef7413c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt > TestAggregatedLogFormat fails in trunk > -- > > Key: YARN-2747 > URL: https://issues.apache.org/jira/browse/YARN-2747 > Project: Hadoop YARN > Issue Type: Test >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.6.0 > > Attachments: YARN-2747.1.patch > > > Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat > Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec <<< > FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat > testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat) > Time elapsed: 0.047 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188222#comment-14188222 ] Hadoop QA commented on YARN-2768: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677855/YARN-2768.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5625//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5625//console This message is automatically generated. > optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% > of computing time of update thread > > > Key: YARN-2768 > URL: https://issues.apache.org/jira/browse/YARN-2768 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Attachments: YARN-2768.patch, profiling_FairScheduler_update.png > > > See the attached picture of profiling result. The clone of Resource object > within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the > function FairScheduler.update(). > The code of FSAppAttempt.updateDemand: > {code} > public void updateDemand() { > demand = Resources.createResource(0); > // Demand is current consumption plus outstanding requests > Resources.addTo(demand, app.getCurrentConsumption()); > // Add up outstanding resource requests > synchronized (app) { > for (Priority p : app.getPriorities()) { > for (ResourceRequest r : app.getResourceRequests(p).values()) { > Resource total = Resources.multiply(r.getCapability(), > r.getNumContainers()); > Resources.addTo(demand, total); > } > } > } > } > {code} > The code of Resources.multiply: > {code} > public static Resource multiply(Resource lhs, double by) { > return multiplyTo(clone(lhs), by); > } > {code} > The clone could be skipped by directly update the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188190#comment-14188190 ] Yogesh Sobale commented on YARN-1902: - Can someone please update ? > Allocation of too many containers when a second request is done with the same > resource capability > - > > Key: YARN-1902 > URL: https://issues.apache.org/jira/browse/YARN-1902 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0, 2.3.0, 2.4.0 >Reporter: Sietse T. Au > Labels: client > Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch > > > Regarding AMRMClientImpl > Scenario 1: > Given a ContainerRequest x with Resource y, when addContainerRequest is > called z times with x, allocate is called and at least one of the z allocated > containers is started, then if another addContainerRequest call is done and > subsequently an allocate call to the RM, (z+1) containers will be allocated, > where 1 container is expected. > Scenario 2: > No containers are started between the allocate calls. > Analyzing debug logs of the AMRMClientImpl, I have found that indeed a (z+1) > are requested in both scenarios, but that only in the second scenario, the > correct behavior is observed. > Looking at the implementation I have found that this (z+1) request is caused > by the structure of the remoteRequestsTable. The consequence of Map ResourceRequestInfo> is that ResourceRequestInfo does not hold any > information about whether a request has been sent to the RM yet or not. > There are workarounds for this, such as releasing the excess containers > received. > The solution implemented is to initialize a new ResourceRequest in > ResourceRequestInfo when a request has been successfully sent to the RM. > The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188188#comment-14188188 ] Yogesh Sobale commented on YARN-1902: - Can someone please update? > Allocation of too many containers when a second request is done with the same > resource capability > - > > Key: YARN-1902 > URL: https://issues.apache.org/jira/browse/YARN-1902 > Project: Hadoop YARN > Issue Type: Bug > Components: client >Affects Versions: 2.2.0, 2.3.0, 2.4.0 >Reporter: Sietse T. Au > Labels: client > Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch > > > Regarding AMRMClientImpl > Scenario 1: > Given a ContainerRequest x with Resource y, when addContainerRequest is > called z times with x, allocate is called and at least one of the z allocated > containers is started, then if another addContainerRequest call is done and > subsequently an allocate call to the RM, (z+1) containers will be allocated, > where 1 container is expected. > Scenario 2: > No containers are started between the allocate calls. > Analyzing debug logs of the AMRMClientImpl, I have found that indeed a (z+1) > are requested in both scenarios, but that only in the second scenario, the > correct behavior is observed. > Looking at the implementation I have found that this (z+1) request is caused > by the structure of the remoteRequestsTable. The consequence of Map ResourceRequestInfo> is that ResourceRequestInfo does not hold any > information about whether a request has been sent to the RM yet or not. > There are workarounds for this, such as releasing the excess containers > received. > The solution implemented is to initialize a new ResourceRequest in > ResourceRequestInfo when a request has been successfully sent to the RM. > The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
[ https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188182#comment-14188182 ] Hadoop QA commented on YARN-2767: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677848/apache-yarn-2767.1.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5623//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5623//console This message is automatically generated. > RM web services - add test case to ensure the http static user can kill or > submit apps in secure mode > - > > Key: YARN-2767 > URL: https://issues.apache.org/jira/browse/YARN-2767 > Project: Hadoop YARN > Issue Type: Test > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2767.0.patch, apache-yarn-2767.1.patch > > > We should add a test to ensure that the http static user used to access the > RM web interface can't submit or kill apps if the cluster is running in > secure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Attachment: YARN-2768.patch Avoid the clone by adding a three-argument helper, Resources.multiplyAndAddTo. After this optimization, the average time taken by FairScheduler.update (in a test case with 10k apps) is reduced by 40%. I'm not sure whether such test cases should also be submitted. > optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% > of computing time of update thread > > > Key: YARN-2768 > URL: https://issues.apache.org/jira/browse/YARN-2768 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Attachments: YARN-2768.patch, profiling_FairScheduler_update.png > > > See the attached picture of profiling result. The clone of Resource object > within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the > function FairScheduler.update(). > The code of FSAppAttempt.updateDemand: > {code} > public void updateDemand() { > demand = Resources.createResource(0); > // Demand is current consumption plus outstanding requests > Resources.addTo(demand, app.getCurrentConsumption()); > // Add up outstanding resource requests > synchronized (app) { > for (Priority p : app.getPriorities()) { > for (ResourceRequest r : app.getResourceRequests(p).values()) { > Resource total = Resources.multiply(r.getCapability(), > r.getNumContainers()); > Resources.addTo(demand, total); > } > } > } > } > {code} > The code of Resources.multiply: > {code} > public static Resource multiply(Resource lhs, double by) { > return multiplyTo(clone(lhs), by); > } > {code} > The clone could be skipped by directly update the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
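A hedged sketch of what a three-argument Resources.multiplyAndAddTo and the updated FSAppAttempt.updateDemand loop could look like; the exact signature and implementation in YARN-2768.patch may differ.
{code}
// Resources.java (sketch): accumulate rhs * by into an existing Resource,
// avoiding the per-request clone that Resources.multiply() performs.
public static Resource multiplyAndAddTo(Resource lhs, Resource rhs, double by) {
  lhs.setMemory(lhs.getMemory() + (int) (rhs.getMemory() * by));
  lhs.setVirtualCores(lhs.getVirtualCores() + (int) (rhs.getVirtualCores() * by));
  return lhs;
}

// FSAppAttempt.updateDemand (sketch): the temporary Resource allocated for
// every ResourceRequest in the original loop disappears.
synchronized (app) {
  for (Priority p : app.getPriorities()) {
    for (ResourceRequest r : app.getResourceRequests(p).values()) {
      Resources.multiplyAndAddTo(demand, r.getCapability(), r.getNumContainers());
    }
  }
}
{code}
Since FairScheduler.update() walks every outstanding request of every application on each pass, removing one object allocation per request is what produces the large reduction in update-thread CPU time reported above.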
[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188146#comment-14188146 ] Hadoop QA commented on YARN-2768: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677853/profiling_FairScheduler_update.png against trunk revision ec63a3f. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5624//console This message is automatically generated. > optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% > of computing time of update thread > > > Key: YARN-2768 > URL: https://issues.apache.org/jira/browse/YARN-2768 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Attachments: profiling_FairScheduler_update.png > > > See the attached picture of profiling result. The clone of Resource object > within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the > function FairScheduler.update(). > The code of FSAppAttempt.updateDemand: > {code} > public void updateDemand() { > demand = Resources.createResource(0); > // Demand is current consumption plus outstanding requests > Resources.addTo(demand, app.getCurrentConsumption()); > // Add up outstanding resource requests > synchronized (app) { > for (Priority p : app.getPriorities()) { > for (ResourceRequest r : app.getResourceRequests(p).values()) { > Resource total = Resources.multiply(r.getCapability(), > r.getNumContainers()); > Resources.addTo(demand, total); > } > } > } > } > {code} > The code of Resources.multiply: > {code} > public static Resource multiply(Resource lhs, double by) { > return multiplyTo(clone(lhs), by); > } > {code} > The clone could be skipped by directly update the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Description: See the attached picture of profiling result. The clone of Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.multiply(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(clone(lhs), by); } {code} The clone could be skipped by directly update the value of this.demand. was: See the attached picture of profiling result. The clone of Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.**multiply**(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(**clone**(lhs), by); } {code} The clone could be skipped by directly update the value of this.demand. > optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% > of computing time of update thread > > > Key: YARN-2768 > URL: https://issues.apache.org/jira/browse/YARN-2768 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Attachments: profiling_FairScheduler_update.png > > > See the attached picture of profiling result. The clone of Resource object > within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the > function FairScheduler.update(). > The code of FSAppAttempt.updateDemand: > {code} > public void updateDemand() { > demand = Resources.createResource(0); > // Demand is current consumption plus outstanding requests > Resources.addTo(demand, app.getCurrentConsumption()); > // Add up outstanding resource requests > synchronized (app) { > for (Priority p : app.getPriorities()) { > for (ResourceRequest r : app.getResourceRequests(p).values()) { > Resource total = Resources.multiply(r.getCapability(), > r.getNumContainers()); > Resources.addTo(demand, total); > } > } > } > } > {code} > The code of Resources.multiply: > {code} > public static Resource multiply(Resource lhs, double by) { > return multiplyTo(clone(lhs), by); > } > {code} > The clone could be skipped by directly update the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Attachment: profiling_FairScheduler_update.png > optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% > of computing time of update thread > > > Key: YARN-2768 > URL: https://issues.apache.org/jira/browse/YARN-2768 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Attachments: profiling_FairScheduler_update.png > > > See the attached picture of profiling result. The clone of Resource object > within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the > function FairScheduler.update(). > The code of FSAppAttempt.updateDemand: > {code} > public void updateDemand() { > demand = Resources.createResource(0); > // Demand is current consumption plus outstanding requests > Resources.addTo(demand, app.getCurrentConsumption()); > // Add up outstanding resource requests > synchronized (app) { > for (Priority p : app.getPriorities()) { > for (ResourceRequest r : app.getResourceRequests(p).values()) { > Resource total = Resources.multiply(r.getCapability(), > r.getNumContainers()); > Resources.addTo(demand, total); > } > } > } > } > {code} > The code of Resources.multiply: > {code} > public static Resource multiply(Resource lhs, double by) { > return multiplyTo(clone(lhs), by); > } > {code} > The clone could be skipped by directly update the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
Hong Zhiguo created YARN-2768: - Summary: optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up **85%** (19.2 / 22.6) of the CPU time of the function FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code} public void updateDemand() { demand = Resources.createResource(0); // Demand is current consumption plus outstanding requests Resources.addTo(demand, app.getCurrentConsumption()); // Add up outstanding resource requests synchronized (app) { for (Priority p : app.getPriorities()) { for (ResourceRequest r : app.getResourceRequests(p).values()) { Resource total = Resources.**multiply**(r.getCapability(), r.getNumContainers()); Resources.addTo(demand, total); } } } } {code} The code of Resources.multiply: {code} public static Resource multiply(Resource lhs, double by) { return multiplyTo(**clone**(lhs), by); } {code} The clone could be skipped by directly updating the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
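As a rough sketch of what "directly updating the value of this.demand" could mean, an in-place multiply-and-add helper on Resources might look like the following (illustrative only; it assumes the int-based memory/vcore accessors that Resource exposed at the time and is not necessarily the actual patch):
{code}
public static Resource multiplyAndAddTo(Resource lhs, Resource rhs, double by) {
  // Scale each dimension of rhs by the factor and accumulate it into lhs,
  // so no temporary Resource has to be cloned and thrown away.
  lhs.setMemory(lhs.getMemory() + (int) (rhs.getMemory() * by));
  lhs.setVirtualCores(lhs.getVirtualCores() + (int) (rhs.getVirtualCores() * by));
  return lhs;
}
{code}
updateDemand would then pass this.demand as lhs for every outstanding request, removing the per-request allocation that dominates the profile above.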