[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-1964:
-- Attachment: YARN-1964.patch
Patch with all classpath changes inside DCE.
> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
> Issue Type: New Feature
> Affects Versions: 2.2.0
> Reporter: Arun C Murthy
> Assignee: Abin Shahab
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch,
> YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch,
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch,
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch,
> yarn-1964-docker.patch
>
> Docker (https://www.docker.io/) is an increasingly popular container
> technology.
> In the context of YARN, Docker support will provide an elegant
> solution that allows applications to *package* their software into a Docker
> container (an entire Linux file system, incl. custom versions of perl, python,
> etc.) and use it as a blueprint to launch all their YARN containers with the
> requisite software environment. This provides both consistency (all YARN
> containers will have the same software environment) and isolation (no
> interference with whatever is installed on the physical machine).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2635:
--- Attachment: yarn-2635-4.patch
Thanks for the review, Sandy. I tried to parameterize on the conf through a static block in the base class, but couldn't get it to work. The updated patch addresses the rest of your comments.
> TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
> --
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wei Yan
> Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch,
> yarn-2635-4.patch
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler,
> TestRMRestart fails.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2635: --- Summary: TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS (was: TestRMRestart should run with all schedulers) > TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS > -- > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2476) Apps are scheduled in random order after RM failover
[ https://issues.apache.org/jira/browse/YARN-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157671#comment-14157671 ] Tsuyoshi OZAWA commented on YARN-2476:
-- Closing this issue as a duplicate of (part of) YARN-556. Please feel free to reopen this issue if you have any comments. Thanks!
> Apps are scheduled in random order after RM failover
>
> Key: YARN-2476
> URL: https://issues.apache.org/jira/browse/YARN-2476
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.4.1
> Environment: Linux
> Reporter: Santosh Marella
> Labels: ha, high-availability, resourcemanager
>
> RM HA is configured with 2 RMs. Used FileSystemRMStateStore.
> The FairScheduler allocation file is configured in yarn-site.xml:
> yarn.scheduler.fair.allocation.file
> /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop/allocation-pools.xml
> FS allocation-pools.xml:
> 1 mb,10vcores
> 19000 mb,100vcores
> 5525
> 4.5
> fair
> 3600
> 1 mb,10vcores
> 19000 mb,100vcores
> 5525
> 1.5
> fair
> 3600
> 600
> 600
> Submitted 10 sleep jobs to a FS queue using the command:
> hadoop jar hadoop-mapreduce-examples-2.4.1-mapr-4.0.1-SNAPSHOT.jar sleep
> -Dmapreduce.job.queuename=root.dev -m 10 -r 10 -mt 1 -rt 1
> All the jobs were submitted by the same user, with the same priority, and to the
> same queue. No other jobs were running in the cluster.
> Jobs started executing in the order in which they were submitted (jobs 6 to 10 were active, while 11 to 15 were waiting):
> root@perfnode131:/opt/mapr/hadoop/hadoop-2.4.1/logs# yarn application -list
> Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):10
> Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
> application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0011  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0010  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:52799
> application_1408572781346_0008  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode131:33766
> application_1408572781346_0009  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:50964
> application_1408572781346_0007  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:52966
> application_1408572781346_0015  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0006  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    9.5%      http://perfnode134:34094
> application_1408572781346_0013  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> Stopped RM1. There was a failover and RM2 became active.
> But the jobs seem to have started in a different order:
> root@perfnode131:~/scratch/raw_rm_logs_fs_hang# yarn application -list
> 14/08/21 07:26:13 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):10
> Application-Id                  Application-Name  Application-Type  User   Queue     State    Final-State  Progress  Tracking-URL
> application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  RUNNING  UNDEFINED    5%        http://perfnode134:59351
> application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  R
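The tag-stripped yarn-site property and allocation file quoted in the report above can be made readable again. This is a hedged reconstruction: the element names are assumed from the standard Fair Scheduler allocation-file format, the mapping of 5525 to maxRunningApps and of the trailing 600/600 values to preemption-timeout defaults is a guess, and the second queue's name is not recoverable from the source.

```xml
<!-- yarn-site.xml (assumed reconstruction) -->
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop/allocation-pools.xml</value>
</property>

<!-- allocation-pools.xml (hedged reconstruction; only root.dev is named
     elsewhere in the report, via the submit command) -->
<allocations>
  <queue name="dev">
    <minResources>1 mb,10vcores</minResources>
    <maxResources>19000 mb,100vcores</maxResources>
    <maxRunningApps>5525</maxRunningApps>  <!-- assumed element name -->
    <weight>4.5</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <minSharePreemptionTimeout>3600</minSharePreemptionTimeout>
  </queue>
  <queue name="...">  <!-- second queue name elided in the source -->
    <minResources>1 mb,10vcores</minResources>
    <maxResources>19000 mb,100vcores</maxResources>
    <maxRunningApps>5525</maxRunningApps>
    <weight>1.5</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <minSharePreemptionTimeout>3600</minSharePreemptionTimeout>
  </queue>
  <!-- the trailing "600 / 600" values are assumed to be cluster-wide
       preemption-timeout defaults -->
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
  <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
</allocations>
```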
[jira] [Resolved] (YARN-2476) Apps are scheduled in random order after RM failover
[ https://issues.apache.org/jira/browse/YARN-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA resolved YARN-2476.
-- Resolution: Duplicate
> Apps are scheduled in random order after RM failover
>
> Key: YARN-2476
> URL: https://issues.apache.org/jira/browse/YARN-2476
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.4.1
> Environment: Linux
> Reporter: Santosh Marella
> Labels: ha, high-availability, resourcemanager
>
> RM HA is configured with 2 RMs. Used FileSystemRMStateStore.
> The FairScheduler allocation file is configured in yarn-site.xml:
> yarn.scheduler.fair.allocation.file
> /opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop/allocation-pools.xml
> FS allocation-pools.xml:
> 1 mb,10vcores
> 19000 mb,100vcores
> 5525
> 4.5
> fair
> 3600
> 1 mb,10vcores
> 19000 mb,100vcores
> 5525
> 1.5
> fair
> 3600
> 600
> 600
> Submitted 10 sleep jobs to a FS queue using the command:
> hadoop jar hadoop-mapreduce-examples-2.4.1-mapr-4.0.1-SNAPSHOT.jar sleep
> -Dmapreduce.job.queuename=root.dev -m 10 -r 10 -mt 1 -rt 1
> All the jobs were submitted by the same user, with the same priority, and to the
> same queue. No other jobs were running in the cluster.
> Jobs started executing in the order in which they were submitted (jobs 6 to 10 were active, while 11 to 15 were waiting):
> root@perfnode131:/opt/mapr/hadoop/hadoop-2.4.1/logs# yarn application -list
> Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):10
> Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
> application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0011  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0010  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:52799
> application_1408572781346_0008  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode131:33766
> application_1408572781346_0009  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:50964
> application_1408572781346_0007  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:52966
> application_1408572781346_0015  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0006  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    9.5%      http://perfnode134:34094
> application_1408572781346_0013  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> Stopped RM1. There was a failover and RM2 became active.
> But the jobs seem to have started in a different order:
> root@perfnode131:~/scratch/raw_rm_logs_fs_hang# yarn application -list
> 14/08/21 07:26:13 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):10
> Application-Id                  Application-Name  Application-Type  User   Queue     State    Final-State  Progress  Tracking-URL
> application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  RUNNING  UNDEFINED    5%        http://perfnode134:59351
> application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  RUNNING  UNDEFINED    5%        http://perfnode132:37866
> application_1408572781346_0011  Sleep job         MAPREDUCE         userA  root.dev
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157667#comment-14157667 ] Hadoop QA commented on YARN-2615: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672724/YARN-2615-v3.patch against trunk revision 2d8e6e2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5249//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5249//console This message is automatically generated. 
> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended
> fields
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Blocker
> Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, YARN-2615.patch
>
> As three TokenIdentifiers were updated in YARN-668, ClientToAMTokenIdentifier
> and DelegationTokenIdentifier should also be updated in the same way to allow
> fields to be extended in the future.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2615:
- Attachment: YARN-2615-v3.patch
Thanks [~ozawa] for tracking the Jenkins issue and [~jianhe] for the review. The v3 patch removes the unnecessary code Jian mentioned, but leaves some override methods in *ForTest since they need to access the subclass's proto.
> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended
> fields
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Blocker
> Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, YARN-2615.patch
>
> As three TokenIdentifiers were updated in YARN-668, ClientToAMTokenIdentifier
> and DelegationTokenIdentifier should also be updated in the same way to allow
> fields to be extended in the future.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157624#comment-14157624 ] Sandy Ryza commented on YARN-2635:
-- This seems like a good idea. A few stylistic comments.
Can we rename RMSchedulerParametrizedTestBase to ParameterizedSchedulerTestBase? The former confuses me a little because it reads like something that happened, rather than a noun, and "RM" doesn't seem necessary. Also, Parameterized as spelled in the JUnit class name has three e's. Lastly, can the class include some header comments on what it's doing?
{code}
+  protected void configScheduler(YarnConfiguration conf) throws IOException {
+    // Configure scheduler
{code}
Just name the method configureScheduler instead of using an abbreviation plus a comment.
{code}
+  private void configFifoScheduler(YarnConfiguration conf) {
+    conf.set(YarnConfiguration.RM_SCHEDULER, FifoScheduler.class.getName());
+  }
+
+  private void configCapacityScheduler(YarnConfiguration conf) {
+    conf.set(YarnConfiguration.RM_SCHEDULER, CapacityScheduler.class.getName());
+  }
{code}
These are only one line each - can we just inline them?
{code}
+  protected YarnConfiguration conf = null;
{code}
I think it's better to make this private and expose it through a getConfig method.
Running the tests without FIFO seems reasonable to me.
One last thought - not sure how feasible this is, but the code might be simpler if we get rid of SchedulerType and just have the parameters be Configuration objects?
> TestRMRestart should run with all schedulers
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Wei Yan
> Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler,
> TestRMRestart fails.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
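Sandy's last suggestion (parameterize directly on configuration objects instead of a SchedulerType enum) can be sketched without JUnit or Hadoop on the classpath. This is an illustrative mock-up, not the patch's code: the class name ParameterizedSchedulerTestBase and the getConfig accessor come from the review comments, while plain strings stand in for YarnConfiguration objects and a simple loop stands in for the JUnit Parameterized runner.

```java
import java.util.List;

// Minimal sketch of a parameterized scheduler test base: each concrete test
// runs once per scheduler configuration. In the real patch this would be a
// JUnit @RunWith(Parameterized.class) base class, and the entries below
// would be YarnConfiguration objects naming CapacityScheduler/FairScheduler.
public class ParameterizedSchedulerTestBase {
    private final String conf;  // stands in for a YarnConfiguration

    protected ParameterizedSchedulerTestBase(String conf) {
        this.conf = conf;
    }

    // Expose the configuration through a getter instead of a protected
    // field, as suggested in the review.
    protected String getConfig() {
        return conf;
    }

    // Parameters: one entry per scheduler the suite should cover (CS and FS;
    // FIFO is dropped, as discussed in the thread).
    static List<String> schedulerConfigs() {
        return List.of("CapacityScheduler", "FairScheduler");
    }

    public static void main(String[] args) {
        for (String scheduler : schedulerConfigs()) {
            ParameterizedSchedulerTestBase test =
                new ParameterizedSchedulerTestBase(scheduler);
            // A "test method" body would use the injected configuration here.
            System.out.println("running TestRMRestart with " + test.getConfig());
        }
    }
}
```

The point of parameterizing on configurations rather than an enum is that the base class needs no switch statement: each parameter already carries everything a test run needs.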
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157619#comment-14157619 ] Hadoop QA commented on YARN-2468: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672717/YARN-2468.11.patch against trunk revision 054f285. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5248//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5248//console This message is automatically generated. 
> Log handling for LRS > > > Key: YARN-2468 > URL: https://issues.apache.org/jira/browse/YARN-2468 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2468.1.patch, YARN-2468.10.patch, > YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, > YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, > YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, > YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, > YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, > YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, > YARN-2468.9.1.patch, YARN-2468.9.patch > > > Currently, when application is finished, NM will start to do the log > aggregation. But for Long running service applications, this is not ideal. > The problems we have are: > 1) LRS applications are expected to run for a long time (weeks, months). > 2) Currently, all the container logs (from one NM) will be written into a > single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2468: Attachment: YARN-2468.11.patch > Log handling for LRS > > > Key: YARN-2468 > URL: https://issues.apache.org/jira/browse/YARN-2468 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2468.1.patch, YARN-2468.10.patch, > YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, > YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, > YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, > YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, > YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, > YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, > YARN-2468.9.1.patch, YARN-2468.9.patch > > > Currently, when application is finished, NM will start to do the log > aggregation. But for Long running service applications, this is not ideal. > The problems we have are: > 1) LRS applications are expected to run for a long time (weeks, months). > 2) Currently, all the container logs (from one NM) will be written into a > single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157588#comment-14157588 ] Xuan Gong commented on YARN-2468:
- bq. Also ContainerLogAggregator.uploadedFileMeta is also not needed to be a class member.
I think ContainerLogAggregator.uploadedFileMeta does need to be a class member. It is used to keep track of all previously uploaded log files for each container, and we use this information to decide whether a log can be aggregated. The new patch addresses the other comments.
> Log handling for LRS
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: log-aggregation, nodemanager, resourcemanager
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch, YARN-2468.10.patch,
> YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch,
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch,
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch,
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch,
> YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch,
> YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch,
> YARN-2468.9.1.patch, YARN-2468.9.patch
>
> Currently, when an application finishes, the NM starts log aggregation.
> But for long-running service applications this is not ideal. The problems
> we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) are written into a
> single file. The files could become larger and larger.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157579#comment-14157579 ] Hadoop QA commented on YARN-2635: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672709/yarn-2635-3.patch against trunk revision 054f285. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5247//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5247//console This message is automatically generated. > TestRMRestart should run with all schedulers > > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2612) Some completed containers are not reported to NM
[ https://issues.apache.org/jira/browse/YARN-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong resolved YARN-2612.
Resolution: Duplicate
> Some completed containers are not reported to NM
>
> Key: YARN-2612
> URL: https://issues.apache.org/jira/browse/YARN-2612
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Jun Gong
> Fix For: 2.6.0
>
> We were testing RM work-preserving restart and found the following logs when
> we ran a simple MapReduce task "PI". Some completed containers that had
> already been pulled by the AM were never acked back to the NM, so the NM
> continuously reported the completed containers after the AM had finished.
> {code}
> 2014-09-26 17:00:42,228 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Null container completed...
> 2014-09-26 17:00:42,228 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Null container completed...
> 2014-09-26 17:00:43,230 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Null container completed...
> 2014-09-26 17:00:43,230 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Null container completed...
> 2014-09-26 17:00:44,233 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Null container completed...
> 2014-09-26 17:00:44,233 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Null container completed...
> {code}
> Per YARN-1372, the NM reports completed containers to the RM until it gets an
> ACK from the RM. If the AM does not call allocate (i.e., the AM does not ack
> the RM), the RM will not ack the NM. We ([~chenchun]) have observed these two
> cases when running the MapReduce task 'pi':
> 1) The RM sends completed containers to the AM. After receiving them, the AM
> thinks it has done its work and does not need resources, so it does not call
> allocate.
> 2) When the AM is finishing, it cannot ack the RM because the AM itself has
> not fully finished yet.
> We think that when RMAppAttempt invokes BaseFinalTransition, the AppAttempt
> has finished, and the RM could then send this AppAttempt's completed
> containers to the NM.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
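The YARN-1372 ack protocol this report exercises, where the NM re-sends completed containers on every heartbeat until the RM acknowledges them, can be sketched as follows. The class and method names here are illustrative, not the actual NodeStatusUpdater code; the sketch just shows why an RM that never acks (because the AM stopped calling allocate) leaves the NM reporting the same containers forever, which is the symptom above.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the YARN-1372 behavior: completed containers stay in the NM's
// pending set and are re-sent on every heartbeat until the RM acks them.
public class CompletedContainerReporter {
    private final Set<String> pendingCompleted = new HashSet<>();

    // Called when a container on this NM completes.
    void containerCompleted(String containerId) {
        pendingCompleted.add(containerId);
    }

    // Payload for the next NM->RM heartbeat: everything not yet acked.
    Set<String> heartbeatPayload() {
        return new HashSet<>(pendingCompleted);
    }

    // The RM's ack (sent only after the AM has pulled the statuses) finally
    // lets the NM drop the entries.
    void onRmAck(Set<String> acked) {
        pendingCompleted.removeAll(acked);
    }

    public static void main(String[] args) {
        CompletedContainerReporter nm = new CompletedContainerReporter();
        nm.containerCompleted("container_01");
        System.out.println(nm.heartbeatPayload()); // re-sent: no ack yet
        System.out.println(nm.heartbeatPayload()); // still re-sent: no ack yet
        nm.onRmAck(Set.of("container_01"));
        System.out.println(nm.heartbeatPayload()); // prints [] after the ack
    }
}
```

The proposed fix amounts to letting the RM issue that ack itself once the AppAttempt reaches BaseFinalTransition, instead of waiting for an allocate call that will never come.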
[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed
[ https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157570#comment-14157570 ] Jun Gong commented on YARN-2640:
[~ozawa], thank you for telling me. Closing it now.
> TestDirectoryCollection.testCreateDirectories failed
>
> Key: YARN-2640
> URL: https://issues.apache.org/jira/browse/YARN-2640
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Jun Gong
> Assignee: Jun Gong
> Attachments: YARN-2640.2.patch, YARN-2640.patch
>
> When running the test with "mvn test -Dtest=TestDirectoryCollection", it failed:
> {code}
> Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec <<<
> FAILURE! - in
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
> Time elapsed: 0.969 sec <<< FAILURE!
> java.lang.AssertionError: local dir parent not created with proper
> permissions expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104)
> {code}
> I found this was because testDiskSpaceUtilizationLimit ran before
> testCreateDirectories, so directory "dirA" was created in
> testDiskSpaceUtilizationLimit. When testCreateDirectories then tried to
> create "dirA" with the specified permissions, it found "dirA" was already
> there and did nothing.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2641) improve node decommission latency in RM.
zhihai xu created YARN-2641:
--- Summary: improve node decommission latency in RM.
Key: YARN-2641
URL: https://issues.apache.org/jira/browse/YARN-2641
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
Improve node decommission latency in the RM. Currently, node decommission only happens after the RM receives a nodeHeartbeat from the NodeManager. The node heartbeat interval is configurable, with a default of 1 second. It would be better to do the decommission during RM refresh (NodesListManager) instead of in nodeHeartbeat (ResourceTrackerService).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157524#comment-14157524 ] Craig Welch commented on YARN-1198:
--- FYI, it's not possible to call getAndCalculateHeadroom because nothing can synchronize on the queue during the allocation call without deadlocking. This is why it's necessary to break out the headroom calculation the way it is done here and store some items up front (such as the LeafQueue.User, which comes from the user manager and syncs on the queue), so that the final headroom calculation in the allocate/getHeadroom step involves no synchronization on the queue itself. It's not a bad thing to do anyway, since it (somewhat) reduces the number of operations in that final headroom calculation - but it is also why we can't just call getAndCalculateHeadroom as-is (unchanged) in allocate().
> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Omkar Vinit Joshi
> Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch,
> YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch,
> YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch,
> YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch
>
> Today, headroom calculation (for the app) takes place only when:
> * a new node is added to or removed from the cluster
> * a new container is assigned to the application.
> However, there are potentially many situations which are not considered in
> this calculation:
> * If a container finishes, then the headroom for that application will change
> and should be communicated to the AM accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the
> same queue, then
> ** If app1's container finishes, then not only app1's but also app2's AM
> should be notified about the change in headroom.
> ** Similarly if a container is assigned to any applications app1/app2 then > both AM should be notified about their headroom. > ** To simplify the whole communication process it is ideal to keep headroom > per User per LeafQueue so that everyone gets the same picture (apps belonging > to same user and submitted in same queue). > * If a new user submits an application to the queue then all applications > submitted by all users in that queue should be notified of the headroom > change. > * Also today headroom is an absolute number ( I think it should be normalized > but then this is going to be not backward compatible..) > * Also when admin user refreshes queue headroom has to be updated. > These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
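The constraint Craig describes, that the final headroom computation in allocate() must not synchronize on the queue, amounts to capturing the needed queue and user state in an immutable snapshot while the queue lock is already legitimately held (e.g. during container assignment), then computing headroom as a pure function of that snapshot. A generic sketch, with all names illustrative rather than the actual patch classes:

```java
// Sketch of computing headroom from a pre-captured snapshot instead of
// locking the queue during allocate(). All names are illustrative.
public class HeadroomSketch {

    // Immutable snapshot of the values the final calculation needs,
    // captured while the queue lock is already held.
    static final class HeadroomSnapshot {
        final long queueLimit;    // queue-level resource limit
        final long userLimit;     // per-user limit (LeafQueue.User state)
        final long userConsumed;  // the user's current consumption

        HeadroomSnapshot(long queueLimit, long userLimit, long userConsumed) {
            this.queueLimit = queueLimit;
            this.userLimit = userLimit;
            this.userConsumed = userConsumed;
        }
    }

    // Final headroom calculation: a pure function of the snapshot, so it
    // can run inside allocate() without touching the queue's monitor and
    // therefore without any deadlock risk.
    static long headroom(HeadroomSnapshot s) {
        return Math.max(0, Math.min(s.queueLimit, s.userLimit) - s.userConsumed);
    }

    public static void main(String[] args) {
        HeadroomSnapshot snap = new HeadroomSnapshot(100, 60, 45);
        System.out.println(headroom(snap)); // min(100, 60) - 45 = 15
    }
}
```

The trade-off is staleness: the snapshot can lag the queue's true state by one assignment cycle, which is acceptable here because headroom is advisory to the AM anyway.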
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157518#comment-14157518 ] Tsuyoshi OZAWA commented on YARN-2562: -- [~jianhe], could you check latest patch? > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, > YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5-4.patch, YARN-2562.5.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157515#comment-14157515 ] Karthik Kambatla commented on YARN-2635: By the way, these tests take a long time to run. Do we want to run against all three schedulers? Or, would it be enough to run against CS and FS? > TestRMRestart should run with all schedulers > > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2635: --- Attachment: yarn-2635-3.patch I was reviewing Wei's patch. While trying out my would-be-suggestions, I ended up making more changes than I wanted. Here is the patch that: # moves the schedulerSetup Before method to the parent class # adds a method to keep track of RMs created in TestRMRestart, so they can be stopped after the test is done. Without this, some of the tests were failing depending on the order of execution. > TestRMRestart should run with all schedulers > > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
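The parameterization being discussed - running the same test body once per scheduler implementation - can be sketched like this. Only the scheduler class names are the real YARN ones; the runner itself is a simplification of what a parameterized base class would do, not the actual patch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Illustrative only: invoke the same test body once per scheduler class,
// the way the parameterized base class in the patch is described to work.
public class SchedulerParameterized {
    static final List<String> SCHEDULERS = Arrays.asList(
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler",
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");

    /** Runs the given test once per scheduler; returns the number of runs. */
    public static int runAll(Consumer<String> testBody) {
        int runs = 0;
        for (String scheduler : SCHEDULERS) {
            // A real setup would put this into yarn.resourcemanager.scheduler.class
            testBody.accept(scheduler);
            runs++;
        }
        return runs;
    }
}
```

This also illustrates Karthik's cost concern on the other thread: every scheduler added to the list multiplies the total test runtime.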
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157490#comment-14157490 ] Hadoop QA commented on YARN-2562: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672691/YARN-2562.5-4.patch against trunk revision 054f285. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5246//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5246//console This message is automatically generated. 
> ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, > YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5-4.patch, YARN-2562.5.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
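For readers following YARN-2562: the readability problem is that the epoch was folded into the numeric container id, producing strings like container_1410901177871_0001_01_05_17. One readable alternative is an explicit epoch marker that is only emitted for epoch > 0, so the pre-restart format stays unchanged. The sketch below assumes that scheme and the usual zero-padded field widths; it is an illustration, not the patch itself:

```java
// Sketch of an epoch-aware ContainerId toString. The "e<epoch>_" marker and
// the field widths (%04d app, %02d attempt, %06d container) are assumptions
// made for illustration.
public class ContainerIdFormat {
    public static String toString(long clusterTimestamp, int appId,
                                  int attemptId, long epoch, long containerId) {
        StringBuilder sb = new StringBuilder("container_");
        if (epoch > 0) {
            sb.append('e').append(epoch).append('_'); // only for restarted RMs
        }
        sb.append(clusterTimestamp).append('_')
          .append(String.format("%04d", appId)).append('_')
          .append(String.format("%02d", attemptId)).append('_')
          .append(String.format("%06d", containerId));
        return sb.toString();
    }
}
```

With epoch 0 the output is byte-for-byte the familiar container_<timestamp>_<app>_<attempt>_<id> string, which keeps old log-parsing tools working.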
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157461#comment-14157461 ] Santosh Marella commented on YARN-556: -- Referencing YARN-2476 here to ensure the specific scenario mentioned there is fixed as part of this JIRA. > RM Restart phase 2 - Work preserving restart > > > Key: YARN-556 > URL: https://issues.apache.org/jira/browse/YARN-556 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: Work Preserving RM Restart.pdf, > WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch > > > YARN-128 covered storing the state needed for the RM to recover critical > information. This umbrella jira will track changes needed to recover the > running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2562: - Attachment: YARN-2562.5-4.patch > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, > YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5-4.patch, YARN-2562.5.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157431#comment-14157431 ] Benoy Antony commented on YARN-2527: Thanks a lot, [~zjshen]. > NPE in ApplicationACLsManager > - > > Key: YARN-2527 > URL: https://issues.apache.org/jira/browse/YARN-2527 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Fix For: 2.6.0 > > Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch > > > NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. > The relevant stacktrace snippet from the ResourceManager logs is as below > {code} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {code} > This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2598) GHS should show N/A instead of null for the inaccessible information
[ https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157349#comment-14157349 ] Hadoop QA commented on YARN-2598: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672667/YARN-2598.2.patch against trunk revision 054f285. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5245//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5245//console This message is automatically generated. > GHS should show N/A instead of null for the inaccessible information > > > Key: YARN-2598 > URL: https://issues.apache.org/jira/browse/YARN-2598 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2598.1.patch, YARN-2598.2.patch > > > When the user doesn't have the access to an application, the app attempt > information is not visible to the user. 
ClientRMService will output N/A, but > GHS is showing null, which is not user-friendly. > {code} > 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: > http://nn.example.com:8188/ws/v1/timeline/ > 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at > nn.example.com/240.0.0.11:8050 > 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History > server at nn.example.com/240.0.0.11:10200 > Application Report : > Application-Id : application_1411586934799_0001 > Application-Name : Sleep job > Application-Type : MAPREDUCE > User : hrt_qa > Queue : default > Start-Time : 1411586956012 > Finish-Time : 1411586989169 > Progress : 100% > State : FINISHED > Final-State : SUCCEEDED > Tracking-URL : null > RPC Port : -1 > AM Host : null > Aggregate Resource Allocation : N/A > Diagnostics : null > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
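The fix being discussed amounts to null-coalescing each report field before rendering, the way ClientRMService already does. A minimal sketch (the helper name is made up; the real patch touches the GHS report-building code):

```java
// Sketch of the null-guarding behavior the patch aims for: any field the
// user cannot access renders as "N/A" instead of the literal string "null".
public class ReportFormatter {
    static final String NA = "N/A";

    public static String orNA(String value) {
        return value == null ? NA : value;
    }
}
```

Applied to the report above, Tracking-URL, AM Host, and Diagnostics would all print N/A, matching the Aggregate Resource Allocation line.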
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157336#comment-14157336 ] Sandy Ryza commented on YARN-1414: -- Awesome > with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs > - > > Key: YARN-1414 > URL: https://issues.apache.org/jira/browse/YARN-1414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157311#comment-14157311 ] Hudson commented on YARN-2628: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6183 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6183/]) YARN-2628. Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free. Contributed by Varun Vasudev (jianhe: rev 054f28552687e9b9859c0126e16a2066e20ead3f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/CHANGES.txt > Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.6.0 > > Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, sometimes apps will end up with containers in a > reserved state even though free slots are available. 
> The root cause seems to be this piece of code from CapacityScheduler.java - > {noformat} > // Try to schedule more if there are no reservations to fulfill > if (node.getReservedContainer() == null) { > if (Resources.greaterThanOrEqual(calculator, getClusterResource(), > node.getAvailableResource(), minimumAllocation)) { > if (LOG.isDebugEnabled()) { > LOG.debug("Trying to schedule on node: " + node.getNodeName() + > ", available: " + node.getAvailableResource()); > } > root.assignContainers(clusterResource, node, false); > } > } else { > LOG.info("Skipping scheduling since node " + node.getNodeID() + > " is reserved by application " + > > node.getReservedContainer().getContainerId().getApplicationAttemptId() > ); > } > {noformat} > The code is meant to check if a node has any slots available for containers. > Since it uses the greaterThanOrEqual function, we end up in a situation where > greaterThanOrEqual returns true, even though we may not have enough CPU or > memory to actually run the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
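The comparison problem described above can be reproduced in miniature. Under dominant-share semantics only the larger of the two shares is compared, so a node with zero free vcores can still pass the check; a componentwise check (shown for contrast) would not. This is a simplified model of the calculator's behavior, not its actual code:

```java
// Minimal reproduction of the YARN-2628 symptom, assuming dominant-share
// semantics: compare only the dominant share of each (memory, vcores) pair.
public class DominantShareDemo {
    /** Dominant share of (mem, vcores) with respect to the cluster total. */
    public static double dominantShare(int mem, int vcores,
                                       int clusterMem, int clusterVcores) {
        return Math.max((double) mem / clusterMem, (double) vcores / clusterVcores);
    }

    /** The kind of check the snippet above performs. */
    public static boolean greaterThanOrEqual(int aMem, int aVcores,
                                             int bMem, int bVcores,
                                             int clusterMem, int clusterVcores) {
        return dominantShare(aMem, aVcores, clusterMem, clusterVcores)
            >= dominantShare(bMem, bVcores, clusterMem, clusterVcores);
    }

    /** The stricter componentwise check, shown for contrast. */
    public static boolean fitsComponentwise(int aMem, int aVcores,
                                            int bMem, int bVcores) {
        return aMem >= bMem && aVcores >= bVcores;
    }

    public static void main(String[] args) {
        // Node: 8192 MB free but 0 free vcores; minimum allocation: 1024 MB, 1 vcore.
        // dominantShare(node) = max(8192/32768, 0/32)  = 0.25
        // dominantShare(min)  = max(1024/32768, 1/32) = 0.03125
        System.out.println(greaterThanOrEqual(8192, 0, 1024, 1, 32768, 32)); // true
        System.out.println(fitsComponentwise(8192, 0, 1024, 1));             // false
    }
}
```

So greaterThanOrEqual answers "yes, schedule here" for a node that cannot actually host the container, and the attempt ends in a reservation.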
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157302#comment-14157302 ] Steve Loughran commented on YARN-913: - Failing test is still the (believed unrelated) Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 379.565 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testDSRestartWithPreviousRunningContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 38.715 sec <<< FAILURE! java.lang.AssertionError: client failed at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSRestartWithPreviousRunningContainers(TestDistributedShell.java:319) > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. 
> Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2598) GHS should show N/A instead of null for the inaccessible information
[ https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2598: -- Attachment: YARN-2598.2.patch Rebase the patch against the latest trunk > GHS should show N/A instead of null for the inaccessible information > > > Key: YARN-2598 > URL: https://issues.apache.org/jira/browse/YARN-2598 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.6.0 >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2598.1.patch, YARN-2598.2.patch > > > When the user doesn't have the access to an application, the app attempt > information is not visible to the user. ClientRMService will output N/A, but > GHS is showing null, which is not user-friendly. > {code} > 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: > http://nn.example.com:8188/ws/v1/timeline/ > 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at > nn.example.com/240.0.0.11:8050 > 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History > server at nn.example.com/240.0.0.11:10200 > Application Report : > Application-Id : application_1411586934799_0001 > Application-Name : Sleep job > Application-Type : MAPREDUCE > User : hrt_qa > Queue : default > Start-Time : 1411586956012 > Finish-Time : 1411586989169 > Progress : 100% > State : FINISHED > Final-State : SUCCEEDED > Tracking-URL : null > RPC Port : -1 > AM Host : null > Aggregate Resource Allocation : N/A > Diagnostics : null > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157296#comment-14157296 ] Hadoop QA commented on YARN-2635: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672637/YARN-2635-2.patch against trunk revision 6ac1051. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5242//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5242//console This message is automatically generated. > TestRMRestart should run with all schedulers > > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch, YARN-2635-2.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157291#comment-14157291 ] Hadoop QA commented on YARN-2468: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch against trunk revision f679ca3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5244//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5244//console This message is automatically generated. 
> Log handling for LRS > > > Key: YARN-2468 > URL: https://issues.apache.org/jira/browse/YARN-2468 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2468.1.patch, YARN-2468.10.patch, > YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, > YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, > YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, > YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, > YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, > YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch > > > Currently, when application is finished, NM will start to do the log > aggregation. But for Long running service applications, this is not ideal. > The problems we have are: > 1) LRS applications are expected to run for a long time (weeks, months). > 2) Currently, all the container logs (from one NM) will be written into a > single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157286#comment-14157286 ] Craig Welch commented on YARN-1680: --- This does bring up what I think could be an issue. I'm not sure if it was what you were getting at before or not, [~john.jian.fang], but we could well be introducing a new bug here unless we are careful. I don't see any connection between the scheduler level resource adjustments and the application level adjustments, so if an application had problems with a node and blacklisted it, and then the cluster did, the resource value of the node would be effectively removed from the headroom 2x (once when the application adds it to its new "blacklist reduction", and a second time when the cluster removes its value from the clusterResource). I think this could be a problem, and I think it could be addressed, but it's something to think about and I don't think the current approach addresses this - [~airbots], [~jlowe], thoughts? > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Chen He > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > The job's running reducer tasks occupied 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 Maps got killed), so MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are running in the cluster now. > MRAppMaster does not preempt the reducers because, for the Reducer preemption > calculation, headRoom considers blacklisted nodes' memory. 
This makes > jobs hang forever (ResourceManager does not assign any new containers on > blacklisted nodes but returns an availableResource that considers cluster free > memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157275#comment-14157275 ] Craig Welch commented on YARN-1680: --- Blacklisting a node could happen because, for whatever reason, it's not able to run some application's code (missing libraries or whatnot) but the node may be viable for other applications, hence (I assume) the existence of application level blacklisting. > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Chen He > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. > The job's running reducer tasks occupied 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 Maps got killed), so MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are running in the cluster now. > MRAppMaster does not preempt the reducers because, for the Reducer preemption > calculation, headRoom considers blacklisted nodes' memory. This makes > jobs hang forever (ResourceManager does not assign any new containers on > blacklisted nodes but returns an availableResource that considers cluster free > memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157274#comment-14157274 ] Ray Chiang commented on YARN-2635: -- Oops, pending Jenkins of course. > TestRMRestart should run with all schedulers > > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch, YARN-2635-2.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157271#comment-14157271 ] Craig Welch commented on YARN-1680: --- [~john.jian.fang] I should probably not have referred to the cluster level adjustments as "blacklisting". What I see is a mechanism (state machine, events, including adding and removing nodes and the "unhealthy" state/the health monitor) that, I think, ultimately results in the CapacityScheduler.addNode() and removeNode() calls, which modify the clusterResource value. In any case, the blacklisting functionality we are addressing here definitely looks to be application specific and needs to be addressed at that level. The issue isn't, so far as I know, related to any blacklisting/node health issues outside the one in play here, as those should work properly for headroom as they adjust the cluster resource. The problem is that the application blacklist activity does not adjust the cluster resource and was previously not involved in the headroom calculation. If it's not the case that cluster level adjustments are being made for nodes then this blacklisting will result in duplication among applications as they independently discover problems with nodes and blacklist them, but that is not a new characteristic of the way the system works. > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Chen He > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster > slow start is set to 1. 
> The job's running reducer tasks occupied 29GB of the cluster. One NodeManager (NM-4) > became unstable (3 Maps got killed), so MRAppMaster blacklisted the unstable > NodeManager (NM-4). All reducer tasks are running in the cluster now. > MRAppMaster does not preempt the reducers because, for the Reducer preemption > calculation, headRoom considers blacklisted nodes' memory. This makes > jobs hang forever (ResourceManager does not assign any new containers on > blacklisted nodes but returns an availableResource that considers cluster free > memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
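The double-subtraction Craig worries about is easiest to see with concrete numbers. In this toy calculation (all numbers made up), the scheduler has already removed an unhealthy node from clusterResource, and the application then subtracts the same node again via its blacklist reduction:

```java
// Illustration of the double-counting risk: the same node's resources are
// removed once by the cluster (from clusterResource) and once by the app
// (via its blacklist reduction), so headroom drops twice.
public class HeadroomDoubleCount {
    public static int headroom(int clusterResource, int used, int blacklisted) {
        return clusterResource - used - blacklisted;
    }

    public static void main(String[] args) {
        int nodeSize = 8;                        // GB on the problem node
        int clusterAfterRemoval = 32 - nodeSize; // scheduler already removed it
        int used = 10;                           // GB held by this app
        // App subtracts the node again through its blacklist reduction:
        System.out.println(headroom(clusterAfterRemoval, used, nodeSize));
        // Prints 6, although 24 - 10 = 14 GB are genuinely available.
    }
}
```

Whether this occurs in practice depends on the missing link Craig points out: nothing ties the application-level blacklist adjustment to the scheduler-level addNode()/removeNode() adjustments.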
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157269#comment-14157269 ] Hudson commented on YARN-2527: -- FAILURE: Integrated in Hadoop-trunk-Commit #6182 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6182/]) YARN-2527. Fixed the potential NPE in ApplicationACLsManager and added test cases for it. Contributed by Benoy Antony. (zjshen: rev 1c93025a1b370db46e345161dbc15e03f829823f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/server/security/TestApplicationACLsManager.java > NPE in ApplicationACLsManager > - > > Key: YARN-2527 > URL: https://issues.apache.org/jira/browse/YARN-2527 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Fix For: 2.6.0 > > Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch > > > NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. > The relevant stacktrace snippet from the ResourceManager logs is as below > {code} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {code} > This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157265#comment-14157265 ] Karthik Kambatla commented on YARN-1879: Thanks for working on this, Tsuyoshi. Review comments on the latest patch: # Are there cases when we don't want RetryCache enabled? IMO, we should always use the RetryCache (no harm). If we decide on having a config, the default should be true. # I would set DEFAULT_RM_RETRY_CACHE_EXPIRY_MS to {{10 * 60 * 1000}} instead of 60, and the corresponding comment (10 mins) can be removed or moved to the same line. # TestApplicationMasterServiceRetryCache has a few lines longer than 80 chars. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.2-wip.patch, > YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, > YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
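Review point 2 above is about making the unit of the default self-evident. A minimal sketch of the suggested style — the constant name mirrors the one in the comment, but the surrounding class is hypothetical:

```java
// Illustrative only: shows the review suggestion of writing the default as
// an expression (10 minutes in milliseconds) instead of a bare number.
public class RetryCacheConfSketch {
    public static final long DEFAULT_RM_RETRY_CACHE_EXPIRY_MS = 10 * 60 * 1000;

    public static void main(String[] args) {
        System.out.println(DEFAULT_RM_RETRY_CACHE_EXPIRY_MS); // 600000
    }
}
```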
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157248#comment-14157248 ] Craig Welch commented on YARN-1680: --- [~airbots] thanks for your updated WIP patch - I've not looked at it extensively yet, but at first glance it looks good to me. On the original patch I noticed that there seems to be a facility for blacklisting racks as well as nodes, and I was concerned that that needed to be addressed as well. It may be in this patch, but it did not look like it to me. I do think it can be handled without too much difficulty - I think putting the additions (and removals) into sets and then checking to see if the node's rack is in the set during the node iteration would do the trick (I may be off here, but that looks like it would work to me.) > availableResources sent to applicationMaster in heartbeat should exclude > blacklistedNodes free memory. > -- > > Key: YARN-1680 > URL: https://issues.apache.org/jira/browse/YARN-1680 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.2.0, 2.3.0 > Environment: SuSE 11 SP2 + Hadoop-2.3 >Reporter: Rohith >Assignee: Chen He > Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, > YARN-1680-v2.patch, YARN-1680.patch > > > There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. > A job's running reducer tasks occupied 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), so MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. > MRAppMaster does not preempt the reducers because the headroom used in the reducer preemption calculation still counts the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still counts the whole cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
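The set-based suggestion in the comment above can be sketched as follows. This is an illustration of the idea under assumed names, not the WIP patch: keep blacklisted nodes and racks in sets, and while iterating nodes for headroom, skip any node whose name or rack is in a set.

```java
import java.util.Map;
import java.util.Set;

// Sketch of rack-aware blacklist handling for headroom. All class, method,
// and variable names here are illustrative assumptions; the real scheduler
// works on Resource objects, not raw longs.
public class BlacklistHeadroomSketch {

    static long availableHeadroomMB(Map<String, String> nodeToRack,
                                    Map<String, Long> nodeFreeMemMB,
                                    Set<String> blacklistedNodes,
                                    Set<String> blacklistedRacks) {
        long headroom = 0;
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            String node = e.getKey();
            String rack = e.getValue();
            // Skip a node if either it or its rack is blacklisted, so its
            // free memory is never reported to the AM as headroom.
            if (blacklistedNodes.contains(node) || blacklistedRacks.contains(rack)) {
                continue;
            }
            headroom += nodeFreeMemMB.get(node);
        }
        return headroom;
    }

    public static void main(String[] args) {
        Map<String, String> racks = Map.of("nm1", "/r1", "nm2", "/r1",
                                           "nm3", "/r2", "nm4", "/r2");
        Map<String, Long> free = Map.of("nm1", 8192L, "nm2", 8192L,
                                        "nm3", 8192L, "nm4", 8192L);
        // NM-4 blacklisted: its 8GB is excluded, leaving 24GB of headroom.
        System.out.println(availableHeadroomMB(racks, free, Set.of("nm4"), Set.of())); // 24576
    }
}
```

With this shape, the scenario in the description no longer hangs: the AM's headroom drops when a node is blacklisted, so reducer preemption can kick in.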
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157246#comment-14157246 ] Ray Chiang commented on YARN-2635: -- Tested TestRM/TestRMRestart/TestClientToAMTokens. All three tests now pass cleanly using FairScheduler. +1 > TestRMRestart should run with all schedulers > > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch, YARN-2635-2.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157245#comment-14157245 ] Zhijie Shen commented on YARN-2527: --- +1, will commit the patch > NPE in ApplicationACLsManager > - > > Key: YARN-2527 > URL: https://issues.apache.org/jira/browse/YARN-2527 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch > > > NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. > The relevant stacktrace snippet from the ResourceManager logs is as below > {code} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {code} > This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created
[ https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157234#comment-14157234 ] Jason Lowe commented on YARN-2414: -- Ran into this as well. Any update, [~leftnoteasy]? > RM web UI: app page will crash if app is failed before any attempt has been > created > --- > > Key: YARN-2414 > URL: https://issues.apache.org/jira/browse/YARN-2414 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Zhijie Shen >Assignee: Wangda Tan > > {code} > 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error > handling URI: /cluster/app/application_1407887030038_0001 > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) > at > com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) > at > com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84) > at > com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) > at > 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) > at > com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) > at > com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) > at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at 
org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.ja
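The crash path above suggests AppBlock.render dereferences the app's current attempt, which is null when the app failed before any attempt was created. A hedged sketch of the defensive rendering such a fix would need — the names only echo the stack trace and are not the eventual patch:

```java
// Hypothetical guard for rendering an app page when no attempt exists yet.
public class AppBlockSketch {

    static String renderAttemptRow(Object currentAttempt) {
        if (currentAttempt == null) {
            // App failed before any attempt was created: render a
            // placeholder instead of throwing (and causing a 500 page).
            return "N/A (application failed before an attempt was created)";
        }
        return currentAttempt.toString();
    }

    public static void main(String[] args) {
        System.out.println(renderAttemptRow(null));
        System.out.println(renderAttemptRow("appattempt_1407887030038_0001_000001"));
    }
}
```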
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157217#comment-14157217 ] Hadoop QA commented on YARN-1198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672649/YARN-1198.11-with-1857.patch against trunk revision f679ca3. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5243//console This message is automatically generated. > Capacity Scheduler headroom calculation does not work as expected > - > > Key: YARN-1198 > URL: https://issues.apache.org/jira/browse/YARN-1198 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Craig Welch > Attachments: YARN-1198.1.patch, YARN-1198.10.patch, > YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, > YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, > YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch > > > Today headroom calculation (for the app) takes place only when > * New node is added/removed from the cluster > * New container is getting assigned to the application. > However there are potentially lot of situations which are not considered for > this calculation > * If a container finishes then headroom for that application will change and > should be notified to the AM accordingly. > * If a single user has submitted multiple applications (app1 and app2) to the > same queue then > ** If app1's container finishes then not only app1's but also app2's AM > should be notified about the change in headroom. > ** Similarly if a container is assigned to any applications app1/app2 then > both AM should be notified about their headroom. 
> ** To simplify the whole communication process, it is ideal to keep headroom per user per LeafQueue so that everyone gets the same picture (apps belonging to the same user and submitted to the same queue). > * If a new user submits an application to the queue, then all applications submitted by all users in that queue should be notified of the headroom change. > * Also, today headroom is an absolute number (I think it should be normalized, but that would not be backward compatible). > * Also, when the admin refreshes the queue, headroom has to be updated. > These are all potential bugs in the headroom calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
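The "headroom per user per LeafQueue" idea in the description amounts to simple bookkeeping: one shared figure per user, plus the list of that user's apps to notify whenever it changes. Everything below is an illustrative assumption, not CapacityScheduler code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of per-user headroom in a leaf queue: all apps of a user share
// one headroom value, and all of them are notified on any change.
public class UserHeadroomSketch {
    private final Map<String, Long> userHeadroomMB = new HashMap<>();
    private final Map<String, List<String>> userApps = new HashMap<>();

    void submitApp(String user, String appId) {
        userApps.computeIfAbsent(user, u -> new ArrayList<>()).add(appId);
    }

    // Returns the apps that must be notified of the new headroom, e.g.
    // after one of the user's containers finishes or is assigned.
    List<String> updateHeadroom(String user, long newHeadroomMB) {
        userHeadroomMB.put(user, newHeadroomMB);
        return userApps.getOrDefault(user, List.of());
    }

    long headroomOf(String user) {
        return userHeadroomMB.getOrDefault(user, 0L);
    }
}
```

So if app1's container finishes, both app1's and app2's AMs (same user, same queue) get the identical updated figure, which is exactly the consistency the description asks for.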
[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1198: -- Attachment: YARN-1198.11-with-1857.patch Patch combining the last .11 with the latest 1857 patch, to make it easy to check them out together. Tests changed/added for both issues are present and pass (unchanged) > Capacity Scheduler headroom calculation does not work as expected > - > > Key: YARN-1198 > URL: https://issues.apache.org/jira/browse/YARN-1198 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Craig Welch > Attachments: YARN-1198.1.patch, YARN-1198.10.patch, > YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, > YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, > YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch > > > Today headroom calculation (for the app) takes place only when > * New node is added/removed from the cluster > * New container is getting assigned to the application. > However there are potentially lot of situations which are not considered for > this calculation > * If a container finishes then headroom for that application will change and > should be notified to the AM accordingly. > * If a single user has submitted multiple applications (app1 and app2) to the > same queue then > ** If app1's container finishes then not only app1's but also app2's AM > should be notified about the change in headroom. > ** Similarly if a container is assigned to any applications app1/app2 then > both AM should be notified about their headroom. > ** To simplify the whole communication process it is ideal to keep headroom > per User per LeafQueue so that everyone gets the same picture (apps belonging > to same user and submitted in same queue). > * If a new user submits an application to the queue then all applications > submitted by all users in that queue should be notified of the headroom > change. 
> * Also, today headroom is an absolute number (I think it should be normalized, but that would not be backward compatible). > * Also, when the admin refreshes the queue, headroom has to be updated. > These are all potential bugs in the headroom calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157199#comment-14157199 ] Hadoop QA commented on YARN-2468: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch against trunk revision a56f3ec. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5241//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5241//console This message is automatically generated. 
> Log handling for LRS > > > Key: YARN-2468 > URL: https://issues.apache.org/jira/browse/YARN-2468 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2468.1.patch, YARN-2468.10.patch, > YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, > YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, > YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, > YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, > YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, > YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch > > > Currently, when application is finished, NM will start to do the log > aggregation. But for Long running service applications, this is not ideal. > The problems we have are: > 1) LRS applications are expected to run for a long time (weeks, months). > 2) Currently, all the container logs (from one NM) will be written into a > single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
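For long-running services, the natural remedy the description points at is interval-based (rolling) aggregation rather than aggregate-only-at-finish. A minimal sketch of that decision, with assumed names:

```java
// Sketch of interval-based log aggregation for long-running services:
// instead of aggregating only when the application finishes, also roll
// the logs whenever a configured interval has elapsed. Illustrative
// names only; not the YARN-2468 patch.
public class RollingAggregationSketch {

    static boolean shouldAggregate(long nowMs, long lastAggregationMs,
                                   long rollingIntervalMs, boolean appFinished) {
        return appFinished || (nowMs - lastAggregationMs >= rollingIntervalMs);
    }

    public static void main(String[] args) {
        // 10 minutes elapsed against a 5-minute rolling interval: aggregate.
        System.out.println(shouldAggregate(600_000L, 0L, 300_000L, false)); // true
    }
}
```

Rolling per interval also addresses problem 2 in the description: each roll produces a bounded file instead of one ever-growing file per NM.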
[jira] [Updated] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2635: -- Attachment: YARN-2635-2.patch Updated the patch to implement a base class, which can be reused in the future. > TestRMRestart should run with all schedulers > > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch, YARN-2635-2.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
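The reusable-base-class approach can be sketched JUnit-free as follows: the base holds the scheduler list and runs each test body once per scheduler. The names and the conf-setting detail are assumptions about the patch, not its contents:

```java
import java.util.List;

// Sketch of a test base class parameterized over schedulers: each test
// body runs once for CapacityScheduler and once for FairScheduler.
// JUnit-free stand-in; a real base class would use @Parameterized.
public class SchedulerParamSketch {
    static final List<String> SCHEDULERS = List.of(
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler",
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");

    static int runForAllSchedulers(Runnable testBody) {
        int runs = 0;
        for (String scheduler : SCHEDULERS) {
            // A real base class would set yarn.resourcemanager.scheduler.class
            // to 'scheduler' in the conf before starting the MockRM.
            testBody.run();
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(runForAllSchedulers(() -> {})); // 2
    }
}
```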
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157133#comment-14157133 ] Xuan Gong commented on YARN-2468: - new patch addressed all the comments > Log handling for LRS > > > Key: YARN-2468 > URL: https://issues.apache.org/jira/browse/YARN-2468 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2468.1.patch, YARN-2468.10.patch, > YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, > YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, > YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, > YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, > YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, > YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch > > > Currently, when application is finished, NM will start to do the log > aggregation. But for Long running service applications, this is not ideal. > The problems we have are: > 1) LRS applications are expected to run for a long time (weeks, months). > 2) Currently, all the container logs (from one NM) will be written into a > single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2468: Attachment: YARN-2468.10.patch > Log handling for LRS > > > Key: YARN-2468 > URL: https://issues.apache.org/jira/browse/YARN-2468 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2468.1.patch, YARN-2468.10.patch, > YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, > YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, > YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, > YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, > YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, > YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch > > > Currently, when application is finished, NM will start to do the log > aggregation. But for Long running service applications, this is not ideal. > The problems we have are: > 1) LRS applications are expected to run for a long time (weeks, months). > 2) Currently, all the container logs (from one NM) will be written into a > single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: (was: YARN-2408.4.patch) > Resource Request REST API for YARN > -- > > Key: YARN-2408 > URL: https://issues.apache.org/jira/browse/YARN-2408 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp >Reporter: Renan DelValle > Labels: features > > I’m proposing a new REST API for YARN which exposes a snapshot of the > Resource Requests that exist inside of the Scheduler. My motivation behind > this new feature is to allow external software to monitor the amount of > resources being requested to gain more insightful information into cluster > usage than is already provided. The API can also be used by external software > to detect a starved application and alert the appropriate users and/or sys > admin so that the problem may be remedied. > Here is the proposed API (a JSON counterpart is also available): > {code:xml} > > 7680 > 7 > > application_1412191664217_0001 > > appattempt_1412191664217_0001_01 > default > 6144 > 6 > 3 > > > 1024 > 1 > 6 > true > 20 > > localMachine > /default-rack > * > > > > > > ... > > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: (was: YARN-2408-5.patch) > Resource Request REST API for YARN > -- > > Key: YARN-2408 > URL: https://issues.apache.org/jira/browse/YARN-2408 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp >Reporter: Renan DelValle > Labels: features > > I’m proposing a new REST API for YARN which exposes a snapshot of the > Resource Requests that exist inside of the Scheduler. My motivation behind > this new feature is to allow external software to monitor the amount of > resources being requested to gain more insightful information into cluster > usage than is already provided. The API can also be used by external software > to detect a starved application and alert the appropriate users and/or sys > admin so that the problem may be remedied. > Here is the proposed API (a JSON counterpart is also available): > {code:xml} > > 7680 > 7 > > application_1412191664217_0001 > > appattempt_1412191664217_0001_01 > default > 6144 > 6 > 3 > > > 1024 > 1 > 6 > true > 20 > > localMachine > /default-rack > * > > > > > > ... > > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: YARN-2408-5.patch > Resource Request REST API for YARN > -- > > Key: YARN-2408 > URL: https://issues.apache.org/jira/browse/YARN-2408 > Project: Hadoop YARN > Issue Type: New Feature > Components: webapp >Reporter: Renan DelValle > Labels: features > Attachments: YARN-2408-5.patch, YARN-2408.4.patch > > > I’m proposing a new REST API for YARN which exposes a snapshot of the > Resource Requests that exist inside of the Scheduler. My motivation behind > this new feature is to allow external software to monitor the amount of > resources being requested to gain more insightful information into cluster > usage than is already provided. The API can also be used by external software > to detect a starved application and alert the appropriate users and/or sys > admin so that the problem may be remedied. > Here is the proposed API (a JSON counterpart is also available): > {code:xml} > > 7680 > 7 > > application_1412191664217_0001 > > appattempt_1412191664217_0001_01 > default > 6144 > 6 > 3 > > > 1024 > 1 > 6 > true > 20 > > localMachine > /default-rack > * > > > > > > ... > > > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157083#comment-14157083 ] Craig Welch commented on YARN-1198: --- And, again, I think something is up with Jenkins, the patch application issue doesn't look to have anything to do with the patch, and all the builds are red... > Capacity Scheduler headroom calculation does not work as expected > - > > Key: YARN-1198 > URL: https://issues.apache.org/jira/browse/YARN-1198 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Craig Welch > Attachments: YARN-1198.1.patch, YARN-1198.10.patch, > YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, > YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, > YARN-1198.9.patch > > > Today headroom calculation (for the app) takes place only when > * New node is added/removed from the cluster > * New container is getting assigned to the application. > However there are potentially lot of situations which are not considered for > this calculation > * If a container finishes then headroom for that application will change and > should be notified to the AM accordingly. > * If a single user has submitted multiple applications (app1 and app2) to the > same queue then > ** If app1's container finishes then not only app1's but also app2's AM > should be notified about the change in headroom. > ** Similarly if a container is assigned to any applications app1/app2 then > both AM should be notified about their headroom. > ** To simplify the whole communication process it is ideal to keep headroom > per User per LeafQueue so that everyone gets the same picture (apps belonging > to same user and submitted in same queue). > * If a new user submits an application to the queue then all applications > submitted by all users in that queue should be notified of the headroom > change. 
> * Also, today headroom is an absolute number (I think it should be normalized, but that would not be backward compatible). > * Also, when the admin refreshes the queue, headroom has to be updated. > These are all potential bugs in the headroom calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2634) Test failure for TestClientRMTokens
[ https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157079#comment-14157079 ] Jian He commented on YARN-2634: --- [~djp], I took latest trunk and ran locally, it actually passes. Would you mind checking again ? thx > Test failure for TestClientRMTokens > --- > > Key: YARN-2634 > URL: https://issues.apache.org/jira/browse/YARN-2634 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Jian He >Priority: Blocker > > The test get failed as below: > {noformat} > --- > Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens > --- > Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens > testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 22.693 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272) > testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 20.087 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283) > testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 0.031 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148) > at > org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241) > testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 0.061 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261) > testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 0.07 sec <<< ERROR! > java.lang.NullPointerException: null > at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684) > at > org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149) > > >1,1 Top > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157067#comment-14157067 ] Hadoop QA commented on YARN-1198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672614/YARN-1198.11.patch against trunk revision a56f3ec. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5240//console This message is automatically generated. > Capacity Scheduler headroom calculation does not work as expected > - > > Key: YARN-1198 > URL: https://issues.apache.org/jira/browse/YARN-1198 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Craig Welch > Attachments: YARN-1198.1.patch, YARN-1198.10.patch, > YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, > YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, > YARN-1198.9.patch > > > Today headroom calculation (for the app) takes place only when > * New node is added/removed from the cluster > * New container is getting assigned to the application. > However there are potentially lot of situations which are not considered for > this calculation > * If a container finishes then headroom for that application will change and > should be notified to the AM accordingly. > * If a single user has submitted multiple applications (app1 and app2) to the > same queue then > ** If app1's container finishes then not only app1's but also app2's AM > should be notified about the change in headroom. > ** Similarly if a container is assigned to any applications app1/app2 then > both AM should be notified about their headroom. 
> ** To simplify the whole communication process it is ideal to keep headroom > per User per LeafQueue so that everyone gets the same picture (apps belonging > to same user and submitted in same queue). > * If a new user submits an application to the queue then all applications > submitted by all users in that queue should be notified of the headroom > change. > * Also today headroom is an absolute number ( I think it should be normalized > but then this is going to be not backward compatible..) > * Also when admin user refreshes queue headroom has to be updated. > These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
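The per-User-per-LeafQueue headroom idea described above can be sketched in a few lines. This is an illustrative toy model only — the class and method names (UserHeadroomSketch, updateHeadroom, headroomSeenBy) are invented for this example and are not YARN code; it simply shows that when one of a user's containers changes state, every application that user has in the queue sees the same updated headroom figure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UserHeadroomSketch {
    private final Map<String, List<String>> appsByUser = new HashMap<>();
    // appId -> last headroom value (in MB) communicated to that app's AM
    private final Map<String, Integer> lastNotified = new HashMap<>();

    public void submitApp(String user, String appId) {
        appsByUser.computeIfAbsent(user, u -> new ArrayList<>()).add(appId);
    }

    // Called whenever a container is assigned to or released by this user.
    public void updateHeadroom(String user, int newHeadroomMb) {
        // Notify *all* of this user's apps, not just the one whose container changed.
        for (String appId : appsByUser.getOrDefault(user, List.of())) {
            lastNotified.put(appId, newHeadroomMb);
        }
    }

    public Integer headroomSeenBy(String appId) {
        return lastNotified.get(appId);
    }
}
```

With this layout, app1 finishing a container updates the shared per-user figure, so app2's AM is notified as well — the scenario the description calls out.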
[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1198: -- Attachment: YARN-1198.11.patch Attaching patch .11, this is based on .10 (nee .7), the preferred approach, with a factoring change to decrease the impact - the HeadroomProvider is now limited to just the CapacityScheduler area / FiCaSchedulerApp. It's actually possible to remove the HeadroomProvider altogether in favor of adding more members to the scheduler app, but I think it actually looks better factored this way (the functional result would be the same). > Capacity Scheduler headroom calculation does not work as expected > - > > Key: YARN-1198 > URL: https://issues.apache.org/jira/browse/YARN-1198 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Craig Welch > Attachments: YARN-1198.1.patch, YARN-1198.10.patch, > YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, > YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, > YARN-1198.9.patch > > > Today headroom calculation (for the app) takes place only when > * New node is added/removed from the cluster > * New container is getting assigned to the application. > However there are potentially lot of situations which are not considered for > this calculation > * If a container finishes then headroom for that application will change and > should be notified to the AM accordingly. > * If a single user has submitted multiple applications (app1 and app2) to the > same queue then > ** If app1's container finishes then not only app1's but also app2's AM > should be notified about the change in headroom. > ** Similarly if a container is assigned to any applications app1/app2 then > both AM should be notified about their headroom. 
> ** To simplify the whole communication process it is ideal to keep headroom > per User per LeafQueue so that everyone gets the same picture (apps belonging > to same user and submitted in same queue). > * If a new user submits an application to the queue then all applications > submitted by all users in that queue should be notified of the headroom > change. > * Also today headroom is an absolute number ( I think it should be normalized > but then this is going to be not backward compatible..) > * Also when admin user refreshes queue headroom has to be updated. > These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2639) TestClientToAMTokens should run with all types of schedulers
[ https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157012#comment-14157012 ] Hadoop QA commented on YARN-2639: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672593/YARN-2639-2.patch against trunk revision 29f5200. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5239//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5239//console This message is automatically generated. > TestClientToAMTokens should run with all types of schedulers > > > Key: YARN-2639 > URL: https://issues.apache.org/jira/browse/YARN-2639 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2639-1.patch, YARN-2639-2.patch > > > TestClientToAMTokens fails with FairScheduler now. We should let > TestClientToAMTokens run with all kinds of schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2639) TestClientToAMTokens should run with all types of schedulers
[ https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla resolved YARN-2639. Resolution: Duplicate Can we fix this also as part of YARN-2635? > TestClientToAMTokens should run with all types of schedulers > > > Key: YARN-2639 > URL: https://issues.apache.org/jira/browse/YARN-2639 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2639-1.patch, YARN-2639-2.patch > > > TestClientToAMTokens fails with FairScheduler now. We should let > TestClientToAMTokens run with all kinds of schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156963#comment-14156963 ] Karthik Kambatla commented on YARN-2635: Just saw YARN-2638 as well. On second thought, it might be better to club the two JIRAs and implement a base class for RM tests that run against all schedulers. And, schedulerType in these tests should probably be an enum so subclasses don't have to know the order. > TestRMRestart should run with all schedulers > > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
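The base-class-plus-enum suggestion above could look roughly like the sketch below. The names here (ParameterizedSchedulerTestBase, SchedulerType) are assumptions chosen for illustration, not the committed API; the point is only that subclasses consume an enum value rather than relying on a positional parameter order.

```java
import java.util.Arrays;
import java.util.List;

public class ParameterizedSchedulerTestBase {
    public enum SchedulerType { CAPACITY, FAIR, FIFO }

    private final SchedulerType schedulerType;

    public ParameterizedSchedulerTestBase(SchedulerType type) {
        this.schedulerType = type;
    }

    // With JUnit's @Parameterized runner this would be the @Parameters source;
    // subclasses inherit it and never need to know the ordering of values.
    public static List<SchedulerType> schedulers() {
        return Arrays.asList(SchedulerType.values());
    }

    // Map the enum to the scheduler class to set in the test configuration.
    // The class names below are the standard YARN scheduler implementations.
    public String getSchedulerClassName() {
        switch (schedulerType) {
        case FAIR:
            return "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler";
        case FIFO:
            return "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler";
        default:
            return "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
        }
    }
}
```

A test like TestRM or TestRMRestart would then extend this class and build its YarnConfiguration from getSchedulerClassName(), running once per enum value.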
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156951#comment-14156951 ] Jian He commented on YARN-2615: --- looks good, only a few minor things: - {{ClientToAMTokenIdentifierForTest}}, the same code it overrides from {{ClientToAMTokenIdentifier}} may be removed? Similarly for {{RMDelegationTokenIdentifierForTest}} - this code can be removed. {code} byte[] tokenIdentifierContent = token.getIdentifier(); ClientToAMTokenIdentifier tokenIdentifier = new ClientToAMTokenIdentifier(); DataInputBuffer dib = new DataInputBuffer(); dib.reset(tokenIdentifierContent, tokenIdentifierContent.length); tokenIdentifier.readFields(dib); {code} > ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended > fields > > > Key: YARN-2615 > URL: https://issues.apache.org/jira/browse/YARN-2615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2615-v2.patch, YARN-2615.patch > > > As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier > and DelegationTokenIdentifier should also be updated in the same way to allow > fields get extended in future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2634) Test failure for TestClientRMTokens
[ https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-2634: - Assignee: Jian He > Test failure for TestClientRMTokens > --- > > Key: YARN-2634 > URL: https://issues.apache.org/jira/browse/YARN-2634 > Project: Hadoop YARN > Issue Type: Test >Reporter: Junping Du >Assignee: Jian He >Priority: Blocker > > The test get failed as below: > {noformat} > --- > Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens > --- > Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens > testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 22.693 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272) > testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 20.087 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283) > testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 0.031 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148) > at > org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101) > at org.apache.hadoop.security.token.Token.renew(Token.java:377) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241) > testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 0.061 sec <<< FAILURE! 
> java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261) > testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 0.07 sec <<< ERROR! > java.lang.NullPointerException: null > at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684) > at > org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2638) TestRM should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2638: --- Summary: TestRM should run with all schedulers (was: Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)) > TestRM should run with all schedulers > - > > Key: YARN-2638 > URL: https://issues.apache.org/jira/browse/YARN-2638 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2638-1.patch > > > TestRM fails when using FairScheduler or FifoScheduler. The failures not > shown in trunk as the trunk uses the default capacity scheduler. We need to > let TestRM run with all types of schedulers, to make sure any new change > wouldn't break any scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2635) TestRMRestart should run with all schedulers
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2635: --- Summary: TestRMRestart should run with all schedulers (was: TestRMRestart fails with FairScheduler) > TestRMRestart should run with all schedulers > > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156933#comment-14156933 ] Karthik Kambatla commented on YARN-2635: +1. Committing this. > TestRMRestart fails with FairScheduler > -- > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156905#comment-14156905 ] Karthik Kambatla commented on YARN-2180: Looks mostly good, but for these minor comments: # App-checker and the store implementations aren't related: ## the app-checker config should be appended to SHARED_CACHE_PREFIX and IN_MEMORY_STORE ## the variable names should be updated accordingly. ## InMemorySCMStore#createAppCheckerService should move to a util class - how about changing SharedCacheStructureUtil to SharedCacheUtil and adding this method there? # Can we create a follow-up blocker sub-task to revisit all the config names before we include sharedcache work in a release? > In-memory backing store for cache manager > - > > Key: YARN-2180 > URL: https://issues.apache.org/jira/browse/YARN-2180 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, > YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, > YARN-2180-trunk-v6.patch > > > Implement an in-memory backing store for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
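The key-naming point in the review above — the app-checker is independent of any store implementation, so its key should hang off the shared-cache prefix rather than the in-memory-store prefix — can be made concrete. The literal key strings below are guesses for illustration only, not the final configuration names:

```java
public class SharedCacheConfSketch {
    // Hypothetical key layout; actual YARN config names may differ.
    public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";

    // Store-specific keys nest under their own sub-prefix...
    public static final String IN_MEMORY_STORE_PREFIX =
        SHARED_CACHE_PREFIX + "store.in-memory.";

    // ...while the store-agnostic app checker hangs directly off the
    // shared-cache prefix, since it is unrelated to any store implementation.
    public static final String APP_CHECKER_CLASS =
        SHARED_CACHE_PREFIX + "app-checker.class";

    public static String key(String prefix, String suffix) {
        return prefix + suffix;
    }
}
```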
[jira] [Updated] (YARN-2639) TestClientToAMTokens should run with all types of schedulers
[ https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2639: -- Attachment: YARN-2639-2.patch re-trigger the jenkins > TestClientToAMTokens should run with all types of schedulers > > > Key: YARN-2639 > URL: https://issues.apache.org/jira/browse/YARN-2639 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2639-1.patch, YARN-2639-2.patch > > > TestClientToAMTokens fails with FairScheduler now. We should let > TestClientToAMTokens run with all kinds of schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156892#comment-14156892 ] Siqi Li commented on YARN-1414: --- I just found out that this problem has been fixed in the trunk. I am going to close this jira > with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs > - > > Key: YARN-1414 > URL: https://issues.apache.org/jira/browse/YARN-1414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.2.0 > > Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156890#comment-14156890 ] Hadoop QA commented on YARN-2527: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672583/YARN-2527.patch against trunk revision 5e0b49d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5238//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5238//console This message is automatically generated. > NPE in ApplicationACLsManager > - > > Key: YARN-2527 > URL: https://issues.apache.org/jira/browse/YARN-2527 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch > > > NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. 
> The relevant stacktrace snippet from the ResourceManager logs is as below > {code} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {code} > This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
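The stack trace above points at an unconditional dereference inside checkAccess. A minimal sketch of the defensive pattern such a fix typically takes is shown below — the class and map here only loosely mirror ApplicationACLsManager (one allowed user per app instead of a full ACL) and are not the committed patch; the point is that a missing ACL entry (e.g. for an application that has already been removed) falls back to a deny rather than an NPE:

```java
import java.util.HashMap;
import java.util.Map;

public class AclCheckSketch {
    // appId -> user granted view access (simplified stand-in for the ACL map)
    private final Map<String, String> acls = new HashMap<>();

    public void putAcl(String appId, String user) {
        acls.put(appId, user);
    }

    public boolean checkAccess(String caller, String appOwner, String appId) {
        if (caller.equals(appOwner)) {
            return true; // the owner always has access
        }
        String allowed = acls.get(appId);
        if (allowed == null) {
            // No ACLs registered for this app (e.g. already completed and
            // purged). Dereferencing this unconditionally is what produced
            // the NPE in the stack trace; deny access instead.
            return false;
        }
        return allowed.equals(caller);
    }
}
```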
[jira] [Commented] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156888#comment-14156888 ] zhihai xu commented on YARN-2254: - thanks [~kasha] for reviewing and committing the patch. > TestRMWebServicesAppsModification should run against both CS and FS > --- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Fix For: 2.7.0 > > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test, if the scheduler is not > CapacityScheduler. > change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156869#comment-14156869 ] Anubhav Dhoot commented on YARN-2624: - Thanks [~jlowe]! > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Fix For: 2.6.0 > > Attachments: YARN-2624.001.patch, YARN-2624.001.patch > > > We have found resource localization fails on a cluster with following error > in certain cases. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
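The failure mode in the log above is FileContext.rename refusing to overwrite a non-empty destination directory left behind by an earlier NM run. The sketch below models one generic recovery strategy with java.nio — delete the stale destination tree, then move. This only illustrates the problem; the committed patch handles recovery differently (tracking completed cache directories in the NM state store) rather than blindly deleting on every rename:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class RenameSketch {
    // Rename src onto dst even if a stale, non-empty dst directory exists.
    public static void renameOverwriting(Path src, Path dst) throws IOException {
        if (Files.exists(dst)) {
            // Delete children before parents (reverse order) so the
            // recursive removal of the stale directory succeeds.
            try (Stream<Path> walk = Files.walk(dst)) {
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
        }
        Files.move(src, dst); // now succeeds: destination no longer exists
    }
}
```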
[jira] [Commented] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
[ https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156867#comment-14156867 ] Ray Chiang commented on YARN-2638: -- This patch fixes the test for me. +1 > Let TestRM run with all types of schedulers (FIFO, Capacity, Fair) > -- > > Key: YARN-2638 > URL: https://issues.apache.org/jira/browse/YARN-2638 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2638-1.patch > > > TestRM fails when using FairScheduler or FifoScheduler. The failures not > shown in trunk as the trunk uses the default capacity scheduler. We need to > let TestRM run with all types of schedulers, to make sure any new change > wouldn't break any scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156841#comment-14156841 ] Hudson commented on YARN-2624: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6178 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6178/]) YARN-2624. Resource Localization fails on a cluster due to existing cache directories. Contributed by Anubhav Dhoot (jlowe: rev 29f520052e2b02f44979980e446acc0dccd96d54) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-2624.001.patch, YARN-2624.001.patch > > > We have found resource localization fails on a cluster with following error > in certain cases. 
> {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156835#comment-14156835 ] Karthik Kambatla commented on YARN-2624: Thanks for super-quick turnaround, Jason. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-2624.001.patch, YARN-2624.001.patch > > > We have found resource localization fails on a cluster with following error > in certain cases. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156836#comment-14156836 ] Ray Chiang commented on YARN-2635: -- Looks good to me. Ran cleanly in my tree. +1 > TestRMRestart fails with FairScheduler > -- > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156824#comment-14156824 ] Jason Lowe commented on YARN-2624: -- Thanks for catching and fixing this, Anubhav! My apologies for missing this scenario in the original JIRA. +1 lgtm. Committing this. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-2624.001.patch, YARN-2624.001.patch > > > We have found resource localization fails on a cluster with following error > in certain cases. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156814#comment-14156814 ] Siqi Li commented on YARN-1414: --- Sure, I will submit a rebased patch shortly. > with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs > - > > Key: YARN-1414 > URL: https://issues.apache.org/jira/browse/YARN-1414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.2.0 > > Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated YARN-2527: --- Attachment: YARN-2527.patch > NPE in ApplicationACLsManager > - > > Key: YARN-2527 > URL: https://issues.apache.org/jira/browse/YARN-2527 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch > > > NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. > The relevant stacktrace snippet from the ResourceManager logs is as below > {code} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {code} > This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
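The stack trace points at ApplicationACLsManager.checkAccess dereferencing something that can legitimately be null. The attached patch is not shown in this thread, so the following is only a hypothetical, simplified illustration (names and ACL layout invented) of guarding such a lookup so a missing ACL entry yields "access denied" instead of an NPE and a 500:

```java
import java.util.HashMap;
import java.util.Map;

public class AclCheckDemo {
    // Simplified stand-in for an ACL check; the real ApplicationACLsManager
    // consults per-application ACL objects. All names here are illustrative.
    static boolean checkAccess(String user, Map<String, String> appAcls, String owner) {
        if (user == null) return false;
        if (user.equals(owner)) return true;       // the owner always has access
        if (appAcls == null) return false;         // guard: app may have no ACLs registered (the NPE case)
        String viewAcl = appAcls.get("VIEW_APP");
        return viewAcl != null && (viewAcl.equals("*") || viewAcl.contains(user));
    }

    public static void main(String[] args) {
        Map<String, String> acls = new HashMap<>();
        acls.put("VIEW_APP", "alice bob");
        System.out.println(checkAccess("alice", acls, "carol"));  // allowed by the view ACL
        System.out.println(checkAccess("dave", null, "carol"));   // denied instead of throwing
    }
}
```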
[jira] [Commented] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156803#comment-14156803 ] Hudson commented on YARN-2254: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6177 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6177/]) YARN-2254. TestRMWebServicesAppsModification should run against both CS and FS. (Zhihai Xu via kasha) (kasha: rev 5e0b49da9caa53814581508e589f3704592cf335) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java > TestRMWebServicesAppsModification should run against both CS and FS > --- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Fix For: 2.7.0 > > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test if the scheduler is not > CapacityScheduler. > Change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
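Running one test class against several scheduler configurations is typically done with JUnit parameterization. As a minimal, dependency-free sketch (invented names, no real ResourceManager involved), the idea reduces to running the same test body once per scheduler setting:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchedulerParamDemo {
    // Stand-in for a YARN configuration: just a map holding the scheduler key.
    static final String SCHEDULER_KEY = "yarn.resourcemanager.scheduler.class";

    // The "test body": in the real test this would start an RM and exercise
    // the web services; here it only reports which scheduler it ran against.
    static String runTest(Map<String, String> conf) {
        return "ran with " + conf.get(SCHEDULER_KEY);
    }

    public static void main(String[] args) {
        List<String> schedulers = Arrays.asList("CapacityScheduler", "FairScheduler");
        for (String s : schedulers) {
            Map<String, String> conf = new HashMap<>();
            conf.put(SCHEDULER_KEY, s);
            System.out.println(runTest(conf));
        }
    }
}
```

The committed patch uses this pattern inside TestRMWebServicesAppsModification itself; the sketch above only shows the shape of the parameterization.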
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156805#comment-14156805 ] Karthik Kambatla commented on YARN-2624: The patch looks good to me. Would like input from someone more familiar with the NM restart code. [~jlowe], [~djp] - can either of you take a look? We would like to get this committed soon. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-2624.001.patch, YARN-2624.001.patch > > > We have found resource localization fails on a cluster with following error > in certain cases. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated YARN-2527: --- Attachment: (was: YARN-2527.patch) > NPE in ApplicationACLsManager > - > > Key: YARN-2527 > URL: https://issues.apache.org/jira/browse/YARN-2527 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: YARN-2527.patch, YARN-2527.patch > > > NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. > The relevant stacktrace snippet from the ResourceManager logs is as below > {code} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {code} > This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony updated YARN-2527: --- Attachment: YARN-2527.patch Thanks for the code, [~zjshen]. I have updated the patch based on the comment. > NPE in ApplicationACLsManager > - > > Key: YARN-2527 > URL: https://issues.apache.org/jira/browse/YARN-2527 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch > > > NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. > The relevant stacktrace snippet from the ResourceManager logs is as below > {code} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {code} > This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156772#comment-14156772 ] Hudson commented on YARN-2617: -- FAILURE: Integrated in Hadoop-trunk-Commit #6176 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6176/]) YARN-2617. Fixed NM to not send duplicate container status whose app is not running. Contributed by Jun Gong (jianhe: rev 3ef1cf187faeb530e74606dd7113fd1ba08140d7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, > YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". NM continuously > reported completed containers whose Application had already finished while AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... 
> 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might not > receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default; > only then is it scheduled to delete the application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156761#comment-14156761 ] Jian He commented on YARN-2617: --- YARN-2640 seems resolved in YARN-1979 already. > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, > YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". NM continuously > reported completed containers whose Application had already finished while AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. 
But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might not > receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default; > only then is it scheduled to delete the application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
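The fix described above makes the NM stop reporting completed containers whose application has already finished. A simplified, hypothetical sketch of that filtering (plain Java types, not YARN's actual NodeStatusUpdaterImpl API):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CompletedContainerFilterDemo {
    // Minimal stand-in for a container status; names are illustrative only.
    static class ContainerStatus {
        final String containerId;
        final String appId;
        ContainerStatus(String containerId, String appId) {
            this.containerId = containerId;
            this.appId = appId;
        }
    }

    // Only report completed containers whose application is still known to be
    // running on the NM; the rest would produce the repeated
    // "Null container completed" log lines on the RM side.
    static List<ContainerStatus> toReport(List<ContainerStatus> completed, Set<String> runningApps) {
        return completed.stream()
                .filter(c -> runningApps.contains(c.appId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> running = Set.of("app_2");
        List<ContainerStatus> completed = List.of(
                new ContainerStatus("c_1", "app_1"),   // app_1 already finished: skip
                new ContainerStatus("c_2", "app_2"));  // app_2 still running: report
        System.out.println(toReport(completed, running).size());
    }
}
```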
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156655#comment-14156655 ] Tsuyoshi OZAWA commented on YARN-2615: -- [~djp], the YARN build currently looks broken on Jenkins CI. I faced the same issue on YARN-2562. > ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended > fields > > > Key: YARN-2615 > URL: https://issues.apache.org/jira/browse/YARN-2615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2615-v2.patch, YARN-2615.patch > > > As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier > and DelegationTokenIdentifier should also be updated in the same way to allow > fields to be extended in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156651#comment-14156651 ] Hadoop QA commented on YARN-2615: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672553/YARN-2615-v2.patch against trunk revision c7cee9b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5237//console This message is automatically generated. > ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended > fields > > > Key: YARN-2615 > URL: https://issues.apache.org/jira/browse/YARN-2615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2615-v2.patch, YARN-2615.patch > > > As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier > and DelegationTokenIdentifier should also be updated in the same way to allow > fields to be extended in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual
[ https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156653#comment-14156653 ] Tsuyoshi OZAWA commented on YARN-1979: -- Thanks Vinod for the contribution and Junping for the review! > TestDirectoryCollection fails when the umask is unusual > --- > > Key: YARN-1979 > URL: https://issues.apache.org/jira/browse/YARN-1979 > Project: Hadoop YARN > Issue Type: Test >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Fix For: 2.7.0 > > Attachments: YARN-1979.2.patch, YARN-1979.txt > > > I've seen this fail in Windows where the default permissions are matching up > to 700. > {code} > --- > Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > --- > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) > Time elapsed: 0.422 sec <<< FAILURE! > java.lang.AssertionError: local dir parent > Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA > not created with proper permissions expected: but was: > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106) > {code} > The clash is between testDiskSpaceUtilizationLimit() and > testCreateDirectories(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual
[ https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156647#comment-14156647 ] Hudson commented on YARN-1979: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6174 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6174/]) YARN-1979. TestDirectoryCollection fails when the umask is unusual. (Contributed by Vinod Kumar Vavilapalli and Tsuyoshi OZAWA) (junping_du: rev c7cee9b4551918d5d35bf4e9dc73982a050c73ba) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java > TestDirectoryCollection fails when the umask is unusual > --- > > Key: YARN-1979 > URL: https://issues.apache.org/jira/browse/YARN-1979 > Project: Hadoop YARN > Issue Type: Test >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Fix For: 2.7.0 > > Attachments: YARN-1979.2.patch, YARN-1979.txt > > > I've seen this fail in Windows where the default permissions are matching up > to 700. > {code} > --- > Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > --- > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) > Time elapsed: 0.422 sec <<< FAILURE! 
> java.lang.AssertionError: local dir parent > Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA > not created with proper permissions expected: but was: > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106) > {code} > The clash is between testDiskSpaceUtilizationLimit() and > testCreateDirectories(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields
[ https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2615: - Attachment: YARN-2615-v2.patch In the v2 patch: - Fix test failures and an audit warning. - Add more tests for RMDelegationToken and TimelineDelegationToken. > ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended > fields > > > Key: YARN-2615 > URL: https://issues.apache.org/jira/browse/YARN-2615 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Attachments: YARN-2615-v2.patch, YARN-2615.patch > > > As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier > and DelegationTokenIdentifier should also be updated in the same way to allow > fields to be extended in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual
[ https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156618#comment-14156618 ] Junping Du commented on YARN-1979: -- Thanks [~ozawa] for reminding me of this. Yes, I did forget this JIRA. +1. Committing it now. > TestDirectoryCollection fails when the umask is unusual > --- > > Key: YARN-1979 > URL: https://issues.apache.org/jira/browse/YARN-1979 > Project: Hadoop YARN > Issue Type: Test >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: YARN-1979.2.patch, YARN-1979.txt > > > I've seen this fail in Windows where the default permissions are matching up > to 700. > {code} > --- > Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > --- > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) > Time elapsed: 0.422 sec <<< FAILURE! > java.lang.AssertionError: local dir parent > Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA > not created with proper permissions expected: but was: > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106) > {code} > The clash is between testDiskSpaceUtilizationLimit() and > testCreateDirectories(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156543#comment-14156543 ] Hudson commented on YARN-2630: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-2630. Prevented previous AM container status from being acquired by the current restarted AM. Contributed by Jian He. (zjshen: rev 52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto > TestDistributedShell#testDSRestartWithPreviousRunningContainers fails > - > > Key: YARN-2630 > URL: https://issues.apache.org/jira/browse/YARN-2630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.6.0 > > Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, > YARN-2630.4.patch > > > The problem is that after YARN-1372, in work-preserving AM restart, the > re-launched AM will also receive previously failed AM container. But > DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user
[ https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156552#comment-14156552 ] Hudson commented on YARN-1063: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-1063. Augmented Hadoop common winutils to have the ability to create containers as domain users. Contributed by Remus Rusanu. (vinodkv: rev 5ca97f1e60b8a7848f6eadd15f6c08ed390a8cda) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java * hadoop-common-project/hadoop-common/src/main/winutils/chown.c * hadoop-common-project/hadoop-common/src/main/winutils/symlink.c * hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c * hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h * hadoop-common-project/hadoop-common/src/main/winutils/task.c * hadoop-yarn-project/CHANGES.txt > Winutils needs ability to create task as domain user > > > Key: YARN-1063 > URL: https://issues.apache.org/jira/browse/YARN-1063 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Environment: Windows >Reporter: Kyle Leckie >Assignee: Remus Rusanu > Labels: security, windows > Fix For: 2.6.0 > > Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, > YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch > > > h1. Summary: > Securing a Hadoop cluster requires constructing some form of security > boundary around the processes executed in YARN containers. Isolation based on > Windows user isolation seems most feasible. This approach is similar to the > approach taken by the existing LinuxContainerExecutor. The current patch to > winutils.exe adds the ability to create a process as a domain user. > h1. Alternative Methods considered: > h2. Process rights limited by security token restriction: > On Windows access decisions are made by examining the security token of a > process. 
It is possible to spawn a process with a restricted security token. > Any of the rights granted by SIDs of the default token may be restricted. It > is possible to see this in action by examining the security token of a > sandboxed process launched by a web browser. Typically the launched process > will have a fully restricted token and needs to access machine resources > through a dedicated broker process that enforces a custom security policy. > This broker process mechanism would break compatibility with the typical > Hadoop container process. The Container process must be able to utilize > standard function calls for disk and network IO. I performed some work > looking at ways to ACL the local files to the specific launched process without > granting rights to other processes launched on the same machine, but found > this to be an overly complex solution. > h2. Relying on APP containers: > Recent versions of Windows have the ability to launch processes within an > isolated container. Application containers are supported for execution of > WinRT-based executables. This method was ruled out due to the lack of > official support for standard Windows APIs. At some point in the future > Windows may support functionality similar to BSD jails or Linux containers; > at that point support for containers should be added. > h1. Create As User Feature Description: > h2. Usage: > A new sub-command was added to the set of task commands. Here is the syntax: > winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE] > Some notes: > * The username specified is in the format of "user@domain" > * The machine executing this command must be joined to the domain of the user > specified > * The domain controller must allow the account executing the command access > to the user information. For this, join the account to the predefined group > labeled "Pre-Windows 2000 Compatible Access" > * The account running the command must have several rights on the local > machine.
These can be managed manually using secpol.msc: > ** "Act as part of the operating system" - SE_TCB_NAME > ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME > ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME > * The launched process will not have rights to the desktop so will not be > able to display any information or create UI. > * The launched process will have no network credentials. Any access of > network resources that requires domain authentication will fail. > h2. Implementation: > Winutils performs the following steps: > # Enable the required privileges for the current process. > # Register as a trusted process with the Local Security Authority (LSA). > # Create a new logon for the user passed on the command line. >
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156537#comment-14156537 ] Hudson commented on YARN-1972: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-1972. Added a secure container-executor for Windows. Contributed by Remus Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Fix For: 2.6.0 > > Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, > YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, > YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch > > > h1. Windows Secure Container Executor (WCE) > YARN-1063 adds the necessary infrastructure to launch a process as a domain > user as a solution for the problem of having a security boundary between > processes executed in YARN containers and the Hadoop services. The WCE is a > container executor that leverages the winutils capabilities introduced in > YARN-1063 and launches containers as an OS process running as the job > submitter user. A description of the S4U infrastructure used by YARN-1063 and the > alternatives considered can be read on that JIRA. > The WCE is based on the DefaultContainerExecutor. It relies on the DCE to > drive the flow of execution, but it overrides some methods to the effect of: > * changes the DCE-created user cache directories to be owned by the job user > and by the nodemanager group. > * changes the actual container run command to use the 'createAsUser' command > of winutils task instead of 'create' > * runs the localization as a standalone process instead of an in-process Java > method call. This in turn relies on the winutils createAsUser feature to run > the localization as the job user. 
> > When compared to LinuxContainerExecutor (LCE), the WCE has some minor > differences: > * it does not delegate the creation of the user cache directories to the > native implementation. > * it does not require special handling to be able to delete user files > The WCE design came out of a practical trial-and-error process. I had > to iron out some issues around the Windows script shell limitations (command > line length) to get it to work, the biggest issue being the huge CLASSPATH > that is commonplace in Hadoop container executions. The job > container itself already deals with this via a so-called 'classpath > jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched > as a separate process, the same issue had to be resolved, and I used the same > 'classpath jar' approach. > h2. Deployment Requirements > To use the WCE one needs to set > `yarn.nodemanager.container-executor.class` to > `org.
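The deployment property named above is cut off in this message. Going by the WindowsSecureContainerExecutor.java source file listed in the committed patch, the yarn-site.xml entry would presumably look like the following sketch (the class name is inferred from that file list, not stated in the truncated text):

```xml
<!-- Sketch of the WCE deployment setting; the executor class name is
     inferred from WindowsSecureContainerExecutor.java in the patch. -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor</value>
</property>
```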
[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades
[ https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156524#comment-14156524 ] Hudson commented on YARN-2613: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java > NMClient doesn't have retries for supporting rolling-upgrades > - > > Key: YARN-2613 > URL: https://issues.apache.org/jira/browse/YARN-2613 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch > > > While the NM is undergoing a rolling upgrade, the client should retry the NM 
until it comes back up. This > jira adds an NMProxy (similar to RMProxy) with a retry implementation to > support rolling upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
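The retry-until-up behaviour described in YARN-2613 can be modeled generically. This is a minimal, self-contained sketch of the idea only; the actual NMProxy/ServerProxy code wires a Hadoop RetryPolicy into the RPC proxy rather than using a bare loop like this:

```java
import java.util.concurrent.Callable;

/** Minimal model of "retry the NM until it comes back up": re-invoke the
 *  call on failure with a fixed back-off, giving up after maxAttempts.
 *  Illustrative only; not the real NMProxy implementation. */
public class RetrySketch {
  public static <T> T retry(Callable<T> call, int maxAttempts, long sleepMs)
      throws Exception {
    Exception last = null;
    for (int i = 0; i < maxAttempts; i++) {
      try {
        return call.call();        // success: the NM answered
      } catch (Exception e) {      // e.g. connection refused mid-upgrade
        last = e;
        Thread.sleep(sleepMs);     // fixed back-off before the next attempt
      }
    }
    throw last;                    // the NM never came back up
  }
}
```

The real code additionally distinguishes retriable connection errors from ordinary RPC failures, which a sketch this small glosses over.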
[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user
[ https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156528#comment-14156528 ] Hudson commented on YARN-2446: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/]) YARN-2446. Augmented Timeline service APIs to start taking in domains as a parameter while posting entities and events. Contributed by Zhijie Shen. (vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java > Using TimelineNamespace to shield the entities of a user > > > Key: YARN-2446 > URL: https://issues.apache.org/jira/browse/YARN-2446 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch > > > Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the > entities, preventing them from being accessed or affected by other users' > operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
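The "shielding" YARN-2446 describes boils down to an ownership check against the domain an entity was posted under. A toy model of that check, with invented names (this is not the actual TimelineACLsManager API):

```java
import java.util.Map;

/** Toy model of domain-based timeline entity access: each entity carries a
 *  domain id, and a caller may read the entity only if the domain's owner
 *  matches the caller. Names here are illustrative, not Hadoop's. */
public class DomainAclSketch {
  public static boolean canRead(String caller, String entityDomain,
                                Map<String, String> domainOwner) {
    String owner = domainOwner.get(entityDomain);  // unknown domain => deny
    return owner != null && owner.equals(caller);
  }
}
```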
[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156490#comment-14156490 ] Wei Yan commented on YARN-2635: --- All tests passed locally. The TestDirectoryCollection failure looks related to YARN-1979, YARN-2640. > TestRMRestart fails with FairScheduler > -- > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156471#comment-14156471 ] Hadoop QA commented on YARN-2562: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672538/YARN-2562.5-2.patch against trunk revision 9e40de6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5236//console This message is automatically generated. > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, > YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
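A readable format along the lines this JIRA is after keeps the legacy string when epoch is 0 and adds an explicit, labeled epoch field otherwise. The sketch below is illustrative (field widths and the exact layout are assumptions, not the committed code):

```java
/** Sketch of a readable ContainerId string for epoch > 0: legacy form for
 *  epoch 0, and an explicit "e<epoch>" prefix field otherwise.
 *  Field widths are illustrative only. */
public class ContainerIdFormat {
  public static String toString(long clusterTs, int appId, int attempt,
                                long containerId, int epoch) {
    String base = String.format("container_%d_%04d_%02d_%06d",
        clusterTs, appId, attempt, containerId);
    if (epoch == 0) {
      return base;                               // unchanged legacy format
    }
    return String.format("container_e%02d_%d_%04d_%02d_%06d",
        epoch, clusterTs, appId, attempt, containerId);
  }
}
```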
[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual
[ https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156462#comment-14156462 ] Tsuyoshi OZAWA commented on YARN-1979: -- [~djp], do you mind taking a look at the latest patch? Some users report the same issue, e.g. YARN-2640. > TestDirectoryCollection fails when the umask is unusual > --- > > Key: YARN-1979 > URL: https://issues.apache.org/jira/browse/YARN-1979 > Project: Hadoop YARN > Issue Type: Test >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: YARN-1979.2.patch, YARN-1979.txt > > > I've seen this fail in Windows, where the default permissions match 700. > {code} > --- > Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > --- > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) > Time elapsed: 0.422 sec <<< FAILURE! > java.lang.AssertionError: local dir parent > Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA > not created with proper permissions expected: but was: > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106) > {code} > The clash is between testDiskSpaceUtilizationLimit() and > testCreateDirectories(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed
[ https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156461#comment-14156461 ] Tsuyoshi OZAWA commented on YARN-2640: -- [~hex108], thanks for your contribution. Can we close this jira as a duplicate of YARN-1979? > TestDirectoryCollection.testCreateDirectories failed > > > Key: YARN-2640 > URL: https://issues.apache.org/jira/browse/YARN-2640 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-2640.2.patch, YARN-2640.patch > > > When running "mvn test -Dtest=TestDirectoryCollection", it failed: > {code} > Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) > Time elapsed: 0.969 sec <<< FAILURE! > java.lang.AssertionError: local dir parent not created with proper > permissions expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104) > {code} > This is because testDiskSpaceUtilizationLimit ran before > testCreateDirectories, so directory "dirA" was already created by > testDiskSpaceUtilizationLimit. When testCreateDirectories then tried > to create "dirA" with the specified permissions, it found that "dirA" already > existed and did nothing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
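The root cause described in YARN-2640, creating a directory with explicit permissions being a silent no-op when the directory already exists, can be demonstrated and avoided with plain java.nio (this is a sketch of the failure pattern, not the Hadoop code under test):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Sketch: Files.createDirectories is a no-op on an existing directory, so
 *  any permissions requested at creation time are silently skipped. Setting
 *  the permissions unconditionally afterwards makes the call idempotent. */
public class DirPermsSketch {
  public static void createWithPerms(Path dir, String perms) throws IOException {
    Files.createDirectories(dir);                      // no-op if dir exists
    Set<PosixFilePermission> p = PosixFilePermissions.fromString(perms);
    Files.setPosixFilePermissions(dir, p);             // enforce perms either way
  }
}
```

An alternative fix, the one the test itself can take, is to use a fresh directory per test method so earlier tests cannot leave `dirA` behind.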
[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2562: - Attachment: YARN-2562.5-2.patch > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, > YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user
[ https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156412#comment-14156412 ] Hudson commented on YARN-2446: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/]) YARN-2446. Augmented Timeline service APIs to start taking in domains as a parameter while posting entities and events. Contributed by Zhijie Shen. (vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java > Using TimelineNamespace to shield the entities of a user > > > Key: YARN-2446 > URL: https://issues.apache.org/jira/browse/YARN-2446 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch > > > Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the > entities, preventing them from being accessed or affected by other users' > operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades
[ https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156408#comment-14156408 ] Hudson commented on YARN-2613: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/]) YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * hadoop-yarn-project/CHANGES.txt > NMClient doesn't have retries for supporting rolling-upgrades > - > > Key: YARN-2613 > URL: https://issues.apache.org/jira/browse/YARN-2613 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch > > > While the NM is undergoing a rolling upgrade, the client should retry the NM until it 
comes back up. This > jira adds an NMProxy (similar to RMProxy) with a retry implementation to > support rolling upgrades. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156421#comment-14156421 ] Hudson commented on YARN-1972: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/]) YARN-1972. Added a secure container-executor for Windows. Contributed by Remus Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Fix For: 2.6.0 > > Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, > YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, > YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch > > > h1. Windows Secure Container Executor (WCE) > YARN-1063 adds the necessary infrastructure to launch a process as a domain > user as a solution for the problem of having a security boundary between > processes executed in YARN containers and the Hadoop services. The WCE is a > container executor that leverages the winutils capabilities introduced in > YARN-1063 and launches containers as an OS process running as the job > submitter user. A description of the S4U infrastructure used by YARN-1063 and the > alternatives considered can be read on that JIRA. > The WCE is based on the DefaultContainerExecutor. It relies on the DCE to > drive the flow of execution, but it overrides some methods to the effect of: > * changes the DCE-created user cache directories to be owned by the job user > and by the nodemanager group. > * changes the actual container run command to use the 'createAsUser' command > of winutils task instead of 'create' > * runs the localization as a standalone process instead of an in-process Java > method call. This in turn relies on the winutils createAsUser feature to run > the localization as the job user. > > When compared to LinuxContainerExecutor (LCE), the WCE has some minor > differences: > * it does not delegate the creation of the user cache directories to the > native implementation. 
> * it does not require special handling to be able to delete user files > The WCE design came out of a practical trial-and-error process. I had > to iron out some issues around the Windows script shell limitations (command > line length) to get it to work, the biggest issue being the huge CLASSPATH > that is commonplace in Hadoop container executions. The job > container itself already deals with this via a so-called 'classpath > jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched > as a separate process, the same issue had to be resolved, and I used the same > 'classpath jar' approach. > h2. Deployment Requirements > To use the WCE one needs to set > `yarn.nodemanager.container-executor.class` to > `org.apache.had
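The "classpath jar" workaround mentioned above can be sketched in a few lines: an otherwise-empty jar whose manifest `Class-Path` attribute carries the long classpath, so the command line only needs to reference the jar. This is a sketch of the technique (HADOOP-8899 / YARN-316 describe the actual Hadoop mechanism):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Sketch of a "classpath jar": no class entries, just a manifest whose
 *  Class-Path attribute holds the (otherwise command-line-breaking)
 *  classpath. Illustrative, not the Hadoop FileUtil implementation. */
public class ClasspathJarSketch {
  public static File write(File jar, String classpath) throws IOException {
    Manifest m = new Manifest();
    // MANIFEST_VERSION must be set or the manifest is written out empty.
    m.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
    m.getMainAttributes().put(Attributes.Name.CLASS_PATH, classpath);
    try (JarOutputStream out =
             new JarOutputStream(new FileOutputStream(jar), m)) {
      // no entries needed; the manifest is the payload
    }
    return jar;
  }
}
```

Running `java -cp that.jar Main` then resolves the manifest entries as if they had been passed on the command line.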
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156427#comment-14156427 ] Hudson commented on YARN-2630: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/]) YARN-2630. Prevented previous AM container status from being acquired by the current restarted AM. Contributed by Jian He. (zjshen: rev 52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java > TestDistributedShell#testDSRestartWithPreviousRunningContainers fails > - > > Key: YARN-2630 > URL: https://issues.apache.org/jira/browse/YARN-2630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Fix For: 2.6.0 > > Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, > YARN-2630.4.patch > > > The problem is that after YARN-1372, in work-preserving AM restart, the > re-launched AM will also receive previously failed AM container. But > DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)