[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156127#comment-14156127 ] Zhijie Shen commented on YARN-2468: ---
bq. I would like to check how many log files we can upload this time. If the number is 0, we can skip this time. And this check should also happen before LogKey.write(); otherwise, we will write the key but no value.
1. I think Vinod meant that pendingUploadFiles is needed, but it doesn't need to be a member variable. getPendingLogFilesToUploadForThisContainer can return this collection, and we can pass it into LogValue.write by adding a parameter for it.
2. IMHO, the following code can be improved. If we use an iterator, we can delete the unnecessary elements on the fly.
{code}
for (File file : candidates) {
  Matcher fileMatcher = filterPattern.matcher(file.getName());
  if (fileMatcher.find()) {
    filteredFiles.add(file);
  }
}
if (!exclusion) {
  return filteredFiles;
} else {
  candidates.removeAll(filteredFiles);
  return candidates;
}
{code}
This block could be:
{code}
...
while (candidatesItr.hasNext()) {
  candidate = candidatesItr.next();
  ...
  if ((!match && inclusive) || (match && exclusive)) {
    candidatesItr.remove();
  }
}
{code}
3. [~jianhe] mentioned to me before that the following condition is not always true for determining an AM container. Any idea? It also seems that we don't need shouldUploadLogsForRunningContainer; we can re-use shouldUploadLogs and set wasContainerSuccessful to true. Personally, if it's not trivial to identify the AM container, I prefer to write a TODO comment and leave it until we implement the log retention API.
{code}
if (containerId.getId() == 1) {
  return true;
}
{code}
4. bq. It seems to be, let's validate this via a test-case.
Is it addressed by
{code}
this.conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600);
{code}
? Is it better to add a line of comment on the rationale behind this config?
5.
Can the following code
{code}
Set<ContainerId> finishedContainers = new HashSet<ContainerId>();
for (ContainerId id : pendingContainerInThisCycle) {
  finishedContainers.add(id);
}
{code}
be simplified as
{code}
Set<ContainerId> finishedContainers = new HashSet<ContainerId>(pendingContainerInThisCycle);
{code}
> Log handling for LRS > > > Key: YARN-2468 > URL: https://issues.apache.org/jira/browse/YARN-2468 > Project: Hadoop YARN > Issue Type: Sub-task > Components: log-aggregation, nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, > YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, > YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, > YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, > YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, > YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, > YARN-2468.9.1.patch, YARN-2468.9.patch > > > Currently, when application is finished, NM will start to do the log > aggregation. But for Long running service applications, this is not ideal. > The problems we have are: > 1) LRS applications are expected to run for a long time (weeks, months). > 2) Currently, all the container logs (from one NM) will be written into a > single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
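A self-contained sketch may make the iterator-based rewrite above concrete. This is illustrative only, not the actual AppLogAggregatorImpl code: the `filter` method name, the use of String names instead of File objects, and the `exclusive` flag are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

public class LogFileFilter {

  // Single-pass filter in the style suggested above: instead of building
  // filteredFiles and then calling removeAll, decide per element whether
  // to keep it, and drop rejects via Iterator.remove().
  static List<String> filter(List<String> candidates, String regex, boolean exclusive) {
    Pattern filterPattern = Pattern.compile(regex);
    Iterator<String> candidatesItr = candidates.iterator();
    while (candidatesItr.hasNext()) {
      boolean match = filterPattern.matcher(candidatesItr.next()).find();
      // A non-match is dropped in inclusive mode; a match is dropped in
      // exclusive mode -- the two branches of the original if/else collapse.
      if ((!match && !exclusive) || (match && exclusive)) {
        candidatesItr.remove();
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    List<String> files = new ArrayList<>(List.of("syslog", "stderr", "gc.log"));
    // Inclusive mode: keep only names containing "log"
    System.out.println(filter(files, "log", false)); // [syslog, gc.log]
  }
}
```

The point-5 simplification works the same way: `new HashSet<>(pendingContainerInThisCycle)` copies the collection in the constructor, so the explicit add-loop is unnecessary.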
[jira] [Commented] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156104#comment-14156104 ] zhihai xu commented on YARN-2254: - The release audit warning is not related to my change.
{code}
 !? /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-hdfs-project/hadoop-hdfs/.gitattributes
{code}
Lines that start with ? in the release audit report indicate files that do not have an Apache license header.
> TestRMWebServicesAppsModification should run against both CS and FS > --- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test, if the scheduler is not > CapacityScheduler. > change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156101#comment-14156101 ] Hadoop QA commented on YARN-2562: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672497/YARN-2562.5.patch against trunk revision 9e40de6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5232//console This message is automatically generated. > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, > YARN-2562.4.patch, YARN-2562.5.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2562: - Attachment: YARN-2562.5.patch Rebased on trunk. > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, > YARN-2562.4.patch, YARN-2562.5.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156079#comment-14156079 ] Anubhav Dhoot commented on YARN-2624: - The fix addresses the scenario of moving from a pre-recovery NodeManager to one with recovery turned on. As per YARN-1338, the directories are not cleaned up, in order to preserve running containers. But uniqueNumberGenerator will not know about pre-existing directories, which were normally deleted on NM startup and are unknown to a recovery-enabled NM. In this case we still want directory cleanup to happen. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-2624.001.patch, YARN-2624.001.patch > > > We have found resource localization fails on a cluster with following error > in certain cases. 
> {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
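For context on the "Rename cannot overwrite non empty destination directory" error in the stack trace above: this is standard rename semantics, not something specific to Hadoop's FileContext. A minimal local-filesystem sketch with java.nio follows (the temp-directory names are hypothetical; Hadoop's AbstractFileSystem.renameInternal enforces the same rule):

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameDemo {

  // Creates a source directory and a NON-empty destination directory, then
  // attempts the rename. Returns true when the rename is refused with
  // DirectoryNotEmptyException, mirroring the localization failure above.
  static boolean nonEmptyTargetRefusesRename() {
    try {
      Path src = Files.createTempDirectory("filecache-src");
      Path dst = Files.createTempDirectory("filecache-dst");
      Files.createFile(dst.resolve("leftover")); // stale file from a previous run
      // Even REPLACE_EXISTING can only replace an *empty* directory.
      Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
      return false; // rename unexpectedly succeeded
    } catch (DirectoryNotEmptyException e) {
      return true;
    } catch (IOException e) {
      return false; // some other I/O failure
    }
  }

  public static void main(String[] args) {
    System.out.println(nonEmptyTargetRefusesRename()
        ? "rename refused: destination not empty"
        : "rename succeeded");
  }
}
```

This is why leftover cache directories from a pre-recovery NM break localization: the rename into the numbered filecache directory cannot overwrite a non-empty leftover.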
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156073#comment-14156073 ] Anubhav Dhoot commented on YARN-2624: - Failure seems unrelated to changes and does not repro locally > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-2624.001.patch, YARN-2624.001.patch > > > We have found resource localization fails on a cluster with following error > in certain cases. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156070#comment-14156070 ] Hadoop QA commented on YARN-2562: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672494/YARN-2562.4.patch against trunk revision 9e40de6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5231//console This message is automatically generated. > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, > YARN-2562.4.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2562: - Attachment: YARN-2562.4.patch [~jianhe], good catch. Fixed the comment. > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, > YARN-2562.4.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156020#comment-14156020 ] Sandy Ryza commented on YARN-1414: -- [~jrottinghuis] I will take a look. [~l201514] mind rebasing so that the patch will apply? > with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs > - > > Key: YARN-1414 > URL: https://issues.apache.org/jira/browse/YARN-1414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.2.0 > > Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
[ https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156019#comment-14156019 ] Hadoop QA commented on YARN-2638: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672476/YARN-2638-1.patch against trunk revision 9e40de6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5230//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5230//console This message is automatically generated. > Let TestRM run with all types of schedulers (FIFO, Capacity, Fair) > -- > > Key: YARN-2638 > URL: https://issues.apache.org/jira/browse/YARN-2638 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2638-1.patch > > > TestRM fails when using FairScheduler or FifoScheduler. The failures not > shown in trunk as the trunk uses the default capacity scheduler. 
We need to > let TestRM run with all types of schedulers, to make sure any new change > wouldn't break any scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156000#comment-14156000 ] Jian He commented on YARN-2562: --- Thanks for updating. One minor thing: in container_e*epoch*_*clusterTimestamp*_*attemptId*_*appId*_*containerId*, it should be appId followed by attemptId. > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
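To illustrate the field order under discussion, here is a hypothetical epoch-aware formatter. The `toReadableString` helper and the zero-padding widths are assumptions for the example, not the actual ContainerId.toString() from the patch; it only demonstrates the appId-before-attemptId ordering and the "e<epoch>" field for epoch > 0:

```java
public class ContainerIdFormat {

  // Hypothetical formatter for the readable layout discussed above:
  // epoch 0 keeps the classic container_<ts>_<appId>_<attemptId>_<id>
  // format; epoch > 0 inserts an "e<epoch>" field after the prefix.
  static String toReadableString(long clusterTimestamp, int appId,
                                 int attemptId, long containerId, int epoch) {
    StringBuilder sb = new StringBuilder("container_");
    if (epoch > 0) {
      sb.append("e").append(epoch).append("_");
    }
    sb.append(clusterTimestamp).append("_");
    sb.append(String.format("%04d", appId)).append("_");   // appId before attemptId
    sb.append(String.format("%02d", attemptId)).append("_");
    sb.append(String.format("%06d", containerId));
    return sb.toString();
  }

  public static void main(String[] args) {
    // Readable form of the example ID from this issue, at epoch 17:
    System.out.println(toReadableString(1410901177871L, 1, 1, 5, 17));
    // container_e17_1410901177871_0001_01_000005
  }
}
```

Compare this with the unreadable pre-patch form cited in the description, where the epoch is tacked onto the end as a bare field.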
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155992#comment-14155992 ] Hadoop QA commented on YARN-2617: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672474/YARN-2617.6.patch against trunk revision 9e40de6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5229//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5229//console This message is automatically generated. 
> NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, > YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". NM continuously > reported completed containers whose Application had already finished while AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. But it will only remove appId from > 'app.context.getApplications()' when ApplicationImpl received the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might > not receive this event for a long time, or might never receive it. 
> * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default; > then it will be scheduled to delete Application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
[ https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2638: -- Attachment: YARN-2638-1.patch > Let TestRM run with all types of schedulers (FIFO, Capacity, Fair) > -- > > Key: YARN-2638 > URL: https://issues.apache.org/jira/browse/YARN-2638 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2638-1.patch > > > TestRM fails when using FairScheduler or FifoScheduler. The failures not > shown in trunk as the trunk uses the default capacity scheduler. We need to > let TestRM run with all types of schedulers, to make sure any new change > wouldn't break any scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155960#comment-14155960 ] Jun Gong commented on YARN-2617: It seems that there is something wrong with Jenkins. From the console output https://builds.apache.org/job/PreCommit-YARN-Build/5227//console, it seems to apply a wrong patch:
{noformat}
Going to apply patch with: /usr/bin/patch -p0
patching file hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
{noformat}
> NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, > YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". NM continuously > reported completed containers whose Application had already finished while AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... 
> 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. But it will only remove appId from > 'app.context.getApplications()' when ApplicationImpl received the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might > not receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default; > then it will be scheduled to delete Application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2617: -- Attachment: YARN-2617.6.patch > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, > YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". NM continuously > reported completed containers whose Application had already finished while AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. 
But it will only remove appId from > 'app.context.getApplications()' when ApplicationImpl received the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might > not receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default; > then it will be scheduled to delete Application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155954#comment-14155954 ] Jian He commented on YARN-2628: --- looks good, +1 > Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, sometimes apps will end up with containers in a > reserved state even though free slots are available. > The root cause seems to be this piece of code from CapacityScheduler.java - > {noformat} > // Try to schedule more if there are no reservations to fulfill > if (node.getReservedContainer() == null) { > if (Resources.greaterThanOrEqual(calculator, getClusterResource(), > node.getAvailableResource(), minimumAllocation)) { > if (LOG.isDebugEnabled()) { > LOG.debug("Trying to schedule on node: " + node.getNodeName() + > ", available: " + node.getAvailableResource()); > } > root.assignContainers(clusterResource, node, false); > } > } else { > LOG.info("Skipping scheduling since node " + node.getNodeID() + > " is reserved by application " + > > node.getReservedContainer().getContainerId().getApplicationAttemptId() > ); > } > {noformat} > The code is meant to check if a node has any slots available for containers . > Since it uses the greaterThanOrEqual function, we end up in situation where > greaterThanOrEqual returns true, even though we may not have enough CPU or > memory to actually run the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
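The comparison pitfall described in YARN-2628 is easy to demonstrate in isolation: a dominant-resource comparison can report "greater or equal" even when one dimension is short. A simplified sketch follows (the `Resource` record and the dominant-share math are stand-ins for Hadoop's Resource and DominantResourceCalculator classes, not the real implementations):

```java
public class DominantCompareDemo {

  record Resource(long memoryMB, int vcores) {}

  // Dominant-share comparison in the style of DominantResourceCalculator:
  // each side is scored by its LARGER share of the cluster total, so a
  // surplus in one dimension can mask a shortage in the other.
  static boolean dominantGreaterOrEqual(Resource cluster, Resource lhs, Resource rhs) {
    double l = Math.max((double) lhs.memoryMB() / cluster.memoryMB(),
                        (double) lhs.vcores() / cluster.vcores());
    double r = Math.max((double) rhs.memoryMB() / cluster.memoryMB(),
                        (double) rhs.vcores() / cluster.vcores());
    return l >= r;
  }

  // The per-dimension check that actually decides whether a request fits.
  static boolean fits(Resource available, Resource request) {
    return available.memoryMB() >= request.memoryMB()
        && available.vcores() >= request.vcores();
  }

  public static void main(String[] args) {
    Resource cluster = new Resource(100_000, 100);
    Resource available = new Resource(8_192, 0); // plenty of memory, no cores left
    Resource minAlloc = new Resource(1_024, 1);
    System.out.println(dominantGreaterOrEqual(cluster, available, minAlloc)); // true
    System.out.println(fits(available, minAlloc));                            // false
  }
}
```

The node above passes the dominant-share test (memory is its dominant resource and is ample) even though no container needing a vcore can actually run there, which is how apps end up with containers stuck in a reserved state.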
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155949#comment-14155949 ] Tsuyoshi OZAWA commented on YARN-2562: -- [~jianhe], [~vinodkv], could you check the latest patch? > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155948#comment-14155948 ] Hadoop QA commented on YARN-2562: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672462/YARN-2562.3.patch against trunk revision 0708827. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5226//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5226//console This message is automatically generated. > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. 
For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
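The readability problem above comes from packing the RM restart epoch into the numeric container id. A hedged sketch (not the actual YARN-2562 patch; the bit layout and the `e<epoch>_` prefix are assumptions for illustration) of a formatter that keeps the epoch as a separate readable prefix:

```java
// Hypothetical sketch: keep the container id string readable by emitting the
// RM restart epoch as a separate "e<epoch>_" prefix instead of letting it
// inflate the numeric suffix. The split at bit 40 is an assumed layout.
public class ContainerIdFormat {
    static final long SEQ_MASK = (1L << 40) - 1; // low 40 bits: container sequence number

    static String toReadableString(long clusterTimestamp, int appId,
                                   int attemptId, long containerId) {
        long epoch = containerId >>> 40;       // assumed: epoch lives above bit 40
        long seq = containerId & SEQ_MASK;
        StringBuilder sb = new StringBuilder("container_");
        if (epoch > 0) {
            sb.append('e').append(epoch).append('_'); // only for RMs restarted at least once
        }
        sb.append(clusterTimestamp).append('_')
          .append(String.format("%04d", appId)).append('_')
          .append(String.format("%02d", attemptId)).append('_')
          .append(String.format("%06d", seq));
        return sb.toString();
    }

    public static void main(String[] args) {
        // epoch 0: the familiar legacy format is unchanged
        System.out.println(toReadableString(1410901177871L, 1, 1, 5L));
        // epoch 17: a short prefix instead of one huge packed number
        System.out.println(toReadableString(1410901177871L, 1, 1, (17L << 40) | 5L));
    }
}
```

With this shape, an epoch-0 id stays backward-compatible, while a restarted RM produces ids like `container_e17_...` that a human can still parse at a glance.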
[jira] [Commented] (YARN-2639) TestClientToAMTokens should run with all types of schedulers
[ https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155946#comment-14155946 ] Hadoop QA commented on YARN-2639: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672470/YARN-2639-1.patch against trunk revision 9e40de6. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5228//console This message is automatically generated. > TestClientToAMTokens should run with all types of schedulers > > > Key: YARN-2639 > URL: https://issues.apache.org/jira/browse/YARN-2639 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2639-1.patch > > > TestClientToAMTokens fails with FairScheduler now. We should let > TestClientToAMTokens run with all kinds of schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155941#comment-14155941 ] Hadoop QA commented on YARN-2617: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672468/YARN-2617.5.patch against trunk revision 9e40de6. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5227//console This message is automatically generated. > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". NM continuously > reported completed containers whose Application had already finished while AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... 
> 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might > not receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS (3 * 60 * 60 sec by default) before > it is scheduled to delete the Application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
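The fix the issue asks for amounts to filtering the NM's pending completed containers against the set of applications still known to be running before putting them in the heartbeat. A minimal stand-alone sketch of that idea (all class, field, and method names here are illustrative, not the actual patch):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the YARN-2617 idea: before the NM reports completed
// containers, drop (and forget) those whose application is no longer in the
// running set, so the RM's scheduler never processes a completed container
// for an app it has already removed ("Null container completed...").
public class FinishedContainerFilter {
    private final List<String> pendingCompletedContainers = new ArrayList<>();

    void containerFinished(String containerId) {
        pendingCompletedContainers.add(containerId);
    }

    // container_<clusterTs>_<appSeq>_<attempt>_<cid> -> application_<clusterTs>_<appSeq>
    static String appOf(String containerId) {
        String[] parts = containerId.split("_");
        return "application_" + parts[1] + "_" + parts[2];
    }

    // Keep only containers whose application is still tracked as running.
    List<String> containersToReport(Set<String> runningApps) {
        List<String> report = new ArrayList<>();
        for (Iterator<String> it = pendingCompletedContainers.iterator(); it.hasNext();) {
            String c = it.next();
            if (runningApps.contains(appOf(c))) {
                report.add(c);
            } else {
                it.remove(); // app already finished; reporting it is only noise for the RM
            }
        }
        return report;
    }
}
```

This sidesteps the dependency on APPLICATION_LOG_HANDLING_FINISHED arriving promptly, since the filter consults the current running-application set instead.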
[jira] [Updated] (YARN-2639) TestClientToAMTokens should run with all types of schedulers
[ https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2639: -- Attachment: YARN-2639-1.patch > TestClientToAMTokens should run with all types of schedulers > > > Key: YARN-2639 > URL: https://issues.apache.org/jira/browse/YARN-2639 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2639-1.patch > > > TestClientToAMTokens fails with FairScheduler now. We should let > TestClientToAMTokens run with all kinds of schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user
[ https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155930#comment-14155930 ] Vinod Kumar Vavilapalli commented on YARN-2446: --- Merged this into branch-2.6 also. > Using TimelineNamespace to shield the entities of a user > > > Key: YARN-2446 > URL: https://issues.apache.org/jira/browse/YARN-2446 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch > > > Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the > entities, preventing them from being accessed or affected by other users' > operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2639) TestClientToAMTokens should run with all types of schedulers
Wei Yan created YARN-2639: - Summary: TestClientToAMTokens should run with all types of schedulers Key: YARN-2639 URL: https://issues.apache.org/jira/browse/YARN-2639 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan TestClientToAMTokens fails with FairScheduler now. We should let TestClientToAMTokens run with all kinds of schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
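The "run with all types of schedulers" request in YARN-2638/2639 boils down to running the same test body once per scheduler. In the real fix this would likely be a JUnit @Parameterized test that sets YarnConfiguration.RM_SCHEDULER before starting the cluster; the following stand-alone sketch only shows the shape of the pattern (class names and the scheduler list are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hedged sketch: execute one test body once per scheduler so that a
// regression in any scheduler (FIFO, Capacity, Fair) is caught, not just
// in whichever scheduler the default configuration happens to select.
public class AllSchedulers {
    static final List<String> SCHEDULERS = Arrays.asList(
        "CapacityScheduler", "FairScheduler", "FifoScheduler");

    // Returns which schedulers the body ran against, so a failure can be
    // attributed to a specific scheduler.
    static List<String> runWithAllSchedulers(Consumer<String> testBody) {
        List<String> ran = new ArrayList<>();
        for (String scheduler : SCHEDULERS) {
            // in the real test: conf.set(YarnConfiguration.RM_SCHEDULER, scheduler);
            // then start MockRM with that conf and run the assertions
            testBody.accept(scheduler);
            ran.add(scheduler);
        }
        return ran;
    }
}
```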
[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user
[ https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155926#comment-14155926 ] Hudson commented on YARN-2446: -- FAILURE: Integrated in Hadoop-trunk-Commit #6173 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6173/]) YARN-2446. Augmented Timeline service APIs to start taking in domains as a parameter while posting entities and events. Contributed by Zhijie Shen. (vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java > Using TimelineNamespace to shield the entities of a user > > > Key: YARN-2446 > URL: https://issues.apache.org/jira/browse/YARN-2446 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: 2.6.0 > > Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch > > > Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the > entities, preventing them from being accessed or affected by other users' > operations. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2617: -- Attachment: YARN-2617.5.patch same patch uploaded again > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". NM continuously > reported completed containers whose Application had already finished while AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. 
But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might > not receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS (3 * 60 * 60 sec by default) before > it is scheduled to delete the Application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155913#comment-14155913 ] Hadoop QA commented on YARN-2624: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672461/YARN-2624.001.patch against trunk revision 0708827. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5224//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5224//console This message is automatically generated. 
> Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-2624.001.patch, YARN-2624.001.patch > > > We have found resource localization fails on a cluster with following error > in certain cases. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
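The failure above happens because FileContext.rename refuses to overwrite a non-empty destination, so a stale cache directory that survived an NM restart blocks localization forever. A hedged sketch of the defensive idea (not the actual YARN-2624 patch; this uses plain java.nio.file against the local filesystem for illustration): delete any leftover destination before renaming the freshly downloaded resource into place.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch: make the rename idempotent by removing a stale,
// possibly non-empty destination directory first, so "Rename cannot
// overwrite non empty destination directory" can no longer occur.
public class SafeRename {
    static void renameReplacing(Path src, Path dst) throws IOException {
        if (Files.exists(dst)) {
            // recursively delete the leftover destination, children first
            try (Stream<Path> walk = Files.walk(dst)) {
                for (Path p : walk.sorted(Comparator.reverseOrder()).toArray(Path[]::new)) {
                    Files.delete(p);
                }
            }
        }
        Files.move(src, dst); // destination is now guaranteed absent
    }

    // Small self-check: a non-empty stale destination no longer blocks the rename.
    static boolean demo() {
        try {
            Path base = Files.createTempDirectory("safe-rename");
            Path src = Files.createDirectory(base.resolve("src"));
            Files.write(src.resolve("map.xml"), "fresh".getBytes());
            Path dst = Files.createDirectory(base.resolve("filecache-27"));
            Files.write(dst.resolve("stale.bin"), "old".getBytes());
            renameReplacing(src, dst);
            return Files.exists(dst.resolve("map.xml")) && !Files.exists(src);
        } catch (IOException e) {
            return false;
        }
    }
}
```

The same shape would apply to the NM cache path in the stack trace (e.g. /data/yarn/nm/filecache/27), with the deletion done through the cluster filesystem API instead of java.nio.file.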
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155912#comment-14155912 ] Hadoop QA commented on YARN-2617: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672458/YARN-2617.5.patch against trunk revision 0708827. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5225//console This message is automatically generated. > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". NM continuously > reported completed containers whose Application had already finished while AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... 
> 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might > not receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS (3 * 60 * 60 sec by default) before > it is scheduled to delete the Application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2562: - Attachment: YARN-2562.3.patch > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
[ https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2638: -- Description: TestRM fails when using FairScheduler or FifoScheduler. The failures are not shown in trunk as the trunk uses the default capacity scheduler. We need to let TestRM run with all types of schedulers, to make sure any new change wouldn't break any scheduler. (was: TestRM fails when using FairScheduler. The failures are not shown in trunk as the trunk uses the default capacity scheduler. We need to let TestRM run with all types of schedulers, to make sure any new change wouldn't break any scheduler.) > Let TestRM run with all types of schedulers (FIFO, Capacity, Fair) > -- > > Key: YARN-2638 > URL: https://issues.apache.org/jira/browse/YARN-2638 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > > TestRM fails when using FairScheduler or FifoScheduler. The failures are not > shown in trunk as the trunk uses the default capacity scheduler. We need to > let TestRM run with all types of schedulers, to make sure any new change > wouldn't break any scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user
[ https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155898#comment-14155898 ] Vinod Kumar Vavilapalli commented on YARN-2446: --- This looks good, +1. Checking this in. > Using TimelineNamespace to shield the entities of a user > > > Key: YARN-2446 > URL: https://issues.apache.org/jira/browse/YARN-2446 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch > > > Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the > entities, preventing them from being accessed or affected by other users' > operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155893#comment-14155893 ] Hadoop QA commented on YARN-2635: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672460/YARN-2635-1.patch against trunk revision 0708827. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5223//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5223//console This message is automatically generated. > TestRMRestart fails with FairScheduler > -- > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, the > TestRMRestart would fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155880#comment-14155880 ] Hadoop QA commented on YARN-2617: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672458/YARN-2617.5.patch against trunk revision 0708827. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5222//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5222//console This message is automatically generated. 
> NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.patch > > > We ([~chenchun]) are testing RM work-preserving restart and found the > following logs when we ran a simple MapReduce task "PI". The NM continuously > reported completed containers whose Application had already finished, after the AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might > not receive this event for a long time, or might never receive it.
> * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS (3 * 60 * 60 sec by default) before > it is scheduled to delete the Application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155868#comment-14155868 ] Tsuyoshi OZAWA commented on YARN-2562: -- My concern at first was that an AppMaster won't work with a new RM because of the change of the containerId format. However, we can change it, since the protocol between AM and RM has changed and an old AppMaster won't work in any case. So it's better to use the format Vinod mentioned at first. Updating a patch soon. > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch > > > ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For e.g, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
[ https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2638: -- Description: TestRM fails when using FairScheduler. The failures are not shown in trunk as the trunk uses the default capacity scheduler. We need to let TestRM run with all types of schedulers, to make sure any new change wouldn't break any scheduler. > Let TestRM run with all types of schedulers (FIFO, Capacity, Fair) > -- > > Key: YARN-2638 > URL: https://issues.apache.org/jira/browse/YARN-2638 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > > TestRM fails when using FairScheduler. The failures are not shown in trunk as the > trunk uses the default capacity scheduler. We need to let TestRM run with all > types of schedulers, to make sure any new change wouldn't break any scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
Wei Yan created YARN-2638: - Summary: Let TestRM run with all types of schedulers (FIFO, Capacity, Fair) Key: YARN-2638 URL: https://issues.apache.org/jira/browse/YARN-2638 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2624: Attachment: YARN-2624.001.patch No apparent failure in jenkins output. Uploading it again > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-2624.001.patch, YARN-2624.001.patch > > > We have found resource localization fails on a cluster with following error > in certain cases. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155842#comment-14155842 ] Vinod Kumar Vavilapalli commented on YARN-1972: --- BTW, the new test TestContainerExecutor from YARN-443 was originally missed in branch-2; I committed it here. > Implement secure Windows Container Executor > --- > > Key: YARN-1972 > URL: https://issues.apache.org/jira/browse/YARN-1972 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Labels: security, windows > Fix For: 2.6.0 > > Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, > YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, > YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch > > > h1. Windows Secure Container Executor (WCE) > YARN-1063 adds the necessary infrastructure to launch a process as a domain > user as a solution for the problem of having a security boundary between > processes executed in YARN containers and the Hadoop services. The WCE is a > container executor that leverages the winutils capabilities introduced in > YARN-1063 and launches containers as an OS process running as the job > submitter user. A description of the S4U infrastructure used by YARN-1063 and > the alternatives considered can be read on that JIRA. > The WCE is based on the DefaultContainerExecutor. It relies on the DCE to > drive the flow of execution, but it overrides some methods to the effect of: > * changes the DCE-created user cache directories to be owned by the job user > and by the nodemanager group. > * changes the actual container run command to use the 'createAsUser' command > of the winutils task instead of 'create' > * runs the localization as a standalone process instead of an in-process Java > method call. This in turn relies on the winutils createAsUser feature to run > the localization as the job user.
> > When compared to LinuxContainerExecutor (LCE), the WCE has some minor > differences: > * it does not delegate the creation of the user cache directories to the > native implementation. > * it does not require special handling to be able to delete user files > The approach on the WCE came from a practical trial-and-error approach. I had > to iron out some issues around the Windows script shell limitations (command > line length) to get it to work, the biggest issue being the huge CLASSPATH > that is commonplace in Hadoop environment container executions. The job > container itself is already dealing with this via a so-called 'classpath > jar', see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch > as a separate container the same issue had to be resolved, and I used the same > 'classpath jar' approach. > h2. Deployment Requirements > To use the WCE one needs to set the > `yarn.nodemanager.container-executor.class` to > `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` > and set the `yarn.nodemanager.windows-secure-container-executor.group` to a > Windows security group name that the nodemanager service principal is a > member of (the equivalent of the LCE > `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE > does not require any configuration outside of Hadoop's own yarn-site.xml. > For the WCE to work, the nodemanager must run as a service principal that is a > member of the local Administrators group or LocalSystem. This is derived from > the need to invoke the LoadUserProfile API, which mentions these requirements in > the specifications. This is in addition to the SE_TCB privilege mentioned in > YARN-1063, but this requirement automatically implies that the SE_TCB > privilege is held by the nodemanager. For the Linux speakers in the audience, > the requirement is basically to run the NM as root. > h2.
Dedicated high privilege Service > Due to the high privilege required by the WCE, we had discussed the need to > isolate the high privilege operations into a separate process, an 'executor' > service that is solely responsible for starting the containers (including the > localizer). The NM would have to authenticate, authorize and communicate with > this service via an IPC mechanism and use this service to launch the > containers. I still believe we'll end up deploying such a service, but the > effort to onboard such a new platform-specific service on the project is > not trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
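As a concrete illustration of the deployment requirements above, the two properties could be set in yarn-site.xml along these lines (the property names are taken from the description; the group value `hadoopnm` is a placeholder for a site-specific Windows group that the nodemanager principal belongs to):

```xml
<!-- Sketch only: the group value below is a placeholder, not a real setting. -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.windows-secure-container-executor.group</name>
  <value>hadoopnm</value>
</property>
```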
[jira] [Commented] (YARN-443) allow OS scheduling priority of NM to be different than the containers it launches
[ https://issues.apache.org/jira/browse/YARN-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155840#comment-14155840 ] Vinod Kumar Vavilapalli commented on YARN-443: -- The new test TestContainerExecutor was missed in branch-2; I committed it as part of YARN-1972. > allow OS scheduling priority of NM to be different than the containers it > launches > -- > > Key: YARN-443 > URL: https://issues.apache.org/jira/browse/YARN-443 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Thomas Graves >Assignee: Thomas Graves > Fix For: 0.23.7, 2.0.4-alpha > > Attachments: YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, > YARN-443-branch-0.23.patch, YARN-443-branch-0.23.patch, > YARN-443-branch-2.patch, YARN-443-branch-2.patch, YARN-443-branch-2.patch, > YARN-443.patch, YARN-443.patch, YARN-443.patch, YARN-443.patch, > YARN-443.patch, YARN-443.patch, YARN-443.patch > > > It would be nice if we could have the nodemanager run at a different OS > scheduling priority than the containers so that you can still communicate > with the nodemanager if the containers are out of control. > On Linux we could launch the nodemanager at a higher priority, but then all > the containers it launches would also be at that higher priority, so we need > a way for the container executor to launch them at a lower priority. > I'm not sure how this applies to Windows, if at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2635) TestRMRestart fails with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2635: -- Attachment: YARN-2635-1.patch Posted a patch that lets TestRMRestart run with all types of schedulers and fixes the FairScheduler-related failures. > TestRMRestart fails with FairScheduler > -- > > Key: YARN-2635 > URL: https://issues.apache.org/jira/browse/YARN-2635 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wei Yan >Assignee: Wei Yan > Attachments: YARN-2635-1.patch > > > If we change the scheduler from Capacity Scheduler to Fair Scheduler, > TestRMRestart fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182
[ https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155828#comment-14155828 ] Jian He commented on YARN-2562: --- bq. A number at the end for me always pointed to the container-id I think this is a good point. And logically, epochId precedes applicationId. [~ozawa], your opinion? > ContainerId@toString() is unreadable for epoch >0 after YARN-2182 > - > > Key: YARN-2562 > URL: https://issues.apache.org/jira/browse/YARN-2562 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-2562.1.patch, YARN-2562.2.patch > > > The ContainerID string format is unreadable for RMs that restarted at least once > (epoch > 0) after YARN-2182. For example, > container_1410901177871_0001_01_05_17. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1391) Lost node list should be identify by NodeId
[ https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155821#comment-14155821 ] Hadoop QA commented on YARN-1391: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672452/YARN-1391.v2.patch against trunk revision 8dfe54f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5220//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5220//console This message is automatically generated. > Lost node list should be identify by NodeId > --- > > Key: YARN-1391 > URL: https://issues.apache.org/jira/browse/YARN-1391 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1391.v1.patch, YARN-1391.v2.patch > > > in case of multiple node managers on a single machine. 
Each of them should be > identified by NodeId, which identifies a node more precisely than the host name alone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
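The point above can be sketched with plain strings (a simplified stand-in; the real change would use YARN's NodeId class rather than hand-built keys): tracking lost nodes by host name alone collapses multiple NodeManagers on one machine, while a host:port key keeps them distinct.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each node is represented as a {host, port} pair.
final class LostNodeTracking {

    // Keying the lost-node list by host name collapses NMs on the same machine.
    static int distinctByHost(List<String[]> nodes) {
        Set<String> keys = new HashSet<>();
        for (String[] node : nodes) {
            keys.add(node[0]);
        }
        return keys.size();
    }

    // Keying by a NodeId-style host:port string keeps each NM distinct.
    static int distinctByNodeId(List<String[]> nodes) {
        Set<String> keys = new HashSet<>();
        for (String[] node : nodes) {
            keys.add(node[0] + ":" + node[1]);
        }
        return keys.size();
    }
}
```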
[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155820#comment-14155820 ] Hadoop QA commented on YARN-2624: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672456/YARN-2624.001.patch against trunk revision 8dfe54f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5221//console This message is automatically generated. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-2624.001.patch > > > We have found resource localization fails on a cluster with following error > in certain cases. 
> {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2617: -- Attachment: YARN-2617.5.patch Uploading the same patch; not sure why Jenkins reports an eclipse failure. > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.patch > > > We ([~chenchun]) are testing RM work-preserving restart and found the > following logs when we ran a simple MapReduce task "PI". The NM continuously > reported completed containers whose Application had already finished, even after the AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications.
But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might > not receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, > before it is scheduled to delete the application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
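A minimal sketch of the behavior this issue argues for (class and method names here are hypothetical, not the actual NM heartbeat code): before reporting completed containers to the RM, drop those whose application is no longer running on the NM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: completed containers map to their owning application id,
// and only containers whose application is still running are reported to the RM.
final class CompletedContainerFilter {

    static List<String> containersToReport(Map<String, String> completedContainerToApp,
                                           Set<String> runningApps) {
        List<String> toReport = new ArrayList<>();
        for (Map.Entry<String, String> entry : completedContainerToApp.entrySet()) {
            // Skip containers whose application has already finished on this NM.
            if (runningApps.contains(entry.getValue())) {
                toReport.add(entry.getKey());
            }
        }
        return toReport;
    }
}
```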
[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades
[ https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155817#comment-14155817 ] Hudson commented on YARN-2613: -- FAILURE: Integrated in Hadoop-trunk-Commit #6172 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6172/]) YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java > NMClient doesn't have retries for supporting rolling-upgrades > - > > Key: YARN-2613 > URL: https://issues.apache.org/jira/browse/YARN-2613 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch > > > While NM is rolling upgrade, client should retry NM until 
it comes up. This > jira is to add a NMProxy (similar to RMProxy) with retry implementation to > support rolling upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data
[ https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155810#comment-14155810 ] Jian He commented on YARN-2591: --- looked at the patch, maybe create a new exception type, instead of catching the exception msg ? > AHSWebServices should return FORBIDDEN(403) if the request user doesn't have > access to the history data > --- > > Key: YARN-2591 > URL: https://issues.apache.org/jira/browse/YARN-2591 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 3.0.0, 2.6.0 >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2591.1.patch > > > AHSWebServices should return FORBIDDEN(403) if the request user doesn't have > access to the history data. Currently, it is going to return > INTERNAL_SERVER_ERROR(500). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155806#comment-14155806 ] Hadoop QA commented on YARN-2628: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672437/apache-yarn-2628.1.patch against trunk revision 52bbe0f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5214//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5214//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5214//console This message is automatically generated. 
> Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, sometimes apps will end up with containers in a > reserved state even though free slots are available. > The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>       node.getAvailableResource(), minimumAllocation)) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>           ", available: " + node.getAvailableResource());
>     }
>     root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() +
>       " is reserved by application " +
>       node.getReservedContainer().getContainerId().getApplicationAttemptId());
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers. > Since it uses the greaterThanOrEqual function, we end up in a situation where > greaterThanOrEqual returns true, even though we may not have enough CPU or > memory to actually run the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
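The mismatch described in this report can be reproduced with a small, self-contained model (the resource class and share arithmetic below are simplified stand-ins for the real DominantResourceCalculator): a dominant-share comparison can report that the available resources are "greater than or equal to" the minimum allocation even when one dimension (here, vcores) cannot fit the container at all.

```java
// Simplified two-dimensional resource, standing in for YARN's Resource.
final class Res {
    final long memoryMb;
    final int vcores;

    Res(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

final class DominantCompare {

    // Dominant-share style comparison: each side is reduced to its larger
    // share of the cluster, mirroring the behavior described above for
    // DominantResourceCalculator.
    static boolean greaterThanOrEqual(Res cluster, Res lhs, Res rhs) {
        double lhsShare = Math.max((double) lhs.memoryMb / cluster.memoryMb,
                                   (double) lhs.vcores / cluster.vcores);
        double rhsShare = Math.max((double) rhs.memoryMb / cluster.memoryMb,
                                   (double) rhs.vcores / cluster.vcores);
        return lhsShare >= rhsShare;
    }

    // Component-wise check: the container fits only if every dimension fits.
    static boolean fitsIn(Res available, Res required) {
        return available.memoryMb >= required.memoryMb
            && available.vcores >= required.vcores;
    }
}
```

With a 100000 MB / 100-vcore cluster, a node with 8192 MB free but 0 free vcores compares greaterThanOrEqual to a 1024 MB / 1-vcore minimum allocation, yet a component-wise check correctly reports that the container does not fit.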
[jira] [Updated] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories
[ https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2624: Attachment: YARN-2624.001.patch Attaching a patch that cleans up the local resource cache directories the first time the state store is built. That takes care of cleaning up leftover directories when moving from non-work-preserving to work-preserving restart in most cases. There can still be failures if the NM crashes between creating the state and running the cleanup. > Resource Localization fails on a cluster due to existing cache directories > -- > > Key: YARN-2624 > URL: https://issues.apache.org/jira/browse/YARN-2624 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Blocker > Attachments: YARN-2624.001.patch > > > We have found resource localization fails on a cluster with the following error > in certain cases. > {noformat} > INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Failed to download rsrc { { > hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml, > 1412027745352, FILE, null > },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING} > java.io.IOException: Rename cannot overwrite non empty destination directory > /data/yarn/nm/filecache/27 > at > org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716) > at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228) > at > org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659) > at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366) > at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
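A rough sketch of the cleanup idea in the comment above (not the actual patch; the recursive delete and directory layout here are illustrative): remove leftover local cache directory trees so that a later rename into a path like filecache/27 cannot hit a non-empty destination.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch: on first creation of the NM state store, leftover local
// cache directories are removed before localization starts.
final class LocalCacheCleanup {

    // Recursively delete a directory tree; returns true when nothing is left.
    static boolean deleteRecursively(Path dir) {
        try {
            if (!Files.exists(dir)) {
                return true;
            }
            try (Stream<Path> walk = Files.walk(dir)) {
                // Sort deepest-first so children are deleted before parents.
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            }
            return !Files.exists(dir);
        } catch (IOException | UncheckedIOException e) {
            return false;
        }
    }

    // Builds a leftover cache layout like filecache/27/map.xml and cleans it up.
    static boolean demo() {
        try {
            Path cacheRoot = Files.createTempDirectory("filecache");
            Files.createDirectories(cacheRoot.resolve("27"));
            Files.write(cacheRoot.resolve("27").resolve("map.xml"), new byte[] {0});
            return deleteRecursively(cacheRoot);
        } catch (IOException e) {
            return false;
        }
    }
}
```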
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155766#comment-14155766 ] Hadoop QA commented on YARN-2617: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672391/YARN-2617.5.patch against trunk revision 52bbe0f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5218//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5218//console This message is automatically generated. > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". 
The NM continuously > reported completed containers whose Application had already finished, even after the AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might > not receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, > before it is scheduled to delete the application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155764#comment-14155764 ] Craig Welch commented on YARN-1198: --- [~john.jian.fang] I took a look at implementing the change with the tweaked .7 approach per your suggestion above, and it seemed to just be trading some complexities for others, so I set it aside; I think the current .7 approach is as good as any. I uploaded a .10 patch, which is the .7 fixed to apply cleanly to current trunk (.7 no longer quite does for me). I took a look at incorporating [YARN-1857] into this change but chose not to, as I think they should be committed independently. The .10 (.7) patch factors the change for [YARN-1857] up into a different method, getHeadroom(); if you replace it with the below:
{code}
private Resource getHeadroom(User user, Resource queueMaxCap,
    Resource clusterResource, Resource userLimit) {
  Resource headroom =
      Resources.min(resourceCalculator, clusterResource,
          Resources.subtract(
              Resources.min(resourceCalculator, clusterResource,
                  userLimit, queueMaxCap),
              user.getConsumedResources()),
          Resources.subtract(queueMaxCap, usedResources));
  return headroom;
}
{code}
then you should have the combined logic. Note that the LeafQueue tests will then not all pass, I believe because results changed when that patch was applied - I've not tried the two in combination before, assuming we would apply one at a time and then address the impact on the other.
> Capacity Scheduler headroom calculation does not work as expected > - > > Key: YARN-1198 > URL: https://issues.apache.org/jira/browse/YARN-1198 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Craig Welch > Attachments: YARN-1198.1.patch, YARN-1198.10.patch, > YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, > YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch > > > Today headroom calculation (for the app) takes place only when > * New node is added/removed from the cluster > * New container is getting assigned to the application. > However there are potentially lot of situations which are not considered for > this calculation > * If a container finishes then headroom for that application will change and > should be notified to the AM accordingly. > * If a single user has submitted multiple applications (app1 and app2) to the > same queue then > ** If app1's container finishes then not only app1's but also app2's AM > should be notified about the change in headroom. > ** Similarly if a container is assigned to any applications app1/app2 then > both AM should be notified about their headroom. > ** To simplify the whole communication process it is ideal to keep headroom > per User per LeafQueue so that everyone gets the same picture (apps belonging > to same user and submitted in same queue). > * If a new user submits an application to the queue then all applications > submitted by all users in that queue should be notified of the headroom > change. > * Also today headroom is an absolute number ( I think it should be normalized > but then this is going to be not backward compatible..) > * Also when admin user refreshes queue headroom has to be updated. > These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.3.4#6332)
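Restated for a single dimension (memory only), the quoted getHeadroom() combines a per-user bound with the queue-capacity bound attributed to [YARN-1857]; the sketch below is a simplified model, not the actual Resources/ResourceCalculator API.

```java
// Single-resource restatement of the quoted getHeadroom() logic.
final class Headroom {

    static long headroom(long userLimit, long queueMaxCap,
                         long userConsumed, long queueUsed) {
        // min(userLimit, queueMaxCap) - userConsumed: the per-user bound.
        long userBound = Math.min(userLimit, queueMaxCap) - userConsumed;
        // queueMaxCap - usedResources: the queue-capacity bound (YARN-1857).
        long queueBound = queueMaxCap - queueUsed;
        return Math.min(userBound, queueBound);
    }
}
```

For example, with userLimit 8192, queueMaxCap 10240, userConsumed 2048, and queueUsed 6144, the user bound is 6144 but the queue-capacity bound caps the headroom at 4096.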
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155767#comment-14155767 ] Craig Welch commented on YARN-1198: --- The Jenkins failures do not actually seem to have anything to do with the patch, the output is complaining about being behind trunk... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1391) Lost node list should be identify by NodeId
[ https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-1391: -- Attachment: YARN-1391.v2.patch > Lost node list should be identify by NodeId > --- > > Key: YARN-1391 > URL: https://issues.apache.org/jira/browse/YARN-1391 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-1391.v1.patch, YARN-1391.v2.patch > > > In the case of multiple NodeManagers on a single machine, each of them should be > identified by its NodeId, which is more specific than just the host name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155757#comment-14155757 ] Hadoop QA commented on YARN-1198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672450/YARN-1198.10.patch against trunk revision 8dfe54f. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5219//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1198: -- Attachment: YARN-1198.10.patch And again, this time, with the additional files, .10 (nee .9, .7) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155747#comment-14155747 ] Varun Vasudev commented on YARN-90: --- The release audit warning is unrelated to the patch. > NodeManager should identify failed disks becoming good back again > - > > Key: YARN-90 > URL: https://issues.apache.org/jira/browse/YARN-90 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Ravi Gummadi >Assignee: Varun Vasudev > Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, > YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, > apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, > apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, > apache-yarn-90.8.patch > > > MAPREDUCE-3121 makes the NodeManager identify disk failures. But once a disk goes > down, it is marked as failed forever. To reuse that disk (after it becomes > good), the NodeManager needs a restart. This JIRA is to improve the NodeManager to > reuse good disks (which could have been bad some time back). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155744#comment-14155744 ] Zhijie Shen commented on YARN-2527: --- Thanks for being patient about the comment. How about doing the following? {code} AccessControlList applicationACLInMap = acls.get(applicationAccessType); if (applicationACLInMap != null) { applicationACL = applicationACLInMap; } else { if (LOG.isDebugEnabled()) { LOG.debug("ACL not found for access-type " + applicationAccessType + " for application " + applicationId + " owned by " + applicationOwner + ". Using default [" + YarnConfiguration.DEFAULT_YARN_APP_ACL + "]"); } applicationACL = DEFAULT_YARN_APP_ACL; } {code} > NPE in ApplicationACLsManager > - > > Key: YARN-2527 > URL: https://issues.apache.org/jira/browse/YARN-2527 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: YARN-2527.patch, YARN-2527.patch > > > An NPE in _ApplicationACLsManager_ can result in a 500 Internal Server Error. > The relevant stacktrace snippet from the ResourceManager logs is as below: > {code} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {code} > This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155745#comment-14155745 ] Hadoop QA commented on YARN-90: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672436/apache-yarn-90.8.patch against trunk revision dd1b8f2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5213//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5213//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5213//console This message is automatically generated. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1714) Per user and per queue view in YARN RM
[ https://issues.apache.org/jira/browse/YARN-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155735#comment-14155735 ] Hadoop QA commented on YARN-1714: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628368/YARN-1714.v3.patch against trunk revision 52bbe0f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5215//console This message is automatically generated. > Per user and per queue view in YARN RM > -- > > Key: YARN-1714 > URL: https://issues.apache.org/jira/browse/YARN-1714 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Attachments: YARN-1714.v1.patch, YARN-1714.v2.patch, > YARN-1714.v3.patch > > > ResourceManager exposes either one or all jobs via the WebUI. It would be good to > have a filter for user so that users see only their jobs. > Provide REST-style URLs to access only a user-specified queue or a user's apps. > For instance, > http://hadoop-example.com:50030/cluster/user/toto > displays apps owned by toto > http://hadoop-example.com:50030/cluster/user/toto,glinda > displays apps owned by toto and glinda > http://hadoop-example.com:50030/cluster/queue/root.queue1 > displays apps in root.queue1 > http://hadoop-example.com:50030/cluster/queue/root.queue1,root.queue2 > displays apps in root.queue1 and root.queue2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155727#comment-14155727 ] Hadoop QA commented on YARN-1198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672445/YARN-1198.9.patch against trunk revision 52bbe0f. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5217//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1198: -- Attachment: YARN-1198.9.patch Updated version of .7 patch to current trunk (as .7 now fails to fully apply) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155715#comment-14155715 ] Hadoop QA commented on YARN-2617: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672391/YARN-2617.5.patch against trunk revision dd1b8f2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5212//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5212//console This message is automatically generated. > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". 
The NM continuously > reported completed containers whose application had already finished, even after the AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on the NM should guarantee to clean > up already completed applications. But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might > not receive this event for a long time, or might never receive it. > * NonAggregatingLogHandler waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, > before it is scheduled to delete the application logs and send the event. > * LogAggregationService might fail (e.g. if the user does not have HDFS > write permission), in which case it will not send the event at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
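The cleanup gap described in this issue can be modeled with a few lines of dependency-free Java. The class and method names below are illustrative, not the actual NodeManager code:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the gap above: an application leaves the NM's running
// set only when APPLICATION_LOG_HANDLING_FINISHED arrives. If that event is
// delayed (NonAggregatingLogHandler's retain interval) or never sent (failed
// log aggregation), the NM keeps re-reporting the app's completed containers
// on every heartbeat. Names here are illustrative, not the real NM classes.
public class AppCleanupSketch {
    private final Set<String> runningApps = new HashSet<>();

    void appStarted(String appId) { runningApps.add(appId); }

    // Until this event handler runs, completed containers of appId
    // are still reported to the RM.
    void onLogHandlingFinished(String appId) { runningApps.remove(appId); }

    boolean shouldReportContainersFor(String appId) {
        return runningApps.contains(appId);
    }
}
```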
[jira] [Updated] (YARN-2254) TestRMWebServicesAppsModification should run against both Capacity and FairSchedulers
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2254: --- Summary: TestRMWebServicesAppsModification should run against both Capacity and FairSchedulers (was: change TestRMWebServicesAppsModification to support FairScheduler.) > TestRMWebServicesAppsModification should run against both Capacity and > FairSchedulers > - > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test, if the scheduler is not > CapacityScheduler. > change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2254: --- Summary: TestRMWebServicesAppsModification should run against both CS and FS (was: TestRMWebServicesAppsModification should run against both Capacity and FairSchedulers) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
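The idea behind YARN-2254 is to run the same test body once per scheduler instead of skipping when the scheduler is not CapacityScheduler. A dependency-free sketch of that pattern (the real patch would typically use JUnit's Parameterized runner; names here are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Sketch: drive one test body against every scheduler implementation,
// rather than silently skipping non-CapacityScheduler runs.
public class SchedulerMatrixSketch {
    static final List<String> SCHEDULERS = Arrays.asList(
        "CapacityScheduler", "FairScheduler");

    static int runAgainstAllSchedulers(Consumer<String> testBody) {
        int runs = 0;
        for (String scheduler : SCHEDULERS) {
            // configure the RM with this scheduler, then run the test body
            testBody.accept(scheduler);
            runs++;
        }
        return runs;
    }
}
```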
[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155708#comment-14155708 ] Hadoop QA commented on YARN-2312: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672397/YARN-2312.2-3.patch against trunk revision 875aa79. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestMiniMRBringup org.apache.hadoop.mapred.TestClusterMapReduceTestCase org.apache.hadoop.mapred.TestMRIntermediateDataEncryption org.apache.hadoop.mapred.pipes.TestPipeApplication The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5207//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5207//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5207//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5207//console This message is automatically generated. > Marking ContainerId#getId as deprecated > --- > > Key: YARN-2312 > URL: https://issues.apache.org/jira/browse/YARN-2312 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, > YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch > > > {{ContainerId#getId}} will only return partial value of containerId, only > sequence number of container id without epoch, after YARN-2229. We should > mark {{ContainerId#getId}} as deprecated and use > {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
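For context on the deprecation above: after YARN-2229 the container id is a long whose upper bits carry the RM epoch, so the old int-returning getId() silently drops the epoch. A minimal sketch of the pattern, treating the exact mask and field names as illustrative rather than the real ContainerId source:

```java
// Sketch of the ContainerId#getId deprecation discussed in YARN-2312.
// The low 40 bits hold the sequence number, the upper bits the epoch;
// getId() truncates to the sequence number only.
public class ContainerIdSketch {
    static final long CONTAINER_ID_BITMASK = 0xffffffffffL; // low 40 bits

    private final long containerId;

    ContainerIdSketch(long containerId) { this.containerId = containerId; }

    /** @deprecated loses the epoch; callers should use {@link #getContainerId()}. */
    @Deprecated
    public int getId() {
        return (int) (containerId & CONTAINER_ID_BITMASK);
    }

    public long getContainerId() {
        return containerId;
    }
}
```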
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155707#comment-14155707 ] Hadoop QA commented on YARN-1879: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672418/YARN-1879.18.patch against trunk revision 875aa79. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5210//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5210//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5210//console This message is automatically generated. 
> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.2-wip.patch, > YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, > YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1391) Lost node list should be identify by NodeId
[ https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155705#comment-14155705 ] Hadoop QA commented on YARN-1391: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618147/YARN-1391.v1.patch against trunk revision 52bbe0f. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5216//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
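The collision YARN-1391 describes is easy to demonstrate: two NodeManagers on one machine are distinct under a NodeId (host:port) key but collide under a bare hostname key. A sketch with illustrative names (not the actual ResourceManager code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the fix direction: key the lost-node map by the full NodeId
// (host:port) so that multiple NodeManagers on one host stay distinct.
public class LostNodesSketch {
    static Map<String, String> lostNodesByNodeId(String[][] nodes) {
        Map<String, String> lost = new HashMap<>();
        for (String[] node : nodes) {
            String host = node[0], port = node[1];
            lost.put(host + ":" + port, "LOST"); // NodeId, not just the host
        }
        return lost;
    }
}
```

Keyed by hostname alone, the two entries below would overwrite each other and the RM would under-count lost nodes.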
[jira] [Commented] (YARN-1714) Per user and per queue view in YARN RM
[ https://issues.apache.org/jira/browse/YARN-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155700#comment-14155700 ] Siqi Li commented on YARN-1714: --- I am all for making the web UI more interactive. I checked RMWebServices#getApps; it returns a different format of applications and statuses, and you can't drill into the apps to see how everything is going. However, some users want the same UI as in the RM, so this patch provides them with a simple URL that displays only their jobs or queues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1715) Per queue view in RM is not implemented correctly
[ https://issues.apache.org/jira/browse/YARN-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li resolved YARN-1715. --- Resolution: Duplicate > Per queue view in RM is not implemented correctly > - > > Key: YARN-1715 > URL: https://issues.apache.org/jira/browse/YARN-1715 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li > > For now, the per-queue view in the YARN RM has not yet been implemented. > In RmController.java it only sets the page title for the per-queue page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155702#comment-14155702 ] Hudson commented on YARN-2630: -- FAILURE: Integrated in Hadoop-trunk-Commit #6170 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6170/]) YARN-2630. Prevented previous AM container status from being acquired by the current restarted AM. Contributed by Jian He. (zjshen: rev 52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java > TestDistributedShell#testDSRestartWithPreviousRunningContainers fails > - > > Key: YARN-2630 > URL: https://issues.apache.org/jira/browse/YARN-2630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, > YARN-2630.4.patch > > > The problem is that after YARN-1372, in work-preserving AM restart, the > re-launched AM will also receive previously failed AM container. But > DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2628: Attachment: apache-yarn-2628.1.patch Uploaded a patch to address [~jianhe]'s comments. > Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, sometimes apps will end up with containers in a > reserved state even though free slots are available. > The root cause seems to be this piece of code from CapacityScheduler.java - > {noformat} > // Try to schedule more if there are no reservations to fulfill > if (node.getReservedContainer() == null) { > if (Resources.greaterThanOrEqual(calculator, getClusterResource(), > node.getAvailableResource(), minimumAllocation)) { > if (LOG.isDebugEnabled()) { > LOG.debug("Trying to schedule on node: " + node.getNodeName() + > ", available: " + node.getAvailableResource()); > } > root.assignContainers(clusterResource, node, false); > } > } else { > LOG.info("Skipping scheduling since node " + node.getNodeID() + > " is reserved by application " + > > node.getReservedContainer().getContainerId().getApplicationAttemptId() > ); > } > {noformat} > The code is meant to check if a node has any slots available for containers. > Since it uses the greaterThanOrEqual function, we end up in a situation where > greaterThanOrEqual returns true, even though we may not have enough CPU or > memory to actually run the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
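The root-cause discussion above can be illustrated with a small sketch — a simplified two-resource model in which `DominantShareSketch` and its methods are illustrative stand-ins, not the real Hadoop `Resources`/`ResourceCalculator` API:

```java
// A minimal two-resource model of the YARN-2628 pitfall. Assumption:
// this is simplified illustrative math, not the actual Hadoop
// Resources/DominantResourceCalculator classes.
public class DominantShareSketch {

    // Dominant share: the larger of the per-dimension fractions
    // relative to the cluster totals.
    static double dominantShare(long mem, long vcores,
                                long clusterMem, long clusterVcores) {
        return Math.max((double) mem / clusterMem,
                        (double) vcores / clusterVcores);
    }

    // Scheduler-style check: compares only the dominant shares.
    static boolean dominantGreaterThanOrEqual(long availMem, long availVcores,
                                              long reqMem, long reqVcores,
                                              long clusterMem, long clusterVcores) {
        return dominantShare(availMem, availVcores, clusterMem, clusterVcores)
            >= dominantShare(reqMem, reqVcores, clusterMem, clusterVcores);
    }

    // Component-wise check that actually guarantees the request fits.
    static boolean fitsIn(long availMem, long availVcores,
                          long reqMem, long reqVcores) {
        return availMem >= reqMem && availVcores >= reqVcores;
    }

    public static void main(String[] args) {
        long clusterMem = 100_000, clusterVcores = 100;
        long availMem = 50_000, availVcores = 0;  // plenty of memory, zero vcores free
        long reqMem = 1_000, reqVcores = 1;       // a modest request for both

        // Dominant-share comparison says "enough" (0.5 >= 0.01)...
        System.out.println(dominantGreaterThanOrEqual(
            availMem, availVcores, reqMem, reqVcores, clusterMem, clusterVcores));
        // ...but the container cannot actually run: no vcores are free.
        System.out.println(fitsIn(availMem, availVcores, reqMem, reqVcores));
    }
}
```

This is the kind of gap the patch targets: an availability check that passes on the dominant dimension while another dimension is exhausted.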
[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager
[ https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155679#comment-14155679 ] Benoy Antony commented on YARN-2527: Thanks for the review [~zjshen]. Do you mean to change it as below? {code} if (acls.get(applicationAccessType) != null) { applicationACL = acls.get(applicationAccessType); } else { if (LOG.isDebugEnabled()) { LOG.debug("ACL not found for access-type " + applicationAccessType + " for application " + applicationId + " owned by " + applicationOwner + ". Using default [" + YarnConfiguration.DEFAULT_YARN_APP_ACL + "]"); } applicationACL = DEFAULT_YARN_APP_ACL; } {code} The only downside to the suggested approach is that it will involve two lookups in the _acls_ _HashMap_, whereas the current approach in the above comment involves only one lookup. > NPE in ApplicationACLsManager > - > > Key: YARN-2527 > URL: https://issues.apache.org/jira/browse/YARN-2527 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: YARN-2527.patch, YARN-2527.patch > > > NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error. > The relevant stacktrace snippet from the ResourceManager logs is as below > {code} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) > at > org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) > at org.apache.hadoop.yarn.webapp.View.render(View.java:235) > {code} > This issue was reported by [~miguenther]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
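The double-lookup concern above can be avoided while keeping the null-check structure by storing the result of a single `get` in a local variable. A minimal sketch — the map contents and the default-ACL constant here are illustrative, not the actual `ApplicationACLsManager` fields:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a single-lookup variant of the snippet discussed above.
// Assumption: DEFAULT_YARN_APP_ACL and the String-valued map are
// simplified stand-ins for the real YARN types.
public class SingleLookupSketch {
    static final String DEFAULT_YARN_APP_ACL = "*";

    // One HashMap lookup: get() returns null when the key is absent,
    // so the result can be tested directly instead of calling get() twice.
    static String resolveAcl(Map<String, String> acls, String accessType) {
        String acl = acls.get(accessType);
        if (acl == null) {
            // ACL not found for this access type; fall back to the default.
            acl = DEFAULT_YARN_APP_ACL;
        }
        return acl;
    }

    public static void main(String[] args) {
        Map<String, String> acls = new HashMap<>();
        acls.put("VIEW_APP", "alice,bob");
        System.out.println(resolveAcl(acls, "VIEW_APP"));   // prints alice,bob
        System.out.println(resolveAcl(acls, "MODIFY_APP")); // prints *
    }
}
```

This satisfies both the null check and the single-lookup preference, since `get` returning null already signals absence.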
[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again
[ https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-90: -- Attachment: apache-yarn-90.8.patch Thanks for the review [~mingma]! {quote} 1. What if a dir is transitioned from DISK_FULL state to OTHER state? DirectoryCollection.checkDirs doesn't seem to update errorDirs and fullDirs properly. We can use some state machine for each dir and make sure each transition is covered. {quote} Fixed. I've re-written the checkDir function but I haven't used a state machine. Can you please review? {quote} 2. DISK_FULL state is counted toward the error disk threshold by LocalDirsHandlerService.areDisksHealthy; later RM could mark NM NODE_UNUSABLE. If we believe DISK_FULL is mostly temporary issue, should we consider disks are healthy if disks only stay in DISK_FULL for some short period of time? {quote} The issue here is that if a disk is full, we can't launch new containers on it. If we can't launch containers, the RM should consider the node is unhealthy. Once the disk is cleaned up, the RM will assign containers to it. {quote} 3. In AppLogAggregatorImpl.java, "(Path[]) localAppLogDirs.toArray(new Path\[localAppLogDirs.size()]).". It seems the (Path[]) cast isn't necessary. {quote} Fixed. {quote} 4. What is the intention of numFailures? Method getNumFailures isn't used. {quote} This is a carry over function - it existed as part of the existing implementation. {quote} 5. Nit: It is better to expand "import java.util.*;" in DirectoryCollection.java and LocalDirsHandlerService.java. {quote} Fixed. 
> NodeManager should identify failed disks becoming good back again > - > > Key: YARN-90 > URL: https://issues.apache.org/jira/browse/YARN-90 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Ravi Gummadi >Assignee: Varun Vasudev > Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, > YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, > apache-yarn-90.2.patch, apache-yarn-90.3.patch, apache-yarn-90.4.patch, > apache-yarn-90.5.patch, apache-yarn-90.6.patch, apache-yarn-90.7.patch, > apache-yarn-90.8.patch > > > MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes > down, it is marked as failed forever. To reuse that disk (after it becomes > good), NodeManager needs restart. This JIRA is to improve NodeManager to > reuse good disks(which could be bad some time back). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
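On review point 3 above (the redundant cast): `List.toArray(T[])` is declared generically as `<T> T[] toArray(T[] a)`, so it already returns a typed array. A minimal illustration, using `String` in place of Hadoop's `Path` to stay self-contained:

```java
import java.util.Arrays;
import java.util.List;

// Sketch for review point 3: the (Path[]) cast is redundant because
// List.toArray(T[]) already returns T[]. String stands in for Path here.
public class ToArraySketch {
    static String[] toDirArray(List<String> localAppLogDirs) {
        // No cast needed: the result is already String[].
        return localAppLogDirs.toArray(new String[localAppLogDirs.size()]);
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("/log/app/a", "/log/app/b");
        String[] arr = toDirArray(dirs);
        System.out.println(arr.length); // prints 2
    }
}
```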
[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155669#comment-14155669 ] Hadoop QA commented on YARN-2254: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672416/YARN-2254.004.patch against trunk revision 875aa79. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5209//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5209//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5209//console This message is automatically generated. > change TestRMWebServicesAppsModification to support FairScheduler. 
> -- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test, if the scheduler is not > CapacityScheduler. > change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155664#comment-14155664 ] Hadoop QA commented on YARN-913: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672406/YARN-913-016.patch against trunk revision 875aa79. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 36 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1266 javac compiler warnings (more than the trunk's current 1265 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5208//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-registry.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5208//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5208//console This message is automatically generated. > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2637: - Summary: maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation (was: maximum-am-resource-percent will be violated when resource of AM is > minimumAllocation) > maximum-am-resource-percent could be violated when resource of AM is > > minimumAllocation > > > Key: YARN-2637 > URL: https://issues.apache.org/jira/browse/YARN-2637 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Wangda Tan >Priority: Critical > > Currently, number of AM in leaf queue will be calculated in following way: > {code} > max_am_resource = queue_max_capacity * maximum_am_resource_percent > #max_am_number = max_am_resource / minimum_allocation > #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor > {code} > And when submit new application to RM, it will check if an app can be > activated in following way: > {code} > for (Iterator i=pendingApplications.iterator(); > i.hasNext(); ) { > FiCaSchedulerApp application = i.next(); > > // Check queue limit > if (getNumActiveApplications() >= getMaximumActiveApplications()) { > break; > } > > // Check user limit > User user = getUser(application.getUser()); > if (user.getActiveApplications() < > getMaximumActiveApplicationsPerUser()) { > user.activateApplication(); > activeApplications.add(application); > i.remove(); > LOG.info("Application " + application.getApplicationId() + > " from user: " + application.getUser() + > " activated in queue: " + getQueueName()); > } > } > {code} > An example is, > If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum > resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be > launched is 200, and if user uses 5M for each AM (> minimum_allocation). 
All > apps can still be activated, and they will occupy all the resources of the queue > instead of only max_am_resource_percent of the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2637) maximum-am-resource-percent will be violated when resource of AM is > minimumAllocation
Wangda Tan created YARN-2637: Summary: maximum-am-resource-percent will be violated when resource of AM is > minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Priority: Critical Currently, number of AM in leaf queue will be calculated in following way: {code} max_am_resource = queue_max_capacity * maximum_am_resource_percent #max_am_number = max_am_resource / minimum_allocation #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor {code} And when submit new application to RM, it will check if an app can be activated in following way: {code} for (Iterator i=pendingApplications.iterator(); i.hasNext(); ) { FiCaSchedulerApp application = i.next(); // Check queue limit if (getNumActiveApplications() >= getMaximumActiveApplications()) { break; } // Check user limit User user = getUser(application.getUser()); if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) { user.activateApplication(); activeApplications.add(application); i.remove(); LOG.info("Application " + application.getApplicationId() + " from user: " + application.getUser() + " activated in queue: " + getQueueName()); } } {code} An example is, If a queue has capacity = 1G, max_am_resource_percent = 0.2, the maximum resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be launched is 200, and if user uses 5M for each AM (> minimum_allocation). All apps can still be activated, and it will occupy all resource of a queue instead of only a max_am_resource_percent of a queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
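The arithmetic behind the example above can be sketched directly, under the description's stated assumptions (the "1G" queue taken as 1000M, a 20% AM limit, minimum_allocation = 1M, each AM really using 5M):

```java
// Sketch of the YARN-2637 arithmetic. Assumption: the numbers mirror
// the example in the issue description; this is not the real
// LeafQueue/CapacityScheduler code.
public class AmLimitSketch {

    // The activation check divides the AM budget by minimum_allocation,
    // not by the AM container's real size.
    static long maxAmNumber(long queueCapacityMb, double maxAmResourcePercent,
                            long minimumAllocationMb) {
        long maxAmResourceMb = (long) (queueCapacityMb * maxAmResourcePercent);
        return maxAmResourceMb / minimumAllocationMb;
    }

    public static void main(String[] args) {
        long allowedAms = maxAmNumber(1000, 0.2, 1); // 200 AMs allowed
        long actualUsageMb = allowedAms * 5;         // each AM really uses 5M
        // 200 AMs * 5M = 1000M: the AMs can occupy the entire 1000M queue,
        // not the intended 20% (200M).
        System.out.println(allowedAms + " AMs -> " + actualUsageMb + "M used");
    }
}
```

The violation falls out of using minimum_allocation as the divisor: any AM larger than the minimum allocation inflates actual usage past the configured percentage.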
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155649#comment-14155649 ] Hadoop QA commented on YARN-1414: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12632578/YARN-1221-v2.patch against trunk revision dd1b8f2. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5211//console This message is automatically generated. > with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs > - > > Key: YARN-1414 > URL: https://issues.apache.org/jira/browse/YARN-1414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.2.0 > > Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155643#comment-14155643 ] Zhijie Shen commented on YARN-2583: --- Per discussion offline: 1. In AggregatedLogDeletionService of JHS, we delete the log files of completed apps, and in AppLogAggregatorImpl of NM, we delete the log files of the running LRS. We need to add a test case to verify AggregatedLogDeletionService won't delete the running LRS logs. 2. We apply the same retention policy on both sides, using the time to determine what log files need to be deleted. 3. For scalability consideration, let's keep the criteria of the number of logs per app, in case the rolling interval is small and too many configuration files are generated. But let's keep the config private to AppLogAggregatorImpl. > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch > > > Currently, AggregatedLogDeletionService will delete old logs from HDFS. It > will check the cut-off-time; if all logs for this application are older than > this cut-off-time, the app-log-dir will be deleted from HDFS. This will not > work for LRS. We expect an LRS application to keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, the new log file will > always be uploaded to HDFS. The number of log files for this application will > become larger and larger, and no log files will be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time. 
It will cause problems > because at that time the app-log-dir for this application in HDFS has already been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
[ https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155638#comment-14155638 ] Joep Rottinghuis commented on YARN-1414: @sandyr could we get some love on this JIRA? We're essentially running with a forked FairScheduler and would like to reduce tech debt each time we uprev to a newer version. > with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs > - > > Key: YARN-1414 > URL: https://issues.apache.org/jira/browse/YARN-1414 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Assignee: Siqi Li > Fix For: 2.2.0 > > Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155624#comment-14155624 ] Steve Loughran commented on YARN-2616: -- Features of the 003 patch: # registry instance created via factory # uses the configuration instance built up on the command line (though it is also creating a {{YarnConfiguration()}} around that) # pulls out all exception-to-error-text mapping into a single method # covers the current set of errors # and also logs @ debug if enabled. > Add CLI client to the registry to list/view entries > --- > > Key: YARN-2616 > URL: https://issues.apache.org/jira/browse/YARN-2616 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Steve Loughran >Assignee: Akshay Radia > Attachments: YARN-2616-003.patch, yarn-2616-v1.patch, > yarn-2616-v2.patch > > > registry needs a CLI interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155584#comment-14155584 ] Jian He commented on YARN-2628: --- looks good, one minor comment in the test case: - the following assertion depends on timing, as the allocation happens asynchronously, it might fail. could you use a loop to check if the container is allocated, otherwise timeout. {code} Thread.sleep(1000); allocResponse = am1.schedule(); Assert.assertEquals(1, allocResponse.getAllocatedContainers().size()); {code} > Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2628.0.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, sometimes apps will end up with containers in a > reserved state even though free slots are available. > The root cause seems to be this piece of code from CapacityScheduler.java - > {noformat} > // Try to schedule more if there are no reservations to fulfill > if (node.getReservedContainer() == null) { > if (Resources.greaterThanOrEqual(calculator, getClusterResource(), > node.getAvailableResource(), minimumAllocation)) { > if (LOG.isDebugEnabled()) { > LOG.debug("Trying to schedule on node: " + node.getNodeName() + > ", available: " + node.getAvailableResource()); > } > root.assignContainers(clusterResource, node, false); > } > } else { > LOG.info("Skipping scheduling since node " + node.getNodeID() + > " is reserved by application " + > > node.getReservedContainer().getContainerId().getApplicationAttemptId() > ); > } > {noformat} > The code is meant to check if a node has any slots available for containers . 
> Since it uses the greaterThanOrEqual function, we end up in a situation where > greaterThanOrEqual returns true, even though we may not have enough CPU or > memory to actually run the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
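The "loop with timeout" pattern suggested in the review comment above can be sketched generically — `waitFor` here is an illustrative helper, not an existing YARN test utility; in the real test the condition would call `am1.schedule()` and check the allocated-container count:

```java
import java.util.function.BooleanSupplier;

// Generic sketch of polling-with-timeout in place of a fixed sleep.
// Assumption: waitFor is a hypothetical helper for illustration.
public class WaitForSketch {

    // Poll a condition until it holds or the timeout elapses, instead of
    // a single fixed Thread.sleep() that races against async allocation.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true roughly 50ms in; the loop returns once it does.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println(ok); // prints true
    }
}
```

The test would then fail only when the timeout genuinely expires, rather than whenever allocation takes longer than an arbitrary one-second sleep.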
[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155571#comment-14155571 ] Karthik Kambatla commented on YARN-2254: +1, pending Jenkins. I'll commit this later today. > change TestRMWebServicesAppsModification to support FairScheduler. > -- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test, if the scheduler is not > CapacityScheduler. > change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
[ https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155565#comment-14155565 ] Hadoop QA commented on YARN-2630: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672374/YARN-2630.4.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5204//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5204//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5204//console This message is automatically generated. 
> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails > - > > Key: YARN-2630 > URL: https://issues.apache.org/jira/browse/YARN-2630 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, > YARN-2630.4.patch > > > The problem is that after YARN-1372, in work-preserving AM restart, the > re-launched AM will also receive previously failed AM container. But > DistributedShell logic is not expecting this extra completed container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1879: - Attachment: YARN-1879.18.patch Rebased on trunk. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.2-wip.patch, > YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, > YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1418#comment-1418 ] zhihai xu commented on YARN-2254: - Hi [~kasha], Good suggestion, I upload a new patch YARN-2254.004.patch to address the comments. thanks > change TestRMWebServicesAppsModification to support FairScheduler. > -- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test, if the scheduler is not > CapacityScheduler. > change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2254: Attachment: YARN-2254.004.patch > change TestRMWebServicesAppsModification to support FairScheduler. > -- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch > > > TestRMWebServicesAppsModification skips the test, if the scheduler is not > CapacityScheduler. > change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2616: - Attachment: YARN-2616-003.patch > Add CLI client to the registry to list/view entries > --- > > Key: YARN-2616 > URL: https://issues.apache.org/jira/browse/YARN-2616 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Steve Loughran >Assignee: Akshay Radia > Attachments: YARN-2616-003.patch, yarn-2616-v1.patch, > yarn-2616-v2.patch > > > registry needs a CLI interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2616) Add CLI client to the registry to list/view entries
[ https://issues.apache.org/jira/browse/YARN-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155530#comment-14155530 ] Steve Loughran commented on YARN-2616: -- the patch I just posted doesn't {{stop()}} the registry service, so will leak a curator instance/threads. > Add CLI client to the registry to list/view entries > --- > > Key: YARN-2616 > URL: https://issues.apache.org/jira/browse/YARN-2616 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Steve Loughran >Assignee: Akshay Radia > Attachments: yarn-2616-v1.patch, yarn-2616-v2.patch > > > registry needs a CLI interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
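The leak mentioned above (a CLI exiting without calling `stop()` on the registry service, leaving Curator threads behind) is the classic case for try-with-resources. A minimal sketch, assuming a Hadoop-style service with a `stop()` that releases background resources; the `Service` interface and `run` helper here are hypothetical, not the YARN-2616 API.

```java
public class StopOnExit {
    // Hypothetical stand-in for a service whose stop() releases background
    // resources (threads, ZK/Curator clients).
    interface Service extends AutoCloseable {
        void start();
        void stop();
        @Override default void close() { stop(); }
    }

    // Run a CLI action against the service; try-with-resources guarantees
    // stop() even when the action throws, so no Curator instance or thread
    // outlives the command.
    static int run(Service registry, Runnable action) {
        try (Service s = registry) {
            s.start();
            action.run();
            return 0;
        }
    }
}
```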
[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-913: Attachment: YARN-913-016.patch patch -016: includes registry cli patch (-002) of YARN-2616 > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155481#comment-14155481 ] Hadoop QA commented on YARN-1879: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672394/YARN-1879.17.patch against trunk revision 875aa79. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5206//console This message is automatically generated. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, > YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, > YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155477#comment-14155477 ] Hadoop QA commented on YARN-2617: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672391/YARN-2617.5.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager org.apache.hadoop.yarn.server.nodemanager.TestEventFlow org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5205//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5205//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5205//console This message is automatically generated. 
> NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". The NM continuously > reported completed containers whose Application had already finished, even after the AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might not > receive this event for a long time, or might never receive it. 
> * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, > before it is scheduled to delete the Application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
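The delayed-deletion behavior described in the first bullet can be sketched with a `ScheduledExecutorService`: the "finished" event only fires after the full retention window elapses, which is why the NM may wait hours for it, or never see it at all. This is an illustrative sketch, not NonAggregatingLogHandler's actual implementation; the class and method names here are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class RetainThenNotify {
    // Mirrors the default of YarnConfiguration.NM_LOG_RETAIN_SECONDS:
    // 3 * 60 * 60 seconds, i.e. three hours.
    static final long DEFAULT_RETAIN_SECONDS = 3L * 60 * 60;

    private final ScheduledExecutorService sched =
        Executors.newSingleThreadScheduledExecutor();

    // Delete an app's local logs only after the retention window, then run
    // the callback (stand-in for dispatching
    // APPLICATION_LOG_HANDLING_FINISHED). If the task is never scheduled or
    // the process dies first, the callback never fires -- the event can
    // arrive hours late or not at all.
    void scheduleLogDeletion(String appId, long retainSeconds, Runnable onFinished) {
        sched.schedule(() -> {
            // delete the local log dirs for appId here
            onFinished.run();
        }, retainSeconds, TimeUnit.SECONDS);
    }
}
```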
[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155473#comment-14155473 ] Tsuyoshi OZAWA commented on YARN-2312: -- I cannot reproduce the findbugs warning. Let me check the reason on Jenkins. > Marking ContainerId#getId as deprecated > --- > > Key: YARN-2312 > URL: https://issues.apache.org/jira/browse/YARN-2312 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, > YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch > > > {{ContainerId#getId}} will only return partial value of containerId, only > sequence number of container id without epoch, after YARN-2229. We should > mark {{ContainerId#getId}} as deprecated and use > {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
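Why `getId` is lossy after YARN-2229 can be shown with the bit layout: assuming the post-YARN-2229 encoding where the lower 40 bits of the 64-bit container id hold the sequence number and the upper bits hold the RM restart epoch, truncating to an int-sized id silently drops the epoch. A minimal sketch under that assumption (not the real ContainerId API):

```java
public class ContainerIdBits {
    // Assumed layout after YARN-2229: lower 40 bits = container sequence
    // number, upper bits = RM restart epoch.
    static final long SEQ_MASK = 0xffffffffffL; // 40 bits

    static long epoch(long containerId) {
        return containerId >>> 40;
    }

    static long sequence(long containerId) {
        return containerId & SEQ_MASK;
    }

    public static void main(String[] args) {
        long id = (2L << 40) | 7L; // epoch 2, sequence 7
        // An accessor returning only the sequence (like the deprecated
        // getId()) cannot distinguish this id from the epoch-0 container 7,
        // which is why getContainerId() should be used instead.
        System.out.println(epoch(id) + " / " + sequence(id));
    }
}
```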
[jira] [Commented] (YARN-2254) change TestRMWebServicesAppsModification to support FairScheduler.
[ https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155467#comment-14155467 ] Karthik Kambatla commented on YARN-2254: Patch looks mostly good. One nit: Can we rename ALLOC_FILE to FS_ALLOC_FILE and "test-queues.xml" to "test-fs-queues.xml" to clarify the files are used only for FairScheduler? > change TestRMWebServicesAppsModification to support FairScheduler. > -- > > Key: YARN-2254 > URL: https://issues.apache.org/jira/browse/YARN-2254 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Minor > Labels: test > Attachments: YARN-2254.000.patch, YARN-2254.001.patch, > YARN-2254.002.patch, YARN-2254.003.patch > > > TestRMWebServicesAppsModification skips the test, if the scheduler is not > CapacityScheduler. > change TestRMWebServicesAppsModification to support both CapacityScheduler > and FairScheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2312: - Attachment: YARN-2312.2-3.patch > Marking ContainerId#getId as deprecated > --- > > Key: YARN-2312 > URL: https://issues.apache.org/jira/browse/YARN-2312 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, > YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch > > > {{ContainerId#getId}} will only return partial value of containerId, only > sequence number of container id without epoch, after YARN-2229. We should > mark {{ContainerId#getId}} as deprecated and use > {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1879: - Attachment: YARN-1879.17.patch > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, > YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, > YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155407#comment-14155407 ] Tsuyoshi OZAWA commented on YARN-1879: -- {quote} >APIs that added trigger flag. APIs that added Idempotent/AtOnce annotation? {quote} I think ">APIs that are added trigger flag." is correct, so updating it. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, > YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, > YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155400#comment-14155400 ] Tsuyoshi OZAWA commented on YARN-1879: -- About the release audit warning, it's also not related. {quote} !? /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-hdfs-project/hadoop-hdfs/.gitattributes Lines that start with ? in the release audit report indicate files that do not have an Apache license header {quote} > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, > YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, > YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155398#comment-14155398 ] Varun Vasudev commented on YARN-2628: - The release audit error is from an HDFS file and is unrelated. > Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2628.0.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, sometimes apps will end up with containers in a > reserved state even though free slots are available. > The root cause seems to be this piece of code from CapacityScheduler.java - > {noformat} > // Try to schedule more if there are no reservations to fulfill > if (node.getReservedContainer() == null) { > if (Resources.greaterThanOrEqual(calculator, getClusterResource(), > node.getAvailableResource(), minimumAllocation)) { > if (LOG.isDebugEnabled()) { > LOG.debug("Trying to schedule on node: " + node.getNodeName() + > ", available: " + node.getAvailableResource()); > } > root.assignContainers(clusterResource, node, false); > } > } else { > LOG.info("Skipping scheduling since node " + node.getNodeID() + > " is reserved by application " + > > node.getReservedContainer().getContainerId().getApplicationAttemptId() > ); > } > {noformat} > The code is meant to check if a node has any slots available for containers. > Since it uses the greaterThanOrEqual function, we end up in a situation where > greaterThanOrEqual returns true, even though we may not have enough CPU or > memory to actually run the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
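The mismatch described in the issue can be demonstrated with a toy dominant-share comparison: the dominant share of the node's free resources can exceed that of the minimum allocation even when one resource (here, vcores) is exhausted. This is a simplified model of the idea behind DominantResourceCalculator, not the real Hadoop API; all class and method names are illustrative.

```java
public class DrcVsComponentwise {
    // Dominant share of a (memoryMB, vcores) pair against the cluster total:
    // the larger of the two per-resource ratios.
    static double dominantShare(long mem, long vcores, long cMem, long cVcores) {
        return Math.max((double) mem / cMem, (double) vcores / cVcores);
    }

    // DRC-style comparison: "a >= b" iff a's dominant share >= b's.
    static boolean drcGreaterThanOrEqual(long aMem, long aVc, long bMem, long bVc,
                                         long cMem, long cVc) {
        return dominantShare(aMem, aVc, cMem, cVc) >= dominantShare(bMem, bVc, cMem, cVc);
    }

    // What the scheduling check actually needs: enough of EVERY resource.
    static boolean fitsComponentwise(long aMem, long aVc, long bMem, long bVc) {
        return aMem >= bMem && aVc >= bVc;
    }

    public static void main(String[] args) {
        long cMem = 102_400, cVc = 100;   // cluster: 100 GB, 100 cores
        long freeMem = 8_192, freeVc = 0; // node: 8 GB free, zero cores free
        long minMem = 1_024, minVc = 1;   // minimum allocation
        // Dominant share says the node "has enough" (0.08 >= 0.01), yet a
        // 1 GB / 1-core container cannot actually run there.
        System.out.println(drcGreaterThanOrEqual(freeMem, freeVc, minMem, minVc, cMem, cVc));
        System.out.println(fitsComponentwise(freeMem, freeVc, minMem, minVc));
    }
}
```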
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155399#comment-14155399 ] Tsuyoshi OZAWA commented on YARN-1879: -- Sorry for the delay and thanks for updating the patch, [~adhoot]. About the test failure, it looks not related to the patch. Let me attach the patch which includes your comment changes. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, > YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, > YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2617) NM does not need to send finished container whose APP is not running to RM
[ https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2617: -- Attachment: YARN-2617.5.patch just added one more log statement myself, pending jenkins > NM does not need to send finished container whose APP is not running to RM > -- > > Key: YARN-2617 > URL: https://issues.apache.org/jira/browse/YARN-2617 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Fix For: 2.6.0 > > Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, > YARN-2617.5.patch, YARN-2617.patch > > > We([~chenchun]) are testing RM work preserving restart and found the > following logs when we ran a simple MapReduce task "PI". NM continuously > reported completed containers whose Application had already finished while AM > had finished. > {code} > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:42,228 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:43,230 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > 2014-09-26 17:00:44,233 INFO > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: > Null container completed... > {code} > In the patch for YARN-1372, ApplicationImpl on NM should guarantee to clean > up already completed applications. 
But it will only remove the appId from > 'app.context.getApplications()' when ApplicationImpl receives the event > 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might not > receive this event for a long time, or might never receive it. > * For NonAggregatingLogHandler, it waits for > YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, > before it is scheduled to delete the Application logs and send the event. > * For LogAggregationService, it might fail (e.g. if the user does not have HDFS > write permission), and then it will not send the event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free
[ https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155393#comment-14155393 ] Hadoop QA commented on YARN-2628: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672381/apache-yarn-2628.0.patch against trunk revision 1f5b42a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5202//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5202//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5202//console This message is automatically generated. 
> Capacity scheduler with DominantResourceCalculator carries out reservation > even though slots are free > - > > Key: YARN-2628 > URL: https://issues.apache.org/jira/browse/YARN-2628 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.5.1 >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: apache-yarn-2628.0.patch > > > We've noticed that if you run the CapacityScheduler with the > DominantResourceCalculator, sometimes apps will end up with containers in a > reserved state even though free slots are available. > The root cause seems to be this piece of code from CapacityScheduler.java - > {noformat} > // Try to schedule more if there are no reservations to fulfill > if (node.getReservedContainer() == null) { > if (Resources.greaterThanOrEqual(calculator, getClusterResource(), > node.getAvailableResource(), minimumAllocation)) { > if (LOG.isDebugEnabled()) { > LOG.debug("Trying to schedule on node: " + node.getNodeName() + > ", available: " + node.getAvailableResource()); > } > root.assignContainers(clusterResource, node, false); > } > } else { > LOG.info("Skipping scheduling since node " + node.getNodeID() + > " is reserved by application " + > > node.getReservedContainer().getContainerId().getApplicationAttemptId() > ); > } > {noformat} > The code is meant to check if a node has any slots available for containers. > Since it uses the greaterThanOrEqual function, we end up in a situation where > greaterThanOrEqual returns true, even though we may not have enough CPU or > memory to actually run the container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)