[jira] [Commented] (YARN-178) Fix custom ProcessTree instance creation
[ https://issues.apache.org/jira/browse/YARN-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482247#comment-13482247 ] Bikas Saha commented on YARN-178: - Then why add it to the constructor of the abstract base class? Fix custom ProcessTree instance creation Key: YARN-178 URL: https://issues.apache.org/jira/browse/YARN-178 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.5 Reporter: Radim Kolar Assignee: Radim Kolar Priority: Critical Attachments: pstree-instance2.txt, pstree-instance.txt 1. The current pluggable ResourceCalculatorProcessTree is not passed the root process id, making custom implementations unusable. 2. The process tree does not extend Configured as it should. Added a constructor with a pid argument, along with a test suite. Also added a test that the process tree is correctly configured. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
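The point under discussion in YARN-178 is that a pluggable ResourceCalculatorProcessTree can only monitor a container if it knows the container's root process id, so the pid ends up in the constructor of the abstract base class and custom implementations are expected to provide a matching constructor that the NodeManager can invoke reflectively. Below is a minimal sketch of that shape; the class and method names are simplified placeholders, not the exact Hadoop API.
{code}
// Illustrative sketch only: names are simplified, not the real Hadoop classes.
import java.lang.reflect.Constructor;

abstract class ProcessTree {
  protected final String pid;             // root process id of the container

  protected ProcessTree(String pid) {     // subclasses must expose this shape
    this.pid = pid;
  }

  abstract void updateProcessTree();      // refresh the tree rooted at pid
}

final class ProcessTreeFactory {
  /** Instantiate a plugin class through its (String pid) constructor. */
  static ProcessTree create(Class<? extends ProcessTree> clazz, String pid)
      throws ReflectiveOperationException {
    Constructor<? extends ProcessTree> c = clazz.getConstructor(String.class);
    return c.newInstance(pid);
  }
}
{code}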
[jira] [Commented] (YARN-174) TestNodeStatusUpdater is failing in trunk
[ https://issues.apache.org/jira/browse/YARN-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482251#comment-13482251 ] Hudson commented on YARN-174: - Integrated in Hadoop-Yarn-trunk #12 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/12/]) YARN-174. Modify NodeManager to pass the user's configuration even when rebooting. Contributed by Vinod Kumar Vavilapalli. (Revision 1401086) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1401086 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java TestNodeStatusUpdater is failing in trunk - Key: YARN-174 URL: https://issues.apache.org/jira/browse/YARN-174 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Robert Joseph Evans Assignee: Vinod Kumar Vavilapalli Fix For: 2.0.2-alpha, 0.23.5 Attachments: YARN-174-20121022.txt, YARN-174.patch {noformat} 2012-10-19 12:18:23,941 FATAL [Node Status Updater] nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(277)) - Error starting NodeManager org.apache.hadoop.yarn.YarnException: ${yarn.log.dir}/userlogs is not a valid path. Path should be with file scheme or without scheme at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.validatePaths(LocalDirsHandlerService.java:321) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.init(LocalDirsHandlerService.java:95) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.init(LocalDirsHandlerService.java:123) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) at org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.init(NodeHealthCheckerService.java:48) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:165) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:274) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stateChanged(NodeManager.java:256) at org.apache.hadoop.yarn.service.AbstractService.changeState(AbstractService.java:163) at org.apache.hadoop.yarn.service.AbstractService.stop(AbstractService.java:112) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.stop(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.reboot(NodeStatusUpdaterImpl.java:157) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$900(NodeStatusUpdaterImpl.java:63) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:357) {noformat} The NM then calls System.exit(-1), which makes the unit test exit and produces an error that is hard to track down. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
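The failure above appears to come from the NodeManager rebuilding its configuration from scratch when it reboots, so the caller-supplied settings (such as a resolvable log directory in the test) are lost and ${yarn.log.dir}/userlogs fails validation. The committed change passes the user's original configuration back into the re-init path. A rough sketch of that idea, not the actual NodeManager code:
{code}
// Sketch: remember the caller-supplied Configuration and reuse it on reboot.
import org.apache.hadoop.conf.Configuration;

class RebootableNodeManager {
  private Configuration conf;

  void initAndStartNodeManager(Configuration conf) {
    this.conf = conf;                      // keep the user's configuration
    // ... init and start services with this conf ...
  }

  void reboot() {
    // ... stop services ...
    initAndStartNodeManager(this.conf);    // reuse it instead of a fresh default
  }
}
{code}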
[jira] [Commented] (YARN-174) TestNodeStatusUpdater is failing in trunk
[ https://issues.apache.org/jira/browse/YARN-174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482336#comment-13482336 ] Hudson commented on YARN-174: - Integrated in Hadoop-Mapreduce-trunk #1234 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1234/]) YARN-174. Modify NodeManager to pass the user's configuration even when rebooting. Contributed by Vinod Kumar Vavilapalli. (Revision 1401086) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1401086 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java TestNodeStatusUpdater is failing in trunk - Key: YARN-174 URL: https://issues.apache.org/jira/browse/YARN-174 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0 Reporter: Robert Joseph Evans Assignee: Vinod Kumar Vavilapalli Fix For: 2.0.2-alpha, 0.23.5 Attachments: YARN-174-20121022.txt, YARN-174.patch {noformat} 2012-10-19 12:18:23,941 FATAL [Node Status Updater] nodemanager.NodeManager (NodeManager.java:initAndStartNodeManager(277)) - Error starting NodeManager org.apache.hadoop.yarn.YarnException: ${yarn.log.dir}/userlogs is not a valid path. Path should be with file scheme or without scheme at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.validatePaths(LocalDirsHandlerService.java:321) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.init(LocalDirsHandlerService.java:95) at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.init(LocalDirsHandlerService.java:123) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) at org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.init(NodeHealthCheckerService.java:48) at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:165) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:274) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stateChanged(NodeManager.java:256) at org.apache.hadoop.yarn.service.AbstractService.changeState(AbstractService.java:163) at org.apache.hadoop.yarn.service.AbstractService.stop(AbstractService.java:112) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.stop(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.reboot(NodeStatusUpdaterImpl.java:157) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$900(NodeStatusUpdaterImpl.java:63) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:357) {noformat} The NM then calls System.exit(-1), which makes the unit test exit and produces an error that is hard to track down. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-177) CapacityScheduler - adding a queue while the RM is running has wacky results
[ https://issues.apache.org/jira/browse/YARN-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482353#comment-13482353 ] Hadoop QA commented on YARN-177: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550459/YARN-177.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/117//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/117//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/117//console This message is automatically generated. CapacityScheduler - adding a queue while the RM is running has wacky results Key: YARN-177 URL: https://issues.apache.org/jira/browse/YARN-177 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-177.patch, YARN-177.patch Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount. Looking at the RM logs, used memory can go negative but other logs show the number positive: 2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800 2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-177) CapacityScheduler - adding a queue while the RM is running has wacky results
[ https://issues.apache.org/jira/browse/YARN-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482397#comment-13482397 ] Thomas Graves commented on YARN-177: The LeafQueue has a setParentQueue that is unused and can be removed now. CapacityScheduler - adding a queue while the RM is running has wacky results Key: YARN-177 URL: https://issues.apache.org/jira/browse/YARN-177 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-177.patch, YARN-177.patch Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount. Looking at the RM logs, used memory can go negative but other logs show the number positive: 2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800 2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
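The negative usedCapacity in the description is the kind of accounting drift that can occur if a configuration refresh replaces queue objects while containers are still charged against the old ones; the patch under review appears to keep existing queue objects and reinitialize them in place, which is also why LeafQueue's setParentQueue becomes unused. A very rough sketch of that reinitialize-in-place idea, with simplified names that do not match the CapacityScheduler classes:
{code}
// Illustrative only: reuse existing queues on refresh so usage is preserved.
import java.util.HashMap;
import java.util.Map;

class QueueRegistry {
  static class Queue {
    int usedMemoryMB;                       // accumulated usage to preserve
    float capacity;
    void reinitialize(float newCapacity) {  // update limits, keep usage
      this.capacity = newCapacity;
    }
  }

  private final Map<String, Queue> queues = new HashMap<>();

  /** Apply a refreshed configuration: reuse existing queues, add new ones. */
  void refresh(Map<String, Float> newCapacities) {
    for (Map.Entry<String, Float> e : newCapacities.entrySet()) {
      Queue q = queues.get(e.getKey());
      if (q == null) {
        q = new Queue();                    // genuinely new queue
        queues.put(e.getKey(), q);
      }
      q.reinitialize(e.getValue());         // existing queues keep usedMemoryMB
    }
  }
}
{code}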
[jira] [Commented] (YARN-178) Fix custom ProcessTree instance creation
[ https://issues.apache.org/jira/browse/YARN-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482491#comment-13482491 ] Robert Joseph Evans commented on YARN-178: -- Makes sense. Bikas, unless you have a strong objection I will check this in this afternoon. Fix custom ProcessTree instance creation Key: YARN-178 URL: https://issues.apache.org/jira/browse/YARN-178 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.5 Reporter: Radim Kolar Assignee: Radim Kolar Priority: Critical Attachments: pstree-instance2.txt, pstree-instance.txt 1. The current pluggable ResourceCalculatorProcessTree is not passed the root process id, making custom implementations unusable. 2. The process tree does not extend Configured as it should. Added a constructor with a pid argument, along with a test suite. Also added a test that the process tree is correctly configured. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-177) CapacityScheduler - adding a queue while the RM is running has wacky results
[ https://issues.apache.org/jira/browse/YARN-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482518#comment-13482518 ] Hadoop QA commented on YARN-177: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550499/YARN-177.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/118//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/118//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/118//console This message is automatically generated. CapacityScheduler - adding a queue while the RM is running has wacky results Key: YARN-177 URL: https://issues.apache.org/jira/browse/YARN-177 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-177.patch, YARN-177.patch, YARN-177.patch Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount. Looking at the RM logs, used memory can go negative but other logs show the number positive: 2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800 2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-177) CapacityScheduler - adding a queue while the RM is running has wacky results
[ https://issues.apache.org/jira/browse/YARN-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482563#comment-13482563 ] Hadoop QA commented on YARN-177: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550506/YARN-177.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/119//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/119//console This message is automatically generated. CapacityScheduler - adding a queue while the RM is running has wacky results Key: YARN-177 URL: https://issues.apache.org/jira/browse/YARN-177 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-177.patch, YARN-177.patch, YARN-177.patch, YARN-177.patch Adding a queue to the capacity scheduler while the RM is running and then running a job in the queue added results in very strange behavior. The cluster Total Memory can either decrease or increase. We had a cluster where total memory decreased to almost 1/6th the capacity. Running on a small test cluster resulted in the capacity going up by simply adding a queue and running wordcount. Looking at the RM logs, used memory can go negative but other logs show the number positive: 2012-10-21 22:56:44,796 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0375 absoluteUsedCapacity=0.0375 used=memory: 7680 cluster=memory: 204800 2012-10-21 22:56:45,831 [ResourceManager Event Processor] INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=-0.0225 absoluteUsedCapacity=-0.0225 used=memory: -4608 cluster=memory: 204800 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-180) Capacity scheduler - containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482589#comment-13482589 ] Robert Joseph Evans commented on YARN-180: -- The patch looks mostly good. I am a bit confused by {code}if (containerToken == null) { containerToken = null; // Try again later. } {code} inside the new createContainerToken method. It is a copy and paste from before, but it is not needed any more. Other than that it looks good. Since Arun is on a plane now, I will upload a new patch. Capacity scheduler - containers that get reserved create container token too early - Key: YARN-180 URL: https://issues.apache.org/jira/browse/YARN-180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-180.patch, YARN-180.patch The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
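The fix being reviewed moves container-token creation out of the reservation path so the roughly ten-minute token lifetime only starts when the container is actually handed out; the quoted null-check-then-assign-null block is dead code left over from the old call site. A hedged sketch of the intended shape, with illustrative types and method names rather than the real scheduler API:
{code}
// Sketch: mint the token at assignment time, never at reservation time.
class ReservedContainerExample {
  static final class Token { final long issuedAt = System.currentTimeMillis(); }

  /** Reserving space on a node: no token yet, so its lifetime cannot expire
      while the container waits for space to free up. */
  void reserve(String nodeId, int memoryMB) {
    // bookkeeping only; deliberately no createContainerToken() call here
  }

  /** Fulfilling the reservation later: the token is minted only now. */
  Token assign(String nodeId, int memoryMB) {
    Token token = createContainerToken(nodeId);
    if (token == null) {
      return null;               // secure-mode failure: retry on a later heartbeat
    }
    return token;
  }

  private Token createContainerToken(String nodeId) {
    // In a secure cluster this would ask the token secret manager; the sketch
    // simply fabricates one.
    return new Token();
  }
}
{code}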
[jira] [Commented] (YARN-180) Capacity scheduler - containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482591#comment-13482591 ] Robert Joseph Evans commented on YARN-180: -- Oh, I noticed that the containerToken is never assigned anyway. I will fix that too. Capacity scheduler - containers that get reserved create container token too early - Key: YARN-180 URL: https://issues.apache.org/jira/browse/YARN-180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-180.patch, YARN-180.patch The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-180) Capacity scheduler - containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated YARN-180: - Attachment: YARN-180.patch Capacity scheduler - containers that get reserved create container token too early - Key: YARN-180 URL: https://issues.apache.org/jira/browse/YARN-180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-180.patch, YARN-180.patch, YARN-180.patch The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-180) Capacity scheduler - containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482611#comment-13482611 ] Hadoop QA commented on YARN-180: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550518/YARN-180.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/121//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/121//console This message is automatically generated. Capacity scheduler - containers that get reserved create container token too early - Key: YARN-180 URL: https://issues.apache.org/jira/browse/YARN-180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-180.patch, YARN-180.patch, YARN-180.patch The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-183) Clean up fair scheduler code
Sandy Ryza created YARN-183: --- Summary: Clean up fair scheduler code Key: YARN-183 URL: https://issues.apache.org/jira/browse/YARN-183 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Priority: Minor The fair scheduler code has a bunch of minor stylistic issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-167) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/YARN-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482716#comment-13482716 ] Robert Joseph Evans commented on YARN-167: -- I am rather nervous about back porting MAPREDUCE-3353. It is a major feature that has a significant footprint and was not all that stable when it first went in. I know that it has since stabilized but I am still nervous about such a large change. It seems like it would be simpler to handle the KILL events in the states that missed it. AM stuck in KILL_WAIT for days -- Key: YARN-167 URL: https://issues.apache.org/jira/browse/YARN-167 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Attachments: TaskAttemptStateGraph.jpg We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
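The simpler route suggested here is to register explicit transitions for kill events in the task-attempt states that currently drop them, instead of backporting the full MAPREDUCE-3353 rework. The sketch below only illustrates that mechanism with a small table-driven state machine; the state and event names are made up, and whether the right target state is KILLED or SUCCEEDED is exactly the semantic question debated in the follow-up comments.
{code}
// Illustrative sketch of registering kill transitions for states that
// previously ignored them; not the real MapReduce TaskAttemptImpl machinery.
import java.util.EnumMap;
import java.util.Map;

class AttemptStateMachine {
  enum State { RUNNING, FAIL_CONTAINER_CLEANUP, SUCCESS_CONTAINER_CLEANUP, KILLED }
  enum Event { TA_KILL, TA_CONTAINER_CLEANED }

  private final Map<State, Map<Event, State>> transitions = new EnumMap<>(State.class);
  private State current = State.RUNNING;

  AttemptStateMachine() {
    addTransition(State.RUNNING, Event.TA_KILL, State.KILLED);
    // States that previously dropped TA_KILL get an explicit transition, so a
    // kill arriving during container cleanup still terminates the attempt.
    addTransition(State.FAIL_CONTAINER_CLEANUP, Event.TA_KILL, State.KILLED);
    addTransition(State.SUCCESS_CONTAINER_CLEANUP, Event.TA_KILL, State.KILLED);
  }

  private void addTransition(State from, Event on, State to) {
    transitions.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
  }

  void handle(Event e) {
    Map<Event, State> m = transitions.get(current);
    if (m != null && m.containsKey(e)) {
      current = m.get(e);
    } // events with no registered transition are ignored, as before
  }
}
{code}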
[jira] [Commented] (YARN-147) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/YARN-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482751#comment-13482751 ] Vinod Kumar Vavilapalli commented on YARN-147: -- Got lost between YARN-3 and YARN-147 :) A very nice patch to have; I am willing to get it in ASAP given the amount of time it's been around. Some comments below. - yarn.nodemanager.linux-container-executor.cgroups.mount has different defaults in code and in yarn-default.xml - If the configs can be done away with (see below), ignore this comment. The descriptions for all the new configs in yarn-default.xml heavily reference code. We should simplify them to not address code and instead make them understandable by users and cross reference other related parameters. - {code}// Based on testing, ApplicationMaster executables don't terminate until // a little after the container appears to have finished. Therefore, we // wait a short bit for the cgroup to become empty before deleting it. {code} Can you explain this? Is this sleep necessary? Depending on its importance, we'll need to fix the following ID check; AMs don't always have an ID equal to one. - container-executor.c: If a mount-point is already mounted, mount gives an EBUSY error, so mount_cgroup() will need to be fixed to support remounts (e.g. on NM restarts). We could unmount the cgroup fs on shutdown but that isn't always guaranteed. - Please update if you have tested it on a secure setup with LCE enabled with and without cgroups. The following are already raised by others in some way, but I don't see them fixed in the latest patch. Unless I am missing something: - Not sure of the benefit of a configurable yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't the NM just always mount to a path that it creates and owns? Similar comment for the hierarchy-prefix. - CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple places - updateCgroup() and createCgroup(). In the latter, if cgroups are enabled and we can't create the file, isn't that a critical error? One overarching improvement worth pursuing immediately, either now or in follow-up tickets: - Make ResourcesHandler top level. I'd like to merge the ContainersMonitor functionality with this so as to monitor/enforce memory limits also. ContainersMonitor is top-level; we should make ResourcesHandler also top-level so that other platforms don't need to create this type hierarchy all over again when they wish to implement some or all of this functionality. Add support for CPU isolation/monitoring of containers -- Key: YARN-147 URL: https://issues.apache.org/jira/browse/YARN-147 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Andrew Ferguson Fix For: 2.0.3-alpha Attachments: YARN-147-v1.patch, YARN-147-v2.patch, YARN-147-v3.patch, YARN-147-v4.patch, YARN-147-v5.patch, YARN-3.patch This is a clone of YARN-3 to be able to submit the patch, as YARN-3 does not show the SUBMIT PATCH button. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
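On the quoted comment about waiting "a short bit for the cgroup to become empty before deleting it": a cgroup directory cannot be removed while any task is still attached to it, so the handler retries the delete briefly after the container exits. A sketch of that retry loop, with an illustrative timeout and none of the real CgroupsLCEResourcesHandler plumbing:
{code}
// Sketch: rmdir on a cgroup fails until no tasks remain, so retry briefly.
import java.io.File;

class CgroupCleaner {
  /** Try to remove a container's cgroup, retrying until it is empty or timed out. */
  boolean deleteCgroup(String cgroupDir, long timeoutMs) throws InterruptedException {
    File dir = new File(cgroupDir);
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      if (dir.delete()) {          // succeeds only once no tasks remain in the group
        return true;
      }
      Thread.sleep(20);            // processes may take a moment to fully exit
    }
    return false;                  // still busy after the timeout; caller can log it
  }
}
{code}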
[jira] [Updated] (YARN-181) capacity-scheduler.xml move breaks Eclipse import
[ https://issues.apache.org/jira/browse/YARN-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-181: - Summary: capacity-scheduler.xml move breaks Eclipse import (was: capacity-scheduler.cfg move breaks Eclipse import) I am sure you meant capacity-scheduler.xml capacity-scheduler.xml move breaks Eclipse import - Key: YARN-181 URL: https://issues.apache.org/jira/browse/YARN-181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Attachments: YARN181_jenkins.txt, YARN181_postSvnMv.txt, YARN181_svn_mv.sh Eclipse doesn't seem to handle testResources which resolve to an absolute path. YARN-140 moved capacity-scheduler.cfg a couple of levels up to the hadoop-yarn project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-184) Remove unnecessary locking in fair scheduler, and address findbugs.
Sandy Ryza created YARN-184: --- Summary: Remove unnecessary locking in fair scheduler, and address findbugs. Key: YARN-184 URL: https://issues.apache.org/jira/browse/YARN-184 Project: Hadoop YARN Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza In YARN-12, locks were added to all fields of QueueManager to address findbugs. In addition, findbugs exclusions were added in response to MAPREDUCE-4439, without a deep look at the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
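One plausible shape for the locking cleanup, assuming the QueueManager fields are only replaced wholesale when the allocation file is reloaded: publish each field as a volatile reference to an immutable snapshot, so readers need no locks at all. This is a guess at the intent, not the actual patch, and the names are illustrative.
{code}
// Illustrative sketch of lock-free reads against reload-only state.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class QueueSettings {
  private volatile Map<String, Integer> minShares = Collections.emptyMap();

  /** Readers see a consistent snapshot without locking. */
  int getMinShare(String queue) {
    return minShares.getOrDefault(queue, 0);
  }

  /** Called only from the allocation-file reload thread. */
  void reload(Map<String, Integer> fresh) {
    minShares = Collections.unmodifiableMap(new HashMap<>(fresh));
  }
}
{code}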
[jira] [Commented] (YARN-181) capacity-scheduler.xml move breaks Eclipse import
[ https://issues.apache.org/jira/browse/YARN-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482775#comment-13482775 ] Hudson commented on YARN-181: - Integrated in Hadoop-trunk-Commit #2919 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2919/]) YARN-181. Fixed eclipse settings broken by capacity-scheduler.xml move via YARN-140. Contributed by Siddharth Seth. (Revision 1401504) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1401504 Files : * /hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml capacity-scheduler.xml move breaks Eclipse import - Key: YARN-181 URL: https://issues.apache.org/jira/browse/YARN-181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Fix For: 2.0.3-alpha Attachments: YARN181_jenkins.txt, YARN181_postSvnMv.txt, YARN181_svn_mv.sh Eclipse doesn't seem to handle testResources which resolve to an absolute path. YARN-140 moved capacity-scheduler.cfg a couple of levels up to the hadoop-yarn project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-167) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/YARN-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482778#comment-13482778 ] Vinod Kumar Vavilapalli commented on YARN-167: -- bq. Afterwards, the Task Attempt transitions from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED. In either of these states TA_KILL is ignored. So the Task stays in KILL_WAIT and consequently the Job too. This is fine. The job waits for all tasks and task attempts to 'finish', not just for the killed ones. In this case, the TA will succeed and inform the job about the same, so that the job doesn't wait for this task anymore. bq. I am rather nervous about back porting MAPREDUCE-3353. It is a major feature that has a significant footprint and was not all that stable when it first went in. I know that it has since stabilized but I am still nervous about such a large change. I understand that it is a big change, but if we want to address this issue, we need that patch. Given that MAPREDUCE-3353 has hardened on trunk, we should consider pulling it into 0.23. bq. It seems like it would be simpler to handle the KILL events in the states that missed it. There isn't anything like a missed state that is causing this issue, if I understand Ravi's issue description correctly. AM stuck in KILL_WAIT for days -- Key: YARN-167 URL: https://issues.apache.org/jira/browse/YARN-167 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Attachments: TaskAttemptStateGraph.jpg We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-182) Container killed by the ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-182: - Attachment: Log.txt Attaching log. Container killed by the ApplicationMaster - Key: YARN-182 URL: https://issues.apache.org/jira/browse/YARN-182 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.1-alpha Reporter: zhengqiu cai Labels: hadoop Attachments: Log.txt I was running wordcount and the resourcemanager web UI showed the status as FINISHED SUCCEEDED, but the log showed Container killed by the ApplicationMaster -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-182) Unnecessary Container killed by the ApplicationMaster message for successful containers
[ https://issues.apache.org/jira/browse/YARN-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-182: - Summary: Unnecessary Container killed by the ApplicationMaster message for successful containers (was: Container killed by the ApplicationMaster) Unnecessary Container killed by the ApplicationMaster message for successful containers - Key: YARN-182 URL: https://issues.apache.org/jira/browse/YARN-182 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.1-alpha Reporter: zhengqiu cai Labels: hadoop Attachments: Log.txt I was running wordcount and the resourcemanager web UI showed the status as FINISHED SUCCEEDED, but the log showed Container killed by the ApplicationMaster -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-181) capacity-scheduler.xml move breaks Eclipse import
[ https://issues.apache.org/jira/browse/YARN-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482811#comment-13482811 ] Hudson commented on YARN-181: - Integrated in Hadoop-Yarn-trunk #13 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/13/]) YARN-181. Fixed eclipse settings broken by capacity-scheduler.xml move via YARN-140. Contributed by Siddharth Seth. (Revision 1401504) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1401504 Files : * /hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml capacity-scheduler.xml move breaks Eclipse import - Key: YARN-181 URL: https://issues.apache.org/jira/browse/YARN-181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.2-alpha Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Critical Fix For: 2.0.3-alpha Attachments: YARN181_jenkins.txt, YARN181_postSvnMv.txt, YARN181_svn_mv.sh Eclipse doesn't seem to handle testResources which resolve to an absolute path. YARN-140 moved capacity-scheduler.cfg a couple of levels up to the hadoop-yarn project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-179) Bunch of test failures on trunk
[ https://issues.apache.org/jira/browse/YARN-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482814#comment-13482814 ] Hudson commented on YARN-179: - Integrated in Hadoop-Yarn-trunk #13 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/13/]) YARN-179. Fix some unit test failures. (Contributed by Vinod Kumar Vavilapalli) (Revision 1401481) Result = FAILURE sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1401481 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/main/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/UnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java Bunch of test failures on trunk --- Key: YARN-179 URL: https://issues.apache.org/jira/browse/YARN-179 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.2-alpha Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 2.0.3-alpha Attachments: YARN-179-20121022.3.txt, YARN-179-20121022.4.txt {{CapacityScheduler.setConf()}} mandates a YarnConfiguration. It doesn't need to; throughout all of YARN, components depend only on Configuration and rely on the callers to provide the correct configuration. This is causing multiple tests to fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
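The description spells out the fix: setConf() should accept any Configuration rather than insisting on a YarnConfiguration, wrapping it only if YARN defaults are needed. A minimal sketch of that pattern; ExampleScheduler is a placeholder class, not the real CapacityScheduler.
{code}
// Sketch: accept a plain Configuration and wrap it instead of rejecting it.
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class ExampleScheduler implements Configurable {
  private Configuration conf;

  @Override
  public void setConf(Configuration conf) {
    // No hard requirement on the concrete type: callers pass whatever they have.
    this.conf = conf instanceof YarnConfiguration
        ? conf : new YarnConfiguration(conf);
  }

  @Override
  public Configuration getConf() {
    return conf;
  }
}
{code}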
[jira] [Updated] (YARN-184) Remove unnecessary locking in fair scheduler, and address findbugs excludes.
[ https://issues.apache.org/jira/browse/YARN-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-184: Summary: Remove unnecessary locking in fair scheduler, and address findbugs excludes. (was: Remove unnecessary locking in fair scheduler, and address findbugs.) Remove unnecessary locking in fair scheduler, and address findbugs excludes. Key: YARN-184 URL: https://issues.apache.org/jira/browse/YARN-184 Project: Hadoop YARN Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza In YARN-12, locks were added to all fields of QueueManager to address findbugs. In addition, findbugs exclusions were added in response to MAPREDUCE-4439, without a deep look at the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-167) AM stuck in KILL_WAIT for days
[ https://issues.apache.org/jira/browse/YARN-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482853#comment-13482853 ] Vinod Kumar Vavilapalli commented on YARN-167: -- bq. There isn't anything like a missed state that is causing this issue if I understand Ravi's issue description correctly. Obviously, this could be wrong. Ravi, if you have one of these stuck AMs lying around, can you take a thread dump please? AM stuck in KILL_WAIT for days -- Key: YARN-167 URL: https://issues.apache.org/jira/browse/YARN-167 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.3 Reporter: Ravi Prakash Assignee: Vinod Kumar Vavilapalli Attachments: TaskAttemptStateGraph.jpg We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows it in the KILL_WAIT state, and a few maps running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-139) Interrupted Exception within AsyncDispatcher leads to user confusion
[ https://issues.apache.org/jira/browse/YARN-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-139: - Attachment: YARN-139-20121023.txt Thanks Jason, was thinking of fixing that separately but here it goes. Interrupted Exception within AsyncDispatcher leads to user confusion Key: YARN-139 URL: https://issues.apache.org/jira/browse/YARN-139 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.2-alpha, 0.23.4 Reporter: Nathan Roberts Assignee: Vinod Kumar Vavilapalli Attachments: YARN-139-20121019.1.txt, YARN-139-20121019.txt, YARN-139-20121023.txt, YARN-139.txt Successful applications tend to get InterruptedExceptions during shutdown. The exception is harmless but it leads to lots of user confusion and therefore could be cleaned up. 2012-09-28 14:50:12,477 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1143) at java.lang.Thread.join(Thread.java:1196) at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:105) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:437) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:402) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:619) 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped. 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapreduce.v2.app.MRAppMaster is stopped. 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
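The confusing log comes from AsyncDispatcher.stop() joining its event-handling thread and being interrupted during shutdown, which is then logged as a WARN with a full stack trace. The sketch below shows the general idea of treating that interrupt as an expected part of stopping, preserving the interrupt status without alarming output; it mirrors the intent of the patch rather than its exact code.
{code}
// Sketch: a quiet shutdown path for a dispatcher-style event thread.
class DispatcherStopExample {
  private volatile boolean stopped = false;   // drain flag read by the handler thread
  private Thread eventHandlingThread;

  void stop() {
    stopped = true;
    if (eventHandlingThread != null) {
      eventHandlingThread.interrupt();
      try {
        eventHandlingThread.join(1000);       // bounded wait for the drain
      } catch (InterruptedException ie) {
        // Expected while tearing down; restore the interrupt status and move on
        // instead of warning the user with a stack trace.
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}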
[jira] [Commented] (YARN-139) Interrupted Exception within AsyncDispatcher leads to user confusion
[ https://issues.apache.org/jira/browse/YARN-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482876#comment-13482876 ] Hadoop QA commented on YARN-139: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550561/YARN-139-20121023.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/123//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/123//console This message is automatically generated. Interrupted Exception within AsyncDispatcher leads to user confusion Key: YARN-139 URL: https://issues.apache.org/jira/browse/YARN-139 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.0.2-alpha, 0.23.4 Reporter: Nathan Roberts Assignee: Vinod Kumar Vavilapalli Attachments: YARN-139-20121019.1.txt, YARN-139-20121019.txt, YARN-139-20121023.txt, YARN-139.txt Successful applications tend to get InterruptedExceptions during shutdown. The exception is harmless but it leads to lots of user confusion and therefore could be cleaned up. 2012-09-28 14:50:12,477 WARN [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Interrupted Exception while stopping java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1143) at java.lang.Thread.join(Thread.java:1196) at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:105) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:437) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:402) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:619) 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped. 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.mapreduce.v2.app.MRAppMaster is stopped. 2012-09-28 14:50:12,477 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Exiting MR AppMaster..GoodBye -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-180) Capacity scheduler - containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482887#comment-13482887 ] Thomas Graves commented on YARN-180: +1 for the latest patch. I manually tested this on a small cluster and verified that a container can be reserved for 10 minutes and the AM can still start the container after finally being allocated it. Capacity scheduler - containers that get reserved create container token too early - Key: YARN-180 URL: https://issues.apache.org/jira/browse/YARN-180 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 0.23.3 Reporter: Thomas Graves Assignee: Arun C Murthy Priority: Critical Fix For: 2.0.3-alpha, 0.23.5 Attachments: YARN-180-branch_0.23.patch, YARN-180.patch, YARN-180.patch, YARN-180.patch The capacity scheduler has the ability to 'reserve' containers. Unfortunately, before it decides whether a container goes to reserved rather than assigned, the Container object is created, which creates a container token that expires in roughly 10 minutes by default. This means that by the time the NM frees up enough space on that node for the container to move to assigned, the container token may have expired. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-178) Fix custom ProcessTree instance creation
[ https://issues.apache.org/jira/browse/YARN-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482917#comment-13482917 ] Bikas Saha commented on YARN-178: - Sure. Please go ahead. Fix custom ProcessTree instance creation Key: YARN-178 URL: https://issues.apache.org/jira/browse/YARN-178 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.5 Reporter: Radim Kolar Assignee: Radim Kolar Priority: Critical Attachments: pstree-instance2.txt, pstree-instance.txt 1. The current pluggable ResourceCalculatorProcessTree is not passed the root process id, making custom implementations unusable. 2. The process tree does not extend Configured as it should. Added a constructor with a pid argument, along with a test suite. Also added a test that the process tree is correctly configured. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira