[jira] [Commented] (YARN-879) Fix NPE in test/o.a.h.y.server.resourcemanager.Application.getResources()
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705501#comment-13705501 ] Junping Du commented on YARN-879: - Sure, Vinod. I will try to make the tests work. Thanks for sharing the background here. :) Fix NPE in test/o.a.h.y.server.resourcemanager.Application.getResources() - Key: YARN-879 URL: https://issues.apache.org/jira/browse/YARN-879 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Junping Du Assignee: Junping Du Attachments: YARN-879.patch getResources() should return the list of containers allocated by the RM. However, it currently returns null directly. Worse, if LOG.debug is enabled, this is guaranteed to cause an NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
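As a hedged illustration of the fix this issue describes (not the actual patch), the test helper could track the allocated containers and return them instead of null; the class shape and field names below are assumptions:

{code:java}
// A minimal sketch, not the actual test class: return the tracked container
// list instead of null so that debug logging over the result cannot NPE.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;

public class Application {
  // Containers the RM has allocated to this application, recorded as they arrive.
  private final List<Container> allocatedContainers = new ArrayList<Container>();

  public synchronized List<Container> getResources() {
    // Before the fix this returned null, so any caller doing e.g.
    // LOG.debug("got " + getResources().size() + " containers") threw an NPE.
    return new ArrayList<Container>(allocatedContainers);
  }
}
{code}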
[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status
[ https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705512#comment-13705512 ] Junping Du commented on YARN-347: - Hi, [~acmurthy] The patch is rebased against the latest trunk. Would you help review it again? Thanks! YARN node CLI should also show CPU info as memory info in node status - Key: YARN-347 URL: https://issues.apache.org/jira/browse/YARN-347 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Junping Du Assignee: Junping Du Attachments: YARN-347.patch, YARN-347-v2.patch With YARN-2 checked in, CPU info is taken into consideration in resource scheduling. yarn node -status NodeID should show CPU used and capacity info, as it does for memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
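For illustration, a sketch of what the node-status output change might look like. NodeReport.getUsed()/getCapability() and Resource.getMemory()/getVirtualCores() are real YARN API; the printer class and the exact field labels are assumptions:

{code:java}
// A hedged sketch of the proposed output change, not the actual CLI code.
import java.io.PrintWriter;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.Resource;

public class NodeStatusSketch {
  static void printResourceInfo(PrintWriter out, NodeReport report) {
    Resource used = report.getUsed();            // may be null when nothing runs
    Resource capability = report.getCapability();
    out.println("\tMemory-Used : " + (used == null ? "0MB" : used.getMemory() + "MB"));
    out.println("\tMemory-Capacity : " + capability.getMemory() + "MB");
    // The improvement: mirror the memory fields for CPU.
    out.println("\tCPU-Used : " + (used == null ? "0 vcores" : used.getVirtualCores() + " vcores"));
    out.println("\tCPU-Capacity : " + capability.getVirtualCores() + " vcores");
  }
}
{code}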
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705515#comment-13705515 ] Xuan Gong commented on YARN-763: Oh, you are right. If we want to let the CallBackThread call asyncClient.stop(), we might need to add this part of the code inside CallBackThread.run(). In that case, we may need to create a new test class, such as mockAMRMClientAsync, and rewrite CallBackThread.run(). Any other ideas? AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705556#comment-13705556 ] Bikas Saha commented on YARN-763: - We can simply create another version of the TestCallbackHandler that calls asyncClient.stop() when the asyncClient calls its getProgress() method. After stop() has completed, the method can set a flag and notifyAll(this). The main test thread can wait() on the handler object and check that the flag is set when it gets notified; else it waits again. This way, if the callback thread is deadlocked, the test thread will not exit and the test will fail with a timeout. To verify, the test should fail with join() and pass without it. Similar logic is used in other tests. Please let's not sleep(1000), as this just slows down the testing. Let's sleep(50) and set the heartbeat interval to 10. This allows for 5 heartbeats, so the verification that the actual heartbeat count == 1 is accurate. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
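A sketch of the wait/notify pattern Bikas describes, under assumed names (this is illustrative, not the actual test code):

{code:java}
// Illustrative only: a handler that stops the client from the callback thread,
// then flags completion and wakes the waiting test thread. The real
// CallbackHandler interface has more methods; names here are assumed.
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

class StoppingCallbackHandler {
  private final AMRMClientAsync<?> asyncClient;
  private boolean stopped = false;

  StoppingCallbackHandler(AMRMClientAsync<?> asyncClient) {
    this.asyncClient = asyncClient;
  }

  public float getProgress() {
    asyncClient.stop();            // would deadlock if stop() joined this thread
    synchronized (this) {
      stopped = true;
      this.notifyAll();            // wake the test thread waiting on this handler
    }
    return 0.5f;
  }

  // In the main test thread:
  //   synchronized (handler) {
  //     while (!handler.isStopped()) handler.wait(WAIT_MS);
  //   }
  // If the callback thread deadlocks inside stop(), the flag is never set and
  // the test fails with a timeout, as intended.
  synchronized boolean isStopped() {
    return stopped;
  }
}
{code}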
[jira] [Commented] (YARN-736) Add a multi-resource fair sharing metric
[ https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705694#comment-13705694 ] Hudson commented on YARN-736: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add a multi-resource fair sharing metric Key: YARN-736 URL: https://issues.apache.org/jira/browse/YARN-736 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-736-1.patch, YARN-736-2.patch, YARN-736-3.patch, YARN-736-4.patch, YARN-736.patch Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it. This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions. With DRF and multi-resource scheduling, assigning a memory share as the fair share metric to every queue no longer makes sense. It's not obvious what the replacement should be, but probably something like fractional fairness within a queue, or distance from an ideal cluster state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-368) Fix typo defiend should be defined in error output
[ https://issues.apache.org/jira/browse/YARN-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705702#comment-13705702 ] Hudson commented on YARN-368: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) YARN-368. Fixed a typo in error message in Auxiliary services. Contributed by Albert Chu. (Revision 1501852) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501852 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java Fix typo defiend should be defined in error output -- Key: YARN-368 URL: https://issues.apache.org/jira/browse/YARN-368 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Albert Chu Assignee: Albert Chu Priority: Trivial Fix For: 2.1.1-beta Attachments: YARN-368.patch Noticed the following in an error log output while doing some experiments ./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle defiend should be defined -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705699#comment-13705699 ] Hudson commented on YARN-569: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) YARN-569. Add support for requesting and enforcing preemption requests via a capacity monitor. Contributed by Carlo Curino, Chris Douglas (Revision 1502083) Result = SUCCESS cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502083 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project:
[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705703#comment-13705703 ] Hudson commented on YARN-295: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) YARN-295. Fixed a race condition in ResourceManager RMAppAttempt state machine. Contributed by Mayank Bansal. (Revision 1501856) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501856 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl --- Key: YARN-295 URL: https://issues.apache.org/jira/browse/YARN-295 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Fix For: 2.1.1-beta Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch, YARN-295-trunk-3.patch {code:xml} 2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-866) Add test for class ResourceWeights
[ https://issues.apache.org/jira/browse/YARN-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705705#comment-13705705 ] Hudson commented on YARN-866: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add test for class ResourceWeights -- Key: YARN-866 URL: https://issues.apache.org/jira/browse/YARN-866 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.1.0-beta Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.1.0-beta Attachments: Yarn-866.patch, Yarn-866.patch, YARN-866.patch Add test case for the class ResourceWeights -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-883) Expose Fair Scheduler-specific queue metrics
[ https://issues.apache.org/jira/browse/YARN-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705691#comment-13705691 ] Hudson commented on YARN-883: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Expose Fair Scheduler-specific queue metrics Key: YARN-883 URL: https://issues.apache.org/jira/browse/YARN-883 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-883-1.patch, YARN-883-1.patch, YARN-883.patch When the Fair Scheduler is enabled, QueueMetrics should include fair share, minimum share, and maximum share. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-866) Add test for class ResourceWeights
[ https://issues.apache.org/jira/browse/YARN-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705807#comment-13705807 ] Hudson commented on YARN-866: - Integrated in Hadoop-Hdfs-trunk #1457 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add test for class ResourceWeights -- Key: YARN-866 URL: https://issues.apache.org/jira/browse/YARN-866 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.1.0-beta Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.1.0-beta Attachments: Yarn-866.patch, Yarn-866.patch, YARN-866.patch Add test case for the class ResourceWeights -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705801#comment-13705801 ] Hudson commented on YARN-569: - Integrated in Hadoop-Hdfs-trunk #1457 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/]) YARN-569. Add support for requesting and enforcing preemption requests via a capacity monitor. Contributed by Carlo Curino, Chris Douglas (Revision 1502083) Result = FAILURE cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502083 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project:
[jira] [Commented] (YARN-368) Fix typo defiend should be defined in error output
[ https://issues.apache.org/jira/browse/YARN-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705804#comment-13705804 ] Hudson commented on YARN-368: - Integrated in Hadoop-Hdfs-trunk #1457 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/]) YARN-368. Fixed a typo in error message in Auxiliary services. Contributed by Albert Chu. (Revision 1501852) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501852 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java Fix typo defiend should be defined in error output -- Key: YARN-368 URL: https://issues.apache.org/jira/browse/YARN-368 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Albert Chu Assignee: Albert Chu Priority: Trivial Fix For: 2.1.1-beta Attachments: YARN-368.patch Noticed the following in an error log output while doing some experiments ./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle defiend should be defined -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-736) Add a multi-resource fair sharing metric
[ https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705794#comment-13705794 ] Hudson commented on YARN-736: - Integrated in Hadoop-Hdfs-trunk #1457 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add a multi-resource fair sharing metric Key: YARN-736 URL: https://issues.apache.org/jira/browse/YARN-736 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-736-1.patch, YARN-736-2.patch, YARN-736-3.patch, YARN-736-4.patch, YARN-736.patch Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it. This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions. With DRF and multi-resource scheduling, assigning a memory share as the fair share metric to every queue no longer makes sense. It's not obvious what the replacement should be, but probably something like fractional fairness within a queue, or distance from an ideal cluster state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-368) Fix typo defiend should be defined in error output
[ https://issues.apache.org/jira/browse/YARN-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705863#comment-13705863 ] Hudson commented on YARN-368: - Integrated in Hadoop-Mapreduce-trunk #1484 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/]) YARN-368. Fixed a typo in error message in Auxiliary services. Contributed by Albert Chu. (Revision 1501852) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501852 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java Fix typo defiend should be defined in error output -- Key: YARN-368 URL: https://issues.apache.org/jira/browse/YARN-368 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Albert Chu Assignee: Albert Chu Priority: Trivial Fix For: 2.1.1-beta Attachments: YARN-368.patch Noticed the following in an error log output while doing some experiments ./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle defiend should be defined -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705860#comment-13705860 ] Hudson commented on YARN-569: - Integrated in Hadoop-Mapreduce-trunk #1484 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/]) YARN-569. Add support for requesting and enforcing preemption requests via a capacity monitor. Contributed by Carlo Curino, Chris Douglas (Revision 1502083) Result = SUCCESS cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502083 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569
[jira] [Commented] (YARN-866) Add test for class ResourceWeights
[ https://issues.apache.org/jira/browse/YARN-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705866#comment-13705866 ] Hudson commented on YARN-866: - Integrated in Hadoop-Mapreduce-trunk #1484 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add test for class ResourceWeights -- Key: YARN-866 URL: https://issues.apache.org/jira/browse/YARN-866 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.1.0-beta Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.1.0-beta Attachments: Yarn-866.patch, Yarn-866.patch, YARN-866.patch Add test case for the class ResourceWeights -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705864#comment-13705864 ] Hudson commented on YARN-295: - Integrated in Hadoop-Mapreduce-trunk #1484 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/]) YARN-295. Fixed a race condition in ResourceManager RMAppAttempt state machine. Contributed by Mayank Bansal. (Revision 1501856) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501856 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl --- Key: YARN-295 URL: https://issues.apache.org/jira/browse/YARN-295 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Fix For: 2.1.1-beta Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch, YARN-295-trunk-3.patch {code:xml} 2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell
[ https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705929#comment-13705929 ] Abhishek Kapoor commented on YARN-816: -- Please correct me if I am wrong. Are you suggesting a use case where a job, if it fails, will start from where it died? If yes, then I think we need to maintain the state of the user application running on the allocated containers. Isn't it the user application's responsibility to figure out whether it is a fresh start of the app or a recovery? Implement AM recovery for distributed shell --- Key: YARN-816 URL: https://issues.apache.org/jira/browse/YARN-816 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Simple recovery to just continue from where it left off is a good start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-815) Add container failure handling to distributed-shell
[ https://issues.apache.org/jira/browse/YARN-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kapoor reassigned YARN-815: Assignee: Abhishek Kapoor Add container failure handling to distributed-shell --- Key: YARN-815 URL: https://issues.apache.org/jira/browse/YARN-815 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Abhishek Kapoor Today, if any container fails for whatever reason, the app simply ignores it. We should handle retries, improve error reporting, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-865: --- Attachment: YARN-865.3.patch RM webservices can't query on application Types --- Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705956#comment-13705956 ] Xuan Gong commented on YARN-865: Yes, that logic should be moved out of the loop. RM webservices can't query on application Types --- Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
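A hypothetical illustration of that review point: hoist the per-request normalization of the requested application types out of the per-application loop. None of these names come from the actual RMWebServices code:

{code:java}
// Invented helper showing the hoisting pattern being discussed.
import java.util.HashSet;
import java.util.Set;

class AppTypeFilterSketch {
  static Set<String> normalize(Set<String> requestedTypes) {
    Set<String> normalized = new HashSet<String>();
    for (String type : requestedTypes) {
      normalized.add(type.trim().toLowerCase());  // done once per request, not per app
    }
    return normalized;
  }

  static boolean matches(String appType, Set<String> normalizedTypes) {
    // An empty filter means "all types"; otherwise compare case-insensitively.
    return normalizedTypes.isEmpty()
        || normalizedTypes.contains(appType.toLowerCase());
  }
}
{code}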
[jira] [Commented] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705973#comment-13705973 ] Hadoop QA commented on YARN-865: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591874/YARN-865.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1458//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1458//console This message is automatically generated. RM webservices can't query on application Types --- Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-763: --- Attachment: YARN-763.8.patch AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch, YARN-763.8.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706035#comment-13706035 ] Hadoop QA commented on YARN-763: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591879/YARN-763.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1459//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1459//console This message is automatically generated. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch, YARN-763.8.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell
[ https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706050#comment-13706050 ] Omkar Vinit Joshi commented on YARN-816: I think this is similar to the preemption case... If the application supports checkpointing, then we can start from where it left off; if not, then start from scratch. Implement AM recovery for distributed shell --- Key: YARN-816 URL: https://issues.apache.org/jira/browse/YARN-816 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Simple recovery to just continue from where it left off is a good start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-292: Assignee: Zhijie Shen ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt Key: YARN-292 URL: https://issues.apache.org/jira/browse/YARN-292 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Devaraj K Assignee: Zhijie Shen {code:xml} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706108#comment-13706108 ] Zhijie Shen commented on YARN-292: -- Will look into this problem ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt Key: YARN-292 URL: https://issues.apache.org/jira/browse/YARN-292 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Devaraj K Assignee: Zhijie Shen {code:xml} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
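A hypothetical reading of the trace above, with invented names: the CONTAINER_ALLOCATED transition unconditionally takes element 0 of the allocation result, so an empty list (allocate() racing with the attempt's removal from the scheduler) throws ArrayIndexOutOfBoundsException. A defensive shape could be:

{code:java}
// Sketch only; not the actual RMAppAttemptImpl code.
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;

class AmContainerAllocationSketch {
  static Container pickAmContainer(List<Container> allocation) {
    if (allocation == null || allocation.isEmpty()) {
      // The attempt was already removed from the scheduler; let the state
      // machine retry or fail the attempt instead of killing the dispatcher.
      return null;
    }
    return allocation.get(0);  // the crashing code did this unconditionally
  }
}
{code}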
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706109#comment-13706109 ] Omkar Vinit Joshi commented on YARN-897: [~dedcode] / [~curino] Do you want to work on the patch, or can I take over? This seems like an important bug that needs to be fixed. I looked at the code, and on container completion it does not re-sort the TreeSet, which will result in unfairness. CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
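A minimal sketch of the suggested fix, with invented names: a TreeSet ordered by a mutable key (usedCapacity) silently corrupts when the key changes in place, so the completion path must remove, mutate, and re-insert, mirroring what container assignment already does:

{code:java}
// Illustrative shape only; not the actual ParentQueue code.
import java.util.Comparator;
import java.util.TreeSet;

class ParentQueueSketch {
  static class ChildQueue {
    final String name;
    float usedCapacity;
    ChildQueue(String name, float usedCapacity) {
      this.name = name;
      this.usedCapacity = usedCapacity;
    }
  }

  private final TreeSet<ChildQueue> childQueues = new TreeSet<ChildQueue>(
      new Comparator<ChildQueue>() {
        public int compare(ChildQueue a, ChildQueue b) {
          int byCapacity = Float.compare(a.usedCapacity, b.usedCapacity);
          return byCapacity != 0 ? byCapacity : a.name.compareTo(b.name);
        }
      });

  // Called on container completion, keeping the sort order consistent.
  void completedContainer(ChildQueue child, float releasedCapacity) {
    childQueues.remove(child);               // remove under the old sort key
    child.usedCapacity -= releasedCapacity;  // mutate the key
    childQueues.add(child);                  // re-insert under the new key
  }
}
{code}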
[jira] [Updated] (YARN-661) NM fails to cleanup local directories for users
[ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-661: --- Attachment: YARN-661-20130711.1.patch NM fails to cleanup local directories for users --- Key: YARN-661 URL: https://issues.apache.org/jira/browse/YARN-661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 0.23.8 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch, YARN-661-20130710.1.patch, YARN-661-20130711.1.patch YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706126#comment-13706126 ] Zhijie Shen commented on YARN-865: -- +1 for the latest patch RM webservices can't query on application Types --- Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-661) NM fails to cleanup local directories for users
[ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706137#comment-13706137 ] Hadoop QA commented on YARN-661: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591891/YARN-661-20130711.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1460//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1460//console This message is automatically generated. NM fails to cleanup local directories for users --- Key: YARN-661 URL: https://issues.apache.org/jira/browse/YARN-661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 0.23.8 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch, YARN-661-20130710.1.patch, YARN-661-20130711.1.patch YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706152#comment-13706152 ] Carlo Curino commented on YARN-897: --- I agree this needs fixing soon. We have a first draft of the patch; we were planning to test it out carefully before posting it, but if you have cycles we can socialize it right away and work on it together. [~dedcode] please post the patch in its current state. [~ojoshi] you can check it out and we can test/verify in the meantime. CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Djellel Eddine Difallah updated YARN-897: - Attachment: YARN-897-1.patch Attached is a first patch attempt to address the bug: upon container completion, which triggers completedContainer(), remove and reinsert the queue into its parent's childQueues. This operation is done recursively, starting from the leafQueue where the container got released. Thus, by handling both cases where usedCapacity is ever changed (assignment and completion), the TreeSet remains properly sorted. CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
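The remove-and-reinsert step the patch describes can be sketched roughly as below; the names (reinsertQueue, childQueues) echo the discussion on this JIRA, but the body is illustrative rather than the actual patch. Iteration is used instead of remove() because once the sort key has changed the tree can no longer find the element by comparison, whereas the iterator removes the node it is positioned on:
{code}
import java.util.Iterator;
import java.util.TreeSet;

// Illustrative sketch of the re-sort step; not the actual YARN-897 patch.
class QueueResorter<Q> {
    private final TreeSet<Q> childQueues;

    QueueResorter(TreeSet<Q> childQueues) {
        this.childQueues = childQueues;
    }

    // Can't use childQueues.remove(queue): the TreeSet may already be out
    // of order once usedCapacity has changed, so comparison-based lookup
    // can miss the entry. Iterate to drop it by identity, then add() to
    // re-insert it under its new sort key.
    synchronized void reinsertQueue(Q queue) {
        for (Iterator<Q> it = childQueues.iterator(); it.hasNext();) {
            if (it.next() == queue) {
                it.remove(); // node-based removal works even when out of order
                break;
            }
        }
        childQueues.add(queue);
    }
}
{code}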
[jira] [Updated] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-245: --- Attachment: YARN-245-trunk-2.patch Thanks [~ojoshi] and [~vinodkv] for the review. Updated the patch. Thanks, Mayank Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch {code:xml} 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706192#comment-13706192 ] Mayank Bansal commented on YARN-299: Sure [~vinodkv]. I am reopening YARN-820 and closing this one. Thanks, Mayank Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch, YARN-299-trunk-2.patch {code:xml} 2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-12-31 10:36:27,845 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal resolved YARN-299. Resolution: Cannot Reproduce Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch, YARN-299-trunk-2.patch {code:xml} 2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-12-31 10:36:27,845 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706197#comment-13706197 ] Omkar Vinit Joshi commented on YARN-744: [~bikassaha] sounds reasonable... will take a look at it again. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744.patch Looks like the lock taken here is broken. It takes a lock on the lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get the new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
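To make the race in the description concrete, here is a hedged sketch of the broken pattern and the suggested per-attempt lock; the type parameters stand in for ApplicationAttemptId and AllocateResponse, and doAllocate is a placeholder, so this is not the actual ApplicationMasterService code:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the race described above; AttemptId/Response are stand-ins
// and doAllocate is a placeholder, not ApplicationMasterService itself.
abstract class AllocateSketch<AttemptId, Response> {
    private final ConcurrentMap<AttemptId, Response> responseMap =
        new ConcurrentHashMap<AttemptId, Response>();
    private final ConcurrentMap<AttemptId, Object> attemptLocks =
        new ConcurrentHashMap<AttemptId, Object>();

    abstract Response doAllocate(AttemptId attemptId);

    // Broken: locks the old response object, then replaces it in the map.
    // The next caller fetches and locks the NEW object, so two threads can
    // run the critical section for the same attempt concurrently.
    Response allocateBroken(AttemptId attemptId) {
        Response lastResponse = responseMap.get(attemptId);
        synchronized (lastResponse) {
            Response response = doAllocate(attemptId);
            responseMap.put(attemptId, response);
            return response;
        }
    }

    // Fix suggested in the description: serialize on a monitor that is
    // stable per attempt, here a dedicated lock object keyed by the id.
    Response allocateFixed(AttemptId attemptId) {
        Object lock = new Object();
        Object existing = attemptLocks.putIfAbsent(attemptId, lock);
        if (existing != null) {
            lock = existing;
        }
        synchronized (lock) {
            Response response = doAllocate(attemptId);
            responseMap.put(attemptId, response);
            return response;
        }
    }
}
{code}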
[jira] [Updated] (YARN-820) NodeManager has invalid state transition after error in resource localization
[ https://issues.apache.org/jira/browse/YARN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-820: --- Attachment: YARN-820-trunk-1.patch Attaching the patch. Thanks, Mayank NodeManager has invalid state transition after error in resource localization - Key: YARN-820 URL: https://issues.apache.org/jira/browse/YARN-820 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Mayank Bansal Attachments: YARN-820-trunk-1.patch, yarn-user-nodemanager-localhost.localdomain.log -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-820) NodeManager has invalid state transition after error in resource localization
[ https://issues.apache.org/jira/browse/YARN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706210#comment-13706210 ] Mayank Bansal commented on YARN-820: Hi, I am reopening this and closing YARN-299, as this problem is more about the scenario mentioned by [~ojoshi]: https://issues.apache.org/jira/browse/YARN-299?focusedCommentId=13703820page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13703820 There is one more issue: the call to toString needs to be synchronized with respect to getting the resources. Fixing that as well as part of this JIRA. Thanks, Mayank NodeManager has invalid state transition after error in resource localization - Key: YARN-820 URL: https://issues.apache.org/jira/browse/YARN-820 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Mayank Bansal Attachments: yarn-user-nodemanager-localhost.localdomain.log -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
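On the toString synchronization point, the hazard is the usual one of iterating a collection while another thread mutates it; a minimal hypothetical sketch (not the actual NodeManager localization classes):
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the toString hazard mentioned above; not the
// actual NodeManager localization code.
class ResourceHolder {
    private final List<String> resources = new ArrayList<String>();

    synchronized void addResource(String resource) {
        resources.add(resource);
    }

    // Unsynchronized, this iteration can race with addResource() and throw
    // ConcurrentModificationException; taking the same monitor avoids it.
    @Override
    public synchronized String toString() {
        return "resources=" + resources;
    }
}
{code}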
[jira] [Commented] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706214#comment-13706214 ] Hadoop QA commented on YARN-245: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591902/YARN-245-trunk-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1461//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1461//console This message is automatically generated. Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch {code:xml} 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-820) NodeManager has invalid state transition after error in resource localization
[ https://issues.apache.org/jira/browse/YARN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706229#comment-13706229 ] Hadoop QA commented on YARN-820: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591906/YARN-820-trunk-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1462//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1462//console This message is automatically generated. NodeManager has invalid state transition after error in resource localization - Key: YARN-820 URL: https://issues.apache.org/jira/browse/YARN-820 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Mayank Bansal Attachments: YARN-820-trunk-1.patch, yarn-user-nodemanager-localhost.localdomain.log -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706251#comment-13706251 ] Hitesh Shah commented on YARN-865: -- [~xgong] Documentation is still not clear. How are multiple types meant to be specified? Should one use /apps?appTypes=type1&appTypes=type2 or some other format? How does the code handle it if appTypes is defined twice in the query params in the URL? javax.ws.rs.QueryParam supports a [Sorted]Set out of the box. Should we look into using that directly instead of playing around with tokenizing based on ','? RM webservices can't query on application Types --- Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
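For reference on the [Sorted]Set suggestion: JAX-RS collects every repetition of a query parameter into a collection-typed @QueryParam automatically, so /apps?appTypes=type1&appTypes=type2 would need no manual comma-splitting. A minimal sketch, with the path and names chosen for illustration rather than taken from RMWebServices:
{code}
import java.util.Set;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;

// Illustrative JAX-RS resource; not the actual RMWebServices code.
@Path("/apps")
public class AppsResource {
    @GET
    @Produces("text/plain")
    public String getApps(@QueryParam("appTypes") Set<String> appTypes) {
        // For /apps?appTypes=type1&appTypes=type2 the runtime hands us the
        // set {type1, type2}; repeated parameters need no tokenizing.
        return "appTypes=" + appTypes;
    }
}
{code}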
[jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered
[ https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706253#comment-13706253 ] Mayank Bansal commented on YARN-369: Thanks [~bikassaha] for committing this. I have updated the patch for YARN-912. Thanks, Mayank Handle ( or throw a proper error when receiving) status updates from application masters that have not registered - Key: YARN-369 URL: https://issues.apache.org/jira/browse/YARN-369 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha, trunk-win Reporter: Hitesh Shah Assignee: Mayank Bansal Fix For: 2.1.0-beta Attachments: YARN-369.patch, YARN-369-trunk-1.patch, YARN-369-trunk-2.patch, YARN-369-trunk-3.patch, YARN-369-trunk-4.patch Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:680) ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-333) Schedulers cannot control the queue-name of an application
[ https://issues.apache.org/jira/browse/YARN-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706271#comment-13706271 ] Sandy Ryza commented on YARN-333: - Attached rebased patch. Schedulers cannot control the queue-name of an application -- Key: YARN-333 URL: https://issues.apache.org/jira/browse/YARN-333 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-333-1.patch, YARN-333-2.patch, YARN-333-3.patch, YARN-333.patch Currently, if an app is submitted without a queue, RMAppManager sets the RMApp's queue to default. A scheduler may wish to make its own decision on which queue to place an app in if none is specified. For example, when the fair scheduler user-as-default-queue config option is set to true, and an app is submitted with no queue specified, the fair scheduler should assign the app to a queue with the user's name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
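The placement rule the description asks for amounts to a small scheduler-side decision; a hedged sketch under hypothetical names (this is not the attached patch):
{code}
// Hypothetical sketch of the queue-placement decision described above;
// not the actual YARN-333 patch.
class QueuePlacement {
    private final boolean userAsDefaultQueue;

    QueuePlacement(boolean userAsDefaultQueue) {
        this.userAsDefaultQueue = userAsDefaultQueue;
    }

    // If the app names no queue, let the scheduler decide instead of
    // RMAppManager hard-coding "default".
    String assignQueue(String requestedQueue, String user) {
        if (requestedQueue == null || requestedQueue.isEmpty()) {
            return userAsDefaultQueue ? user : "default";
        }
        return requestedQueue;
    }
}
{code}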
[jira] [Updated] (YARN-333) Schedulers cannot control the queue-name of an application
[ https://issues.apache.org/jira/browse/YARN-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-333: Attachment: YARN-333-3.patch Schedulers cannot control the queue-name of an application -- Key: YARN-333 URL: https://issues.apache.org/jira/browse/YARN-333 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-333-1.patch, YARN-333-2.patch, YARN-333-3.patch, YARN-333.patch Currently, if an app is submitted without a queue, RMAppManager sets the RMApp's queue to default. A scheduler may wish to make its own decision on which queue to place an app in if none is specified. For example, when the fair scheduler user-as-default-queue config option is set to true, and an app is submitted with no queue specified, the fair scheduler should assign the app to a queue with the user's name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-912) Create exceptions package in common/api for yarn and move client facing exceptions to them
[ https://issues.apache.org/jira/browse/YARN-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706273#comment-13706273 ] Sandy Ryza commented on YARN-912: - Does it really make sense to put exceptions in their own package? Is there any precedent for this in other well-known Java libraries? It seems to me that we should just put these in the package that is likely to throw them, i.e. org.apache.hadoop.yarn.client.api. A couple of documentation nits: {code} - * requested memory/vcore is non-negative and not greater than max + * requested memory/vcore is non-negative and not greater than max throws + * exception <code>InvalidResourceRequestException</code> when there is + * invalid request {code} "throws" should be on a separate line, as @throws. {code} + /* + * This method will throw <code>InvalidResourceBlacklistRequestException</code> + * If the resource is not be able to add to black list. + */ {code} "If the resource is not be able to add to black list." should be "if the resource is not able to be added to the blacklist." Create exceptions package in common/api for yarn and move client facing exceptions to them -- Key: YARN-912 URL: https://issues.apache.org/jira/browse/YARN-912 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Mayank Bansal Attachments: YARN-912-trunk-1.patch Exceptions like InvalidResourceBlacklistRequestException, InvalidResourceRequestException, InvalidApplicationMasterRequestException etc. are currently inside ResourceManager and not visible to clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
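To spell out the first nit, the javadoc shape being asked for separates the prose from the @throws tag, roughly like this (hypothetical method, shown only to illustrate the placement):
{code}
/**
 * Checks that the requested memory/vcores are non-negative and not
 * greater than the configured maximums.
 *
 * @throws InvalidResourceRequestException if the request is invalid
 */
void validateResourceRequest(ResourceRequest request)
    throws InvalidResourceRequestException {
    // validation elided in this sketch
}
{code}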
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706284#comment-13706284 ] Omkar Vinit Joshi commented on YARN-897: [~dedcode] Thanks for posting the patch... looked at the code. bq. // Can't use childQueues.remove() since the TreeSet might be out of order. Any reason for this even after this patch? If we don't see any other issues, then why not just use childQueues.remove instead of iterating? * reinsertQueue could be marked synchronized? Thoughts? But yeah, even without that it is thread-safe, as we are locking at CapacityScheduler.nodeUpdate(); still, it is better to mark it. * LOG.info("Re-sorting queues since queue got completed: " + childQueue.getQueuePath() + nit: line 80 * At present we send the container-completed event to the leaf queue and then keep propagating it up to the root. Why not send the event to the root, grab the locks from root to leaf, and update it? Any thoughts? CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-916) JobContext cache files api are broken
Omkar Vinit Joshi created YARN-916: -- Summary: JobContext cache files api are broken Key: YARN-916 URL: https://issues.apache.org/jira/browse/YARN-916 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi I just checked; there are issues with the latest distributed cache API. * JobContext.getLocalCacheFiles ... is deprecated but should not have been. * JobContext.getCacheFiles is broken and returns null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706327#comment-13706327 ] Carlo Curino commented on YARN-897: --- Omkar, thanks for the quick feedback... bq. Any reason for this even after this patch? If we don't see any other issues, then why not just use childQueues.remove instead of iterating? I initially thought the same, but I worried that, since the underlying capacity attribute has been changed, the TreeSet is already inconsistent? [~dedcode] can you check whether this is true or not? Also, can we use some careful operation ordering and get away with Omkar's suggestion? bq. reinsertQueue could be marked synchronized? Thoughts? But yeah, even without that it is thread-safe, as we are locking at CapacityScheduler.nodeUpdate(); still, it is better to mark it. We should probably follow your suggestion (especially if this method will be reused elsewhere), or at least use the lock annotations properly. (Again, this patch wasn't quite ready.) bq. nit: line 80 will do bq. At present we send the container-completed event to the leaf queue and then keep propagating it up to the root. Why not send the event to the root, grab the locks from root to leaf, and update it? Any thoughts? Lock ordering is somewhat delicate (and I worry not very consistent). In general, the idea to lock bottom-up should allow for part of the operations (updating of two leaf queues) to be concurrent until the recursions meet at some common ancestor, at which point we serialize. However, at least for some of the operations this is inside a global scheduler lock, so we lose that benefit in the first place. It might be interesting to review the locks carefully and see whether we can rationalize them further. Although this is delicate, and unless we are lock-bound on the scheduler, in practice it would not buy us much. We didn't have time to test this through to a level where I would be confident PAing this. Omkar, do you have any cycles to test this? [~acmurthy], [~tgraves] do you guys have a moment to review this? BTW we are working on a discrete event simulator, which should allow us to lock-step/debug the entire RM codebase... that would make for easy testing of some of this stuff (more as soon as we get it ready to show around). CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706339#comment-13706339 ] Alejandro Abdelnur commented on YARN-366: - [~vinodkv], you have been following this one; anything else you think should be addressed before committing? I'd like to get this into 2.1-beta if possible. Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366-6.patch, YARN-366-7.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
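For readers joining late, the feature under review can be pictured as events carrying a pointer to the event being handled when they were created; an exception handler then walks the chain. A rough sketch with illustrative names, not the actual YARN-366 patch:
{code}
// Illustrative sketch of the tracing idea; not the actual YARN-366 patch.
class TracedEvent {
    final String name;
    final TracedEvent parent; // event being handled when this one was created

    TracedEvent(String name, TracedEvent parent) {
        this.name = name;
        this.parent = parent;
    }

    // Reconstructs the chain of events that led here, so an exception in a
    // handler can be logged with its causal history.
    String trace() {
        StringBuilder sb = new StringBuilder(name);
        for (TracedEvent e = parent; e != null; e = e.parent) {
            sb.append(" <- ").append(e.name);
        }
        return sb.toString();
    }
}
{code}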
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706348#comment-13706348 ] Djellel Eddine Difallah commented on YARN-897: -- Omkar, thanks for the feedback. {quote}Any reason for this even after this patch? If we don't see any other issues, then why not just use childQueues.remove instead of iterating?{quote} The tree is already out of order because of the new usedCapacity, so the remove() won't work. We have to iterate and then add() to fix the order. {quote}reinsertQueue could be marked synchronized? Thoughts? But yeah, even without that it is thread-safe, as we are locking at CapacityScheduler.nodeUpdate(); still, it is better to mark it.{quote} OK, sounds reasonable to put a synchronized there. {quote}LOG.info("Re-sorting queues since queue got completed: " + childQueue.getQueuePath() + nit: line 80{quote} Sure. {quote}At present we send the container-completed event to the leaf queue and then keep propagating it up to the root. Why not send the event to the root, grab the locks from root to leaf, and update it? Any thoughts?{quote} Because the released container is linked to a leaf queue, and we have to walk bottom-up to figure out to which parent to propagate. The assignment phase, however, works the way you described. CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706382#comment-13706382 ] Bikas Saha commented on YARN-521: - I have been extremely caught up today. Will try to get to this later tonight or tomorrow. Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Djellel Eddine Difallah updated YARN-897: - Attachment: YARN-897-2.patch Patch reflecting Omkar's comments: 1) add synchronized to reinsertQueue; 2) reduce line length. CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706432#comment-13706432 ] Zhijie Shen commented on YARN-292: -- {code} // Acquire the AM container from the scheduler. Allocation amContainerAllocation = appAttempt.scheduler.allocate( appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST, EMPTY_CONTAINER_RELEASE_LIST, null, null); {code} The above code will eventually pull the newly allocated containers from newlyAllocatedContainers. Logically, AMContainerAllocatedTransition happens after RMAppAttempt receives CONTAINER_ALLOCATED. CONTAINER_ALLOCATED is sent during ContainerStartedTransition, when RMContainer is moving from NEW to ALLOCATED. Therefore, pulling newlyAllocatedContainers happens when RMContainer is at ALLOCATED. In contrast, RMContainer is added to newlyAllocatedContainers when it is still at NEW. In conclusion, one container in the allocation is expected in AMContainerAllocatedTransition. As hinted by [~nemon], the problem may happen at {code} FiCaSchedulerApp application = getApplication(applicationAttemptId); if (application == null) { LOG.error("Calling allocate on removed " + "or non existant application " + applicationAttemptId); return EMPTY_ALLOCATION; } {code} EMPTY_ALLOCATION has 0 containers. Another observation is that there seems to be inconsistent synchronization on accessing the application map. I just became aware that [~djp] has started working on this problem. Please feel free to take it over. Thanks! ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt Key: YARN-292 URL: https://issues.apache.org/jira/browse/YARN-292 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Devaraj K Assignee: Zhijie Shen {code:xml} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
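Per the analysis above, the crash reduces to indexing an empty container list when the scheduler returns EMPTY_ALLOCATION; a defensive sketch of the guard (illustrative, not necessarily the eventual fix):
{code}
// Illustrative guard for the failure analyzed above; not necessarily the
// eventual YARN-292 fix. When the scheduler has already removed the
// attempt, allocate() returns EMPTY_ALLOCATION with an empty container
// list, and an unconditional get(0) throws ArrayIndexOutOfBoundsException.
List<Container> containers = amContainerAllocation.getContainers();
if (containers.isEmpty()) {
    LOG.warn("No AM container allocated for " + applicationAttemptId);
    return; // bail out instead of assuming the AM container is present
}
Container amContainer = containers.get(0);
{code}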
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706488#comment-13706488 ] Omkar Vinit Joshi commented on YARN-897: [~dedcode] please do keep older patches... they help reviewing, since we sometimes diff against older patches and verify older comments... Thanks CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-2.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-744: --- Attachment: YARN-744-20130711.1.patch Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch Looks like the lock taken here is broken. It takes a lock on the lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get the new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Djellel Eddine Difallah updated YARN-897: - Attachment: YARN-897-1.patch CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-592) Container logs lost for the application when NM gets restarted
[ https://issues.apache.org/jira/browse/YARN-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706495#comment-13706495 ] Omkar Vinit Joshi commented on YARN-592: Just to be sure (I might be wrong), I am a bit skeptical about the .tmp file... are you sure it contains all the logs? My understanding is that it was still in progress and didn't finish with all of them. However, even for completed logs... it will enqueue them into the deletion service for future deletion, which may or may not happen even on graceful shutdown, as we kill the NM after some time... right? Thoughts? bq. This patch is trying to upload logs for the applications which run before and after NM restart. If the application gets completed after NM crash and before starting NM, atleast logs for the containers ran on that node can get from NM local logs dirs. This seems problematic. The time difference between the AM finishing and the NM starting can be as low as a second... or as high as hours... We need a definite policy for handling logs, because if we don't handle this, logs will be lying on the NM waiting for an already-finished app to finish... right? Thoughts? Container logs lost for the application when NM gets restarted -- Key: YARN-592 URL: https://issues.apache.org/jira/browse/YARN-592 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.3-alpha Reporter: Devaraj K Assignee: Devaraj K Priority: Critical Attachments: YARN-592.patch While running a big job, if the NM goes down for some reason and comes back, it will do the log aggregation for the newly launched containers and delete all the containers for the application. In this case we don't get the container logs, from HDFS or local disk, for the containers which were launched before the restart and completed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi resolved YARN-541. Resolution: Invalid getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706506#comment-13706506 ] Omkar Vinit Joshi commented on YARN-541: I am closing this as invalid... please reopen if you still see the issue... getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reopened YARN-541: -- [~ojoshi] [~write2kishore] I think [~bikassaha] discovered a race condition in the AMRMClient that may be causing this. getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
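For context, the reporter's workaround amounts to accumulating containers across allocate() calls rather than assuming one response carries all of them. A hedged sketch, written against the AMRMClient-style API (allocate(float) returning AllocateResponse); this is illustrative, not code from the JIRA:
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class ContainerAccumulator {
  /** Poll allocate() until 'numRequested' containers have been collected. */
  public static List<Container> acquire(AMRMClient<?> client, int numRequested)
      throws Exception {
    List<Container> acquired = new ArrayList<Container>();
    while (acquired.size() < numRequested) {
      // Containers for one request may arrive spread over several responses.
      AllocateResponse response = client.allocate(0.0f);
      acquired.addAll(response.getAllocatedContainers());
      Thread.sleep(100); // simple heartbeat pacing; tune as needed
    }
    return acquired;
  }
}
{code}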
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706553#comment-13706553 ] Vinod Kumar Vavilapalli commented on YARN-321: -- Fundamentally, this JIRA is to track the management of data related to finished applications via a new server called ApplicationHistoryService (AHS). Some important design points: h4. Basics - ResourceManager will write per-application data to a (hopefully very) thin {{HistoryStorage}} layer. - ResourceManager will push the data to HistoryStorage after an application finishes, in a separate thread. - HistoryStorage is different from the current RMStateStore, so unlike JobHistory, HistoryStorage isn't used for state-tracking or as a transaction log. ResourceManager will try to publish information about completed apps in a best-effort manner, but there will be edge cases during RM restart where we may not flush some data. Making it consistent and complete across an RM restart will be a future step. - HistoryStorage will have APIs to publish app-info, retrieve app-info and list apps, and can have various implementations -- A file-based implementation where the RM writes per-app files to DFS; HistoryStorage will take care of file management like we do today in the JobHistoryServer (JHS) and serve users by reading the data in the files -- A shared-bus implementation where the RM writes directly to AHS and AHS persists the data in storage that it controls - files/DB etc. - To start with, we will have an implementation with a per-app HDFS file. h4. Miscellaneous - *Running as a service*: By default, ApplicationHistoryService will be embedded inside ResourceManager but will be independent enough to run as a separate service for scaling purposes. - *User interfaces*: Command-line clients and/or web clients will have RPC, web and REST interfaces to interact with ApplicationHistoryService to get info about finished applications. Fundamentally, we'll have two types of interfaces -- Per-app info -- List of all apps -- Querying the list of apps based on user-name, queue-name etc. To start with, we will imitate what JHS does: throw up the list of all apps and do the filtering client side. But we need a better server-side solution. - *Aggregated logs*: Logs will be served, and potentially managed (expiry etc.), by ApplicationHistoryService via an abstract LogService component. - *Retention*: ApplicationHistoryService will have components to take care of retention - expiring very old apps. - *Security*: ApplicationHistoryService will have security from the start and will use tokens similar to JHS. h4. Out of scope - Hosting/serving per-framework data is out of scope for this JIRA. It is related to ApplicationHistoryService, but I am keeping the focus on generic data for now on this JIRA and will file a separate ticket for ApplicationHistoryService or a related service to work with per-framework or app data. I see a transition phase where we would continue to run AHS and JHS at the same time till the other JIRA is resolved. - *Long running services*: We won't have any special support for long-running services yet. We should track this with other long-running services' support. Feedback appreciated. I am kickstarting this right now and am creating a branch for faster progress. 
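A minimal sketch of what the thin HistoryStorage layer above could look like, assuming a hypothetical ApplicationHistoryData record type; it illustrates the three APIs mentioned (publish, retrieve, list), not the actual interface from the YARN-321 branch:
{code:java}
import java.io.IOException;
import java.util.List;

/** Hypothetical per-app record; the fields are illustrative. */
class ApplicationHistoryData {
  String applicationId;
  String user;
  String queue;
  long finishTime;
  String getUser()  { return user; }
  String getQueue() { return queue; }
}

/** Sketch of the thin storage layer: publish, retrieve, list. */
interface HistoryStorage {
  /** Called by the RM, on a separate thread, after an application finishes. */
  void publishApplication(ApplicationHistoryData app) throws IOException;

  /** Per-app retrieval backing the CLI/web/REST "app info" calls. */
  ApplicationHistoryData getApplication(String applicationId) throws IOException;

  /** List all apps; filtering by user/queue starts out client side. */
  List<ApplicationHistoryData> listApplications() throws IOException;
}
{code}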
Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706554#comment-13706554 ] Hadoop QA commented on YARN-744: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591936/YARN-744-20130711.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1465//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1465//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1465//console This message is automatically generated. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
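The locking bug in the YARN-744 description is easy to illustrate: locking on the mutable response *value* and then replacing that value in the map lets a second thread lock the freshly inserted object and enter the critical section concurrently. A hedged sketch of the fix shape suggested above, locking on a stable per-attempt monitor (the lock-map approach here is an illustrative choice, not the committed patch):
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PerAttemptLocking<K, V> {
  private final ConcurrentMap<K, V> responseMap = new ConcurrentHashMap<K, V>();
  private final ConcurrentMap<K, Object> locks = new ConcurrentHashMap<K, Object>();

  public V replaceResponse(K appAttemptId, V newResponse) {
    locks.putIfAbsent(appAttemptId, new Object());
    synchronized (locks.get(appAttemptId)) { // stable monitor per attempt
      // The broken variant synchronizes on responseMap.get(appAttemptId) and
      // then put()s a new object -- the next caller locks a different monitor
      // and both threads end up inside the critical section at once.
      return responseMap.put(appAttemptId, newResponse);
    }
  }
}
{code}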
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706561#comment-13706561 ] Omkar Vinit Joshi commented on YARN-701: I have checked the patch; some comments: * Earlier it was possible, even in a secured environment, to use the AMRMToken for appAttemptId1 and request containers for appAttemptId2. This is now fixed in the authorize call for both cases. * The patch works in secured and unsecured environments. * It makes sense to remove appAttemptId from the request.. thoughts?? Backward compatibility? * However, there is a problem if we restart the node manager on which the AM was running during the application run. Attaching logs. ApplicationTokens should be used irrespective of kerberos - Key: YARN-701 URL: https://issues.apache.org/jira/browse/YARN-701 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, YARN-701-20130710.txt, yarn-ojoshi-resourcemanager-HW10351.local.log - Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
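The first review point corresponds to a simple attempt-id check in the authorize path: an AM holding the token for attempt 1 must not be able to request containers on behalf of attempt 2, in secure and non-secure mode alike. A hedged sketch with invented names (the real patch may structure this differently):
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class AllocateAuthorizer {
  /** Reject an allocate call whose AMRMToken belongs to a different attempt. */
  public static void authorize(ApplicationAttemptId tokenAttemptId,
      ApplicationAttemptId requestAttemptId) throws YarnException {
    if (!tokenAttemptId.equals(requestAttemptId)) {
      // Same check on both the secure and the non-secure code path.
      throw new YarnException("Unauthorized: token for " + tokenAttemptId
          + " used to allocate for " + requestAttemptId);
    }
  }
}
{code}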
[jira] [Updated] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-701: --- Attachment: yarn-ojoshi-resourcemanager-HW10351.local.log ApplicationTokens should be used irrespective of kerberos - Key: YARN-701 URL: https://issues.apache.org/jira/browse/YARN-701 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, YARN-701-20130710.txt, yarn-ojoshi-resourcemanager-HW10351.local.log - Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706562#comment-13706562 ] Hadoop QA commented on YARN-701: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591950/yarn-ojoshi-resourcemanager-HW10351.local.log against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1466//console This message is automatically generated. ApplicationTokens should be used irrespective of kerberos - Key: YARN-701 URL: https://issues.apache.org/jira/browse/YARN-701 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, YARN-701-20130710.txt, yarn-ojoshi-resourcemanager-HW10351.local.log - Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706579#comment-13706579 ] Junping Du commented on YARN-292: - Hi [~zjshen], I think your work above reveals the root cause of this bug. So please feel free to go ahead and fix it. I will also help to review it. Thx! ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt Key: YARN-292 URL: https://issues.apache.org/jira/browse/YARN-292 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Devaraj K Assignee: Zhijie Shen {code} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
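The trace fails at Arrays$ArrayList.get(0): the CONTAINER_ALLOCATED transition assumes the scheduler handed back at least one container, even after the scheduler logged that the attempt was already removed. A hedged sketch of the defensive shape such a fix can take (illustrative names, not the committed patch):
{code:java}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;

public class AmContainerGuard {
  /** Return the AM container, or null when the allocation came back empty. */
  public static Container amContainerOrNull(List<Container> allocated) {
    if (allocated == null || allocated.isEmpty()) {
      // The scheduler had already removed the attempt (see the "Calling
      // allocate on removed or non existant application" error above), so
      // index 0 does not exist; callers must handle this instead of crashing.
      return null;
    }
    return allocated.get(0);
  }
}
{code}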
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706589#comment-13706589 ] Krishna Kishore Bonagiri commented on YARN-541: --- I shall try to get you the logs you need today or as soon as possible, and reopen it. getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706601#comment-13706601 ] Hitesh Shah commented on YARN-541: -- [~write2kishore] if you plan to re-run this to get new logs, could you please run the RM and NM with DEBUG log level. Thanks. getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706602#comment-13706602 ] Hitesh Shah commented on YARN-541: -- Likewise have the AM also run with the debug log level if possible. getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-541: Assignee: Omkar Vinit Joshi (was: Vinod Kumar Vavilapalli) getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-541: Assignee: Vinod Kumar Vavilapalli (was: Omkar Vinit Joshi) getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Vinod Kumar Vavilapalli Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706645#comment-13706645 ] Hitesh Shah commented on YARN-321: -- {quote} To start with, we will have an implementation with a per-app HDFS file. {quote} [~vinodkv] Based on the above, it seems like this will only address analysing one job at a time. With a per-app file, it will be non-trivial to search for applications that match certain criteria: all jobs that ran on a certain day? All jobs of a certain type? All jobs that took longer than 10 mins to run? All jobs that use over 100 containers? Sure, a directory hierarchy based on dates may solve the very basic use-cases, but it looks like anyone needing to do slightly more complex analysis on cluster utilization will need to build an indexing layer on top of the file-based store? Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706646#comment-13706646 ] Krishna Kishore Bonagiri commented on YARN-541: --- Hitesh, How can I do that? getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706650#comment-13706650 ] Hitesh Shah commented on YARN-541: -- export HADOOP_ROOT_LOGGER=DEBUG,RFA export YARN_ROOT_LOGGER=DEBUG,RFA when starting the RM and NM. For the DSShell, you can use --log_properties and pass in a log4j.properties which has a hardcoded DEBUG level for the root logger. However, based on what I can see, the DS Shell AM at DEBUG level may not be necessary. getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
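Spelled out, the setup Hitesh describes might look like the following. Only the two exports and the --log_properties flag come from the comment above; the log4j.properties contents are an assumption (a minimal root-logger override in the usual container-log style):
{code}
# Shell: start the RM and NM daemons with DEBUG root loggers
export HADOOP_ROOT_LOGGER=DEBUG,RFA
export YARN_ROOT_LOGGER=DEBUG,RFA

# log4j.properties passed to the DS shell via --log_properties
# (a minimal example; adjust the appender to taste)
log4j.rootLogger=DEBUG,CLA
log4j.appender.CLA=org.apache.hadoop.yarn.ContainerLogAppender
log4j.appender.CLA.containerLogDir=${yarn.app.container.log.dir}
log4j.appender.CLA.layout=org.apache.log4j.PatternLayout
log4j.appender.CLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
{code}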
[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell
[ https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706653#comment-13706653 ] Abhishek Kapoor commented on YARN-816: -- Preemption is one of the cases where a container can be killed while the application is still running. We can take inspiration from CPU scheduling algorithms in operating systems. Also, if the application is preempted, we can provide a way to let the app know that it is going to get preempted, and during recovery we can make the app aware that it was preempted. Probably an event fired to the app letting it know what is going to happen (preempt) and what has happened (preempted). Sorry if it sounds confusing; I am open for discussion Implement AM recovery for distributed shell --- Key: YARN-816 URL: https://issues.apache.org/jira/browse/YARN-816 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Simple recovery to just continue from where it left off is a good start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706654#comment-13706654 ] Vinod Kumar Vavilapalli commented on YARN-321: -- Like I mentioned: bq. Querying list of apps based on user-name, queue-name etc. To start with, we will imitate what JHS does, throw up list of all apps and do the filtering client side. But we need a better server side solution. So for both the CLI and web UI, we will start with a client side basic filtering, perhaps coupled with paging on the results. More advanced analytics needs a more robust server side solution. I can already imagine file-based indices, but a more query friendly storage will be needed - a table view via HCat/HBase over HDFS will be a good start. Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
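A hedged sketch of the client-side filtering starting point Vinod describes, reusing the hypothetical ApplicationHistoryData record from the earlier sketch: the server returns everything and the predicates run in the CLI/web client.
{code:java}
import java.util.ArrayList;
import java.util.List;

public class ClientSideFilter {
  /** Filter the full app list locally; null user/queue means "no filter". */
  public static List<ApplicationHistoryData> byUserAndQueue(
      List<ApplicationHistoryData> allApps, String user, String queue) {
    List<ApplicationHistoryData> out = new ArrayList<ApplicationHistoryData>();
    for (ApplicationHistoryData app : allApps) {
      if ((user == null || user.equals(app.getUser()))
          && (queue == null || queue.equals(app.getQueue()))) {
        out.add(app);
      }
    }
    return out;
  }
}
{code}
A server-side query API (or the HCat/HBase-backed table view mentioned above) would replace this once lists get large enough that shipping everything to the client stops scaling.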
[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell
[ https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706662#comment-13706662 ] Vinod Kumar Vavilapalli commented on YARN-816: -- I originally filed this to make the DistributedShell AM recover when the node running the AM crashes. There are two things it can do - Just restart everything from scratch - Or remember how many nodes are already taken care of and only run the remaining. - While we do this, we should generally try to design libraries that help other framework writers implement state recovery on AM crash, or at least create some conventions. Implement AM recovery for distributed shell --- Key: YARN-816 URL: https://issues.apache.org/jira/browse/YARN-816 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Simple recovery to just continue from where it left off is a good start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
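A hedged sketch of the second option (remember progress, run only the remaining work): the DS AM could checkpoint its completed-container count to an HDFS file and read it back on restart. The path, format and class names are invented for illustration, not from any patch.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DsAmCheckpoint {
  private final FileSystem fs;
  private final Path checkpoint; // e.g. <app staging dir>/progress (assumed)

  public DsAmCheckpoint(FileSystem fs, Path checkpoint) {
    this.fs = fs;
    this.checkpoint = checkpoint;
  }

  /** Overwrite the checkpoint with the number of containers finished so far. */
  public void save(int completed) throws IOException {
    FSDataOutputStream out = fs.create(checkpoint, true);
    try { out.writeInt(completed); } finally { out.close(); }
  }

  /** 0 on a fresh start; the last saved count on AM recovery. */
  public int load() throws IOException {
    if (!fs.exists(checkpoint)) return 0;
    FSDataInputStream in = fs.open(checkpoint);
    try { return in.readInt(); } finally { in.close(); }
  }
}
{code}
A reusable library along these lines is what the last point above is asking for: a convention for where the checkpoint lives and when it is written, so every framework does not reinvent it.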
[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell
[ https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706667#comment-13706667 ] Abhishek Kapoor commented on YARN-816: -- Couldn't agree more, [~vinodkv]. We can have the state of the AM communicated to the RM. When the AM boots up, the RM should communicate that state back to the AM: for example, whether it is a fresh start or a recovery, and if it is a recovery, the state of the nodes the app was running on. The above use case might require a communication protocol change between the AM and the RM. Implement AM recovery for distributed shell --- Key: YARN-816 URL: https://issues.apache.org/jira/browse/YARN-816 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Simple recovery to just continue from where it left off is a good start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira