[jira] [Commented] (YARN-879) Fix NPE in test/o.a.h.y.server.resourcemanager.Application.getResources()
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705501#comment-13705501 ] Junping Du commented on YARN-879: - Sure, Vinod. I will try to make the tests work. Thanks for sharing the background here. :) Fix NPE in test/o.a.h.y.server.resourcemanager.Application.getResources() - Key: YARN-879 URL: https://issues.apache.org/jira/browse/YARN-879 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Junping Du Assignee: Junping Du Attachments: YARN-879.patch getResources() should return the list of containers allocated by the RM. However, it currently returns null directly. Worse, if LOG.debug is enabled, this is guaranteed to cause an NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
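As a hedged illustration of the fix this issue describes (not the actual patch), the test helper could track the allocated containers and return them instead of null; the class shape and field names below are assumptions:

{code:java}
// A minimal sketch, not the actual test class: return the tracked container
// list instead of null so that debug logging over the result cannot NPE.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;

public class Application {
  // Containers the RM has allocated to this application, recorded as they arrive.
  private final List<Container> allocatedContainers = new ArrayList<Container>();

  public synchronized List<Container> getResources() {
    // Before the fix this returned null, so any caller doing e.g.
    // LOG.debug("got " + getResources().size() + " containers") threw an NPE.
    return new ArrayList<Container>(allocatedContainers);
  }
}
{code}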
[jira] [Commented] (YARN-347) YARN node CLI should also show CPU info as memory info in node status
[ https://issues.apache.org/jira/browse/YARN-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705512#comment-13705512 ] Junping Du commented on YARN-347: - Hi, [~acmurthy] The patch is rebased against the latest trunk. Would you help review it again? Thanks! YARN node CLI should also show CPU info as memory info in node status - Key: YARN-347 URL: https://issues.apache.org/jira/browse/YARN-347 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Junping Du Assignee: Junping Du Attachments: YARN-347.patch, YARN-347-v2.patch With YARN-2 checked in, CPU info is taken into consideration in resource scheduling. yarn node -status NodeID should show CPU used and capacity info, as it does for memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
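For illustration, a sketch of what the node-status output change might look like. NodeReport.getUsed()/getCapability() and Resource.getMemory()/getVirtualCores() are real YARN API; the printer class and the exact field labels are assumptions:

{code:java}
// A hedged sketch of the proposed output change, not the actual CLI code.
import java.io.PrintWriter;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.Resource;

public class NodeStatusSketch {
  static void printResourceInfo(PrintWriter out, NodeReport report) {
    Resource used = report.getUsed();            // may be null when nothing runs
    Resource capability = report.getCapability();
    out.println("\tMemory-Used : " + (used == null ? "0MB" : used.getMemory() + "MB"));
    out.println("\tMemory-Capacity : " + capability.getMemory() + "MB");
    // The improvement: mirror the memory fields for CPU.
    out.println("\tCPU-Used : " + (used == null ? "0 vcores" : used.getVirtualCores() + " vcores"));
    out.println("\tCPU-Capacity : " + capability.getVirtualCores() + " vcores");
  }
}
{code}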
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705515#comment-13705515 ] Xuan Gong commented on YARN-763: Oh, you are right. If we want to let the CallBackThread call asyncClient.stop(), we might need to add this part of the code inside CallBackThread.run(). In that case, we may need to create a new test class, such as mockAMRMClientAsync, and rewrite CallBackThread.run(). Any other ideas? AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705556#comment-13705556 ] Bikas Saha commented on YARN-763: - We can simply create another version of the TestCallbackHandler that calls asyncClient.stop() when the asyncClient calls its getProgress() method. After stop() has completed, the method can set a flag and notifyAll(this). The main test thread can wait() on the handler object and check that the flag is set when it gets notified; else it waits again. This way, if the callback thread is deadlocked, the test thread will not exit and the test will fail with a timeout. To verify, the test should fail with join() and pass without it. Similar logic is used in other tests. Please let's not sleep(1000), as this just slows down the testing. Let's sleep(50) and set the heartbeat interval to 10. This allows for 5 heartbeats, so the verification that the actual heartbeat count == 1 is accurate. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
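A sketch of the wait/notify pattern Bikas describes, under assumed names (this is illustrative, not the actual test code):

{code:java}
// Illustrative only: a handler that stops the client from the callback thread,
// then flags completion and wakes the waiting test thread. The real
// CallbackHandler interface has more methods; names here are assumed.
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

class StoppingCallbackHandler {
  private final AMRMClientAsync<?> asyncClient;
  private boolean stopped = false;

  StoppingCallbackHandler(AMRMClientAsync<?> asyncClient) {
    this.asyncClient = asyncClient;
  }

  public float getProgress() {
    asyncClient.stop();            // would deadlock if stop() joined this thread
    synchronized (this) {
      stopped = true;
      this.notifyAll();            // wake the test thread waiting on this handler
    }
    return 0.5f;
  }

  // In the main test thread:
  //   synchronized (handler) {
  //     while (!handler.isStopped()) handler.wait(WAIT_MS);
  //   }
  // If the callback thread deadlocks inside stop(), the flag is never set and
  // the test fails with a timeout, as intended.
  synchronized boolean isStopped() {
    return stopped;
  }
}
{code}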
[jira] [Commented] (YARN-736) Add a multi-resource fair sharing metric
[ https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705694#comment-13705694 ] Hudson commented on YARN-736: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add a multi-resource fair sharing metric Key: YARN-736 URL: https://issues.apache.org/jira/browse/YARN-736 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-736-1.patch, YARN-736-2.patch, YARN-736-3.patch, YARN-736-4.patch, YARN-736.patch Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it. This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions. With DRF and multi-resource scheduling, assigning a memory share as the fair share metric to every queue no longer makes sense. It's not obvious what the replacement should be, but probably something like fractional fairness within a queue, or distance from an ideal cluster state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-368) Fix typo defiend should be defined in error output
[ https://issues.apache.org/jira/browse/YARN-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705702#comment-13705702 ] Hudson commented on YARN-368: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) YARN-368. Fixed a typo in error message in Auxiliary services. Contributed by Albert Chu. (Revision 1501852) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501852 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java Fix typo defiend should be defined in error output -- Key: YARN-368 URL: https://issues.apache.org/jira/browse/YARN-368 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Albert Chu Assignee: Albert Chu Priority: Trivial Fix For: 2.1.1-beta Attachments: YARN-368.patch Noticed the following in an error log output while doing some experiments ./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle defiend should be defined -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705699#comment-13705699 ] Hudson commented on YARN-569: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) YARN-569. Add support for requesting and enforcing preemption requests via a capacity monitor. Contributed by Carlo Curino, Chris Douglas (Revision 1502083) Result = SUCCESS cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502083 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project:
[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705703#comment-13705703 ] Hudson commented on YARN-295: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) YARN-295. Fixed a race condition in ResourceManager RMAppAttempt state machine. Contributed by Mayank Bansal. (Revision 1501856) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501856 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl --- Key: YARN-295 URL: https://issues.apache.org/jira/browse/YARN-295 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Fix For: 2.1.1-beta Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch, YARN-295-trunk-3.patch {code:xml} 2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-866) Add test for class ResourceWeights
[ https://issues.apache.org/jira/browse/YARN-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705705#comment-13705705 ] Hudson commented on YARN-866: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add test for class ResourceWeights -- Key: YARN-866 URL: https://issues.apache.org/jira/browse/YARN-866 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.1.0-beta Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.1.0-beta Attachments: Yarn-866.patch, Yarn-866.patch, YARN-866.patch Add test case for the class ResourceWeights -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-883) Expose Fair Scheduler-specific queue metrics
[ https://issues.apache.org/jira/browse/YARN-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705691#comment-13705691 ] Hudson commented on YARN-883: - Integrated in Hadoop-Yarn-trunk #267 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/267/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Expose Fair Scheduler-specific queue metrics Key: YARN-883 URL: https://issues.apache.org/jira/browse/YARN-883 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-883-1.patch, YARN-883-1.patch, YARN-883.patch When the Fair Scheduler is enabled, QueueMetrics should include fair share, minimum share, and maximum share. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-866) Add test for class ResourceWeights
[ https://issues.apache.org/jira/browse/YARN-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705807#comment-13705807 ] Hudson commented on YARN-866: - Integrated in Hadoop-Hdfs-trunk #1457 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add test for class ResourceWeights -- Key: YARN-866 URL: https://issues.apache.org/jira/browse/YARN-866 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.1.0-beta Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.1.0-beta Attachments: Yarn-866.patch, Yarn-866.patch, YARN-866.patch Add test case for the class ResourceWeights -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705801#comment-13705801 ] Hudson commented on YARN-569: - Integrated in Hadoop-Hdfs-trunk #1457 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/]) YARN-569. Add support for requesting and enforcing preemption requests via a capacity monitor. Contributed by Carlo Curino, Chris Douglas (Revision 1502083) Result = FAILURE cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502083 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project:
[jira] [Commented] (YARN-368) Fix typo defiend should be defined in error output
[ https://issues.apache.org/jira/browse/YARN-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705804#comment-13705804 ] Hudson commented on YARN-368: - Integrated in Hadoop-Hdfs-trunk #1457 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/]) YARN-368. Fixed a typo in error message in Auxiliary services. Contributed by Albert Chu. (Revision 1501852) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501852 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java Fix typo defiend should be defined in error output -- Key: YARN-368 URL: https://issues.apache.org/jira/browse/YARN-368 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Albert Chu Assignee: Albert Chu Priority: Trivial Fix For: 2.1.1-beta Attachments: YARN-368.patch Noticed the following in an error log output while doing some experiments ./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle defiend should be defined -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-736) Add a multi-resource fair sharing metric
[ https://issues.apache.org/jira/browse/YARN-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705794#comment-13705794 ] Hudson commented on YARN-736: - Integrated in Hadoop-Hdfs-trunk #1457 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1457/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add a multi-resource fair sharing metric Key: YARN-736 URL: https://issues.apache.org/jira/browse/YARN-736 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.0-beta Attachments: YARN-736-1.patch, YARN-736-2.patch, YARN-736-3.patch, YARN-736-4.patch, YARN-736.patch Currently, at a regular interval, the fair scheduler computes a fair memory share for each queue and application inside it. This fair share is not used for scheduling decisions, but is displayed in the web UI, exposed as a metric, and used for preemption decisions. With DRF and multi-resource scheduling, assigning a memory share as the fair share metric to every queue no longer makes sense. It's not obvious what the replacement should be, but probably something like fractional fairness within a queue, or distance from an ideal cluster state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-368) Fix typo defiend should be defined in error output
[ https://issues.apache.org/jira/browse/YARN-368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705863#comment-13705863 ] Hudson commented on YARN-368: - Integrated in Hadoop-Mapreduce-trunk #1484 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/]) YARN-368. Fixed a typo in error message in Auxiliary services. Contributed by Albert Chu. (Revision 1501852) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501852 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java Fix typo defiend should be defined in error output -- Key: YARN-368 URL: https://issues.apache.org/jira/browse/YARN-368 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0 Reporter: Albert Chu Assignee: Albert Chu Priority: Trivial Fix For: 2.1.1-beta Attachments: YARN-368.patch Noticed the following in an error log output while doing some experiments ./1066018/nodes/hyperion987/log/yarn-achu-nodemanager-hyperion987.out:java.lang.RuntimeException: No class defiend for uda.shuffle defiend should be defined -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705860#comment-13705860 ] Hudson commented on YARN-569: - Integrated in Hadoop-Mapreduce-trunk #1484 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/]) YARN-569. Add support for requesting and enforcing preemption requests via a capacity monitor. Contributed by Carlo Curino, Chris Douglas (Revision 1502083) Result = SUCCESS cdouglas : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502083 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/Priority.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingEditPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ContainerPreemptEventType.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/PreemptableResourceScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerNode.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569
[jira] [Commented] (YARN-866) Add test for class ResourceWeights
[ https://issues.apache.org/jira/browse/YARN-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705866#comment-13705866 ] Hudson commented on YARN-866: - Integrated in Hadoop-Mapreduce-trunk #1484 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/]) updating CHANGES.txt after committing MAPREDUCE-5333,HADOOP-9661,HADOOP-9355,HADOOP-9673,HADOOP-9414,HADOOP-9416,HDFS-4797,YARN-866,YARN-736,YARN-883 to 2.1-beta branch (Revision 1502075) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1502075 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Add test for class ResourceWeights -- Key: YARN-866 URL: https://issues.apache.org/jira/browse/YARN-866 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.1.0-beta Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.1.0-beta Attachments: Yarn-866.patch, Yarn-866.patch, YARN-866.patch Add test case for the class ResourceWeights -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705864#comment-13705864 ] Hudson commented on YARN-295: - Integrated in Hadoop-Mapreduce-trunk #1484 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1484/]) YARN-295. Fixed a race condition in ResourceManager RMAppAttempt state machine. Contributed by Mayank Bansal. (Revision 1501856) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501856 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl --- Key: YARN-295 URL: https://issues.apache.org/jira/browse/YARN-295 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Fix For: 2.1.1-beta Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch, YARN-295-trunk-3.patch {code:xml} 2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell
[ https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705929#comment-13705929 ] Abhishek Kapoor commented on YARN-816: -- Please correct me if I am wrong. Are you suggesting a use case where a job, if it fails, will start from where it died? If yes, then I think we need to maintain the state of the user application running on the allocated containers. Isn't it the user application's responsibility to figure out whether it is a fresh start of the app or a recovery? Implement AM recovery for distributed shell --- Key: YARN-816 URL: https://issues.apache.org/jira/browse/YARN-816 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Simple recovery to just continue from where it left off is a good start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-815) Add container failure handling to distributed-shell
[ https://issues.apache.org/jira/browse/YARN-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kapoor reassigned YARN-815: Assignee: Abhishek Kapoor Add container failure handling to distributed-shell --- Key: YARN-815 URL: https://issues.apache.org/jira/browse/YARN-815 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Abhishek Kapoor Today, if any container fails for whatever reason, the app simply ignores it. We should handle retries, improve error reporting, etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-865: --- Attachment: YARN-865.3.patch RM webservices can't query on application Types --- Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705956#comment-13705956 ] Xuan Gong commented on YARN-865: Yes, that logic should be moved out of the loop. RM webservices can't query on application Types --- Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
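A hypothetical illustration of that review point: hoist the per-request normalization of the requested application types out of the per-application loop. None of these names come from the actual RMWebServices code:

{code:java}
// Invented helper showing the hoisting pattern being discussed.
import java.util.HashSet;
import java.util.Set;

class AppTypeFilterSketch {
  static Set<String> normalize(Set<String> requestedTypes) {
    Set<String> normalized = new HashSet<String>();
    for (String type : requestedTypes) {
      normalized.add(type.trim().toLowerCase());  // done once per request, not per app
    }
    return normalized;
  }

  static boolean matches(String appType, Set<String> normalizedTypes) {
    // An empty filter means "all types"; otherwise compare case-insensitively.
    return normalizedTypes.isEmpty()
        || normalizedTypes.contains(appType.toLowerCase());
  }
}
{code}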
[jira] [Commented] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705973#comment-13705973 ] Hadoop QA commented on YARN-865: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591874/YARN-865.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1458//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1458//console This message is automatically generated. RM webservices can't query on application Types --- Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-763: --- Attachment: YARN-763.8.patch AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch, YARN-763.8.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706035#comment-13706035 ] Hadoop QA commented on YARN-763: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591879/YARN-763.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1459//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1459//console This message is automatically generated. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch, YARN-763.3.patch, YARN-763.4.patch, YARN-763.5.patch, YARN-763.6.patch, YARN-763.7.patch, YARN-763.8.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell
[ https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706050#comment-13706050 ] Omkar Vinit Joshi commented on YARN-816: I think this is similar to the preemption case... If the application supports checkpointing, then we can start from where it left off; if not, then start from scratch. Implement AM recovery for distributed shell --- Key: YARN-816 URL: https://issues.apache.org/jira/browse/YARN-816 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Simple recovery to just continue from where it left off is a good start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-292: Assignee: Zhijie Shen ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt Key: YARN-292 URL: https://issues.apache.org/jira/browse/YARN-292 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Devaraj K Assignee: Zhijie Shen {code:xml} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706108#comment-13706108 ] Zhijie Shen commented on YARN-292: -- Will look into this problem ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt Key: YARN-292 URL: https://issues.apache.org/jira/browse/YARN-292 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Devaraj K Assignee: Zhijie Shen {code:xml} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
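A hypothetical reading of the trace above, with invented names: the CONTAINER_ALLOCATED transition unconditionally takes element 0 of the allocation result, so an empty list (allocate() racing with the attempt's removal from the scheduler) throws ArrayIndexOutOfBoundsException. A defensive shape could be:

{code:java}
// Sketch only; not the actual RMAppAttemptImpl code.
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;

class AmContainerAllocationSketch {
  static Container pickAmContainer(List<Container> allocation) {
    if (allocation == null || allocation.isEmpty()) {
      // The attempt was already removed from the scheduler; let the state
      // machine retry or fail the attempt instead of killing the dispatcher.
      return null;
    }
    return allocation.get(0);  // the crashing code did this unconditionally
  }
}
{code}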
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706109#comment-13706109 ] Omkar Vinit Joshi commented on YARN-897: [~dedcode] / [~curino] Do you want to work on the patch, or can I take over? This seems like an important bug that needs to be fixed. I looked at the code, and on container completion it does not re-sort the TreeSet, which will result in unfairness. CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
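A minimal sketch of the suggested fix, with invented names: a TreeSet ordered by a mutable key (usedCapacity) silently corrupts when the key changes in place, so the completion path must remove, mutate, and re-insert, mirroring what container assignment already does:

{code:java}
// Illustrative shape only; not the actual ParentQueue code.
import java.util.Comparator;
import java.util.TreeSet;

class ParentQueueSketch {
  static class ChildQueue {
    final String name;
    float usedCapacity;
    ChildQueue(String name, float usedCapacity) {
      this.name = name;
      this.usedCapacity = usedCapacity;
    }
  }

  private final TreeSet<ChildQueue> childQueues = new TreeSet<ChildQueue>(
      new Comparator<ChildQueue>() {
        public int compare(ChildQueue a, ChildQueue b) {
          int byCapacity = Float.compare(a.usedCapacity, b.usedCapacity);
          return byCapacity != 0 ? byCapacity : a.name.compareTo(b.name);
        }
      });

  // Called on container completion, keeping the sort order consistent.
  void completedContainer(ChildQueue child, float releasedCapacity) {
    childQueues.remove(child);               // remove under the old sort key
    child.usedCapacity -= releasedCapacity;  // mutate the key
    childQueues.add(child);                  // re-insert under the new key
  }
}
{code}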
[jira] [Updated] (YARN-661) NM fails to cleanup local directories for users
[ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-661: --- Attachment: YARN-661-20130711.1.patch NM fails to cleanup local directories for users --- Key: YARN-661 URL: https://issues.apache.org/jira/browse/YARN-661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 0.23.8 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch, YARN-661-20130710.1.patch, YARN-661-20130711.1.patch YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706126#comment-13706126 ] Zhijie Shen commented on YARN-865: -- +1 for the latest patch RM webservices can't query on application Types --- Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-661) NM fails to cleanup local directories for users
[ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706137#comment-13706137 ] Hadoop QA commented on YARN-661: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591891/YARN-661-20130711.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1460//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1460//console This message is automatically generated. NM fails to cleanup local directories for users --- Key: YARN-661 URL: https://issues.apache.org/jira/browse/YARN-661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 0.23.8 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-661-20130701.patch, YARN-661-20130708.patch, YARN-661-20130710.1.patch, YARN-661-20130711.1.patch YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706152#comment-13706152 ] Carlo Curino commented on YARN-897: --- I agree this needs fixing soon. We have a first draft of the patch; we were planning to test it out carefully before posting it, but if you have cycles we can socialize it right away and work on it together. [~dedcode] please post the patch in its current state. [~ojoshi] you can check it out and we can test/verify in the meantime. CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Djellel Eddine Difallah updated YARN-897: - Attachment: YARN-897-1.patch Attached is a first patch attempt to address the bug: upon container completion, which triggers completedContainer(), remove and reinsert the queue into its parent's childQueues. This operation is done recursively, starting from the leafQueue where the container got released. Thus, by handling both cases where usedCapacity is ever changed (assignment and completion), the TreeSet remains properly sorted. CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
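The remove-and-reinsert step the patch describes can be sketched roughly as below; the names (reinsertQueue, childQueues) echo the discussion on this JIRA, but the body is illustrative rather than the actual patch. Iteration is used instead of remove() because once the sort key has changed the tree can no longer find the element by comparison, whereas the iterator removes the node it is positioned on:
{code}
import java.util.Iterator;
import java.util.TreeSet;

// Illustrative sketch of the re-sort step; not the actual YARN-897 patch.
class QueueResorter<Q> {
    private final TreeSet<Q> childQueues;

    QueueResorter(TreeSet<Q> childQueues) {
        this.childQueues = childQueues;
    }

    // Can't use childQueues.remove(queue): the TreeSet may already be out
    // of order once usedCapacity has changed, so comparison-based lookup
    // can miss the entry. Iterate to drop it by identity, then add() to
    // re-insert it under its new sort key.
    synchronized void reinsertQueue(Q queue) {
        for (Iterator<Q> it = childQueues.iterator(); it.hasNext();) {
            if (it.next() == queue) {
                it.remove(); // node-based removal works even when out of order
                break;
            }
        }
        childQueues.add(queue);
    }
}
{code}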
[jira] [Updated] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-245: --- Attachment: YARN-245-trunk-2.patch Thanks [~ojoshi] and [~vinodkv] for the review. Updated the patch. Thanks, Mayank Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch {code:xml} 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706192#comment-13706192 ] Mayank Bansal commented on YARN-299: Sure [~vinodkv]. I am reopening YARN-820 and closing this one. Thanks, Mayank Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch, YARN-299-trunk-2.patch {code:xml} 2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-12-31 10:36:27,845 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal resolved YARN-299. Resolution: Cannot Reproduce Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch, YARN-299-trunk-2.patch {code:xml} 2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED] org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-12-31 10:36:27,845 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706197#comment-13706197 ] Omkar Vinit Joshi commented on YARN-744: [~bikassaha] sounds reasonable... will take a look at it again. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744.patch Looks like the lock taken here is broken. It takes a lock on the lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get the new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
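To make the race in the description concrete, here is a hedged sketch of the broken pattern and the suggested per-attempt lock; the type parameters stand in for ApplicationAttemptId and AllocateResponse, and doAllocate is a placeholder, so this is not the actual ApplicationMasterService code:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the race described above; AttemptId/Response are stand-ins
// and doAllocate is a placeholder, not ApplicationMasterService itself.
abstract class AllocateSketch<AttemptId, Response> {
    private final ConcurrentMap<AttemptId, Response> responseMap =
        new ConcurrentHashMap<AttemptId, Response>();
    private final ConcurrentMap<AttemptId, Object> attemptLocks =
        new ConcurrentHashMap<AttemptId, Object>();

    abstract Response doAllocate(AttemptId attemptId);

    // Broken: locks the old response object, then replaces it in the map.
    // The next caller fetches and locks the NEW object, so two threads can
    // run the critical section for the same attempt concurrently.
    Response allocateBroken(AttemptId attemptId) {
        Response lastResponse = responseMap.get(attemptId);
        synchronized (lastResponse) {
            Response response = doAllocate(attemptId);
            responseMap.put(attemptId, response);
            return response;
        }
    }

    // Fix suggested in the description: serialize on a monitor that is
    // stable per attempt, here a dedicated lock object keyed by the id.
    Response allocateFixed(AttemptId attemptId) {
        Object lock = new Object();
        Object existing = attemptLocks.putIfAbsent(attemptId, lock);
        if (existing != null) {
            lock = existing;
        }
        synchronized (lock) {
            Response response = doAllocate(attemptId);
            responseMap.put(attemptId, response);
            return response;
        }
    }
}
{code}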
[jira] [Updated] (YARN-820) NodeManager has invalid state transition after error in resource localization
[ https://issues.apache.org/jira/browse/YARN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-820: --- Attachment: YARN-820-trunk-1.patch Attaching the patch. Thanks, Mayank NodeManager has invalid state transition after error in resource localization - Key: YARN-820 URL: https://issues.apache.org/jira/browse/YARN-820 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Mayank Bansal Attachments: YARN-820-trunk-1.patch, yarn-user-nodemanager-localhost.localdomain.log -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-820) NodeManager has invalid state transition after error in resource localization
[ https://issues.apache.org/jira/browse/YARN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706210#comment-13706210 ] Mayank Bansal commented on YARN-820: Hi, I am reopening this and closing YARN-299, as this problem is more about the scenario mentioned by [~ojoshi]: https://issues.apache.org/jira/browse/YARN-299?focusedCommentId=13703820page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13703820 There is one more issue: the call to toString needs to be synchronized with respect to getting the resources. Fixing that as well as part of this JIRA. Thanks, Mayank NodeManager has invalid state transition after error in resource localization - Key: YARN-820 URL: https://issues.apache.org/jira/browse/YARN-820 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Mayank Bansal Attachments: yarn-user-nodemanager-localhost.localdomain.log -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
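On the toString synchronization point, the hazard is the usual one of iterating a collection while another thread mutates it; a minimal hypothetical sketch (not the actual NodeManager localization classes):
{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the toString hazard mentioned above; not the
// actual NodeManager localization code.
class ResourceHolder {
    private final List<String> resources = new ArrayList<String>();

    synchronized void addResource(String resource) {
        resources.add(resource);
    }

    // Unsynchronized, this iteration can race with addResource() and throw
    // ConcurrentModificationException; taking the same monitor avoids it.
    @Override
    public synchronized String toString() {
        return "resources=" + resources;
    }
}
{code}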
[jira] [Commented] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706214#comment-13706214 ] Hadoop QA commented on YARN-245: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591902/YARN-245-trunk-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1461//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1461//console This message is automatically generated. Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch {code:xml} 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-820) NodeManager has invalid state transition after error in resource localization
[ https://issues.apache.org/jira/browse/YARN-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706229#comment-13706229 ] Hadoop QA commented on YARN-820: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591906/YARN-820-trunk-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1462//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1462//console This message is automatically generated. NodeManager has invalid state transition after error in resource localization - Key: YARN-820 URL: https://issues.apache.org/jira/browse/YARN-820 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Mayank Bansal Attachments: YARN-820-trunk-1.patch, yarn-user-nodemanager-localhost.localdomain.log -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-865) RM webservices can't query on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706251#comment-13706251 ] Hitesh Shah commented on YARN-865: -- [~xgong] Documentation is still not clear. How are multiple types meant to be specified? Should one use /apps?appTypes=type1&appTypes=type2 or some other format? How does the code handle it if appTypes is defined twice in the query params in the URL? javax.ws.rs.QueryParam supports a [Sorted]Set out of the box. Should we look into using that directly instead of playing around with tokenizing based on ','? RM webservices can't query on application Types --- Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
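For reference on the [Sorted]Set suggestion: JAX-RS collects every repetition of a query parameter into a collection-typed @QueryParam automatically, so /apps?appTypes=type1&appTypes=type2 would need no manual comma-splitting. A minimal sketch, with the path and names chosen for illustration rather than taken from RMWebServices:
{code}
import java.util.Set;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;

// Illustrative JAX-RS resource; not the actual RMWebServices code.
@Path("/apps")
public class AppsResource {
    @GET
    @Produces("text/plain")
    public String getApps(@QueryParam("appTypes") Set<String> appTypes) {
        // For /apps?appTypes=type1&appTypes=type2 the runtime hands us the
        // set {type1, type2}; repeated parameters need no tokenizing.
        return "appTypes=" + appTypes;
    }
}
{code}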
[jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered
[ https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706253#comment-13706253 ] Mayank Bansal commented on YARN-369: Thanks [~bikassaha] for committing this. I have updated the patch for YARN-912. Thanks, Mayank Handle ( or throw a proper error when receiving) status updates from application masters that have not registered - Key: YARN-369 URL: https://issues.apache.org/jira/browse/YARN-369 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.3-alpha, trunk-win Reporter: Hitesh Shah Assignee: Mayank Bansal Fix For: 2.1.0-beta Attachments: YARN-369.patch, YARN-369-trunk-1.patch, YARN-369-trunk-2.patch, YARN-369-trunk-3.patch, YARN-369-trunk-4.patch Currently, an allocate call from an unregistered application is allowed and the status update for it throws a statemachine error that is silently dropped. org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:680) ApplicationMasterService should likely throw an appropriate error for applications' requests that should not be handled in such cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-333) Schedulers cannot control the queue-name of an application
[ https://issues.apache.org/jira/browse/YARN-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706271#comment-13706271 ] Sandy Ryza commented on YARN-333: - Attached rebased patch. Schedulers cannot control the queue-name of an application -- Key: YARN-333 URL: https://issues.apache.org/jira/browse/YARN-333 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-333-1.patch, YARN-333-2.patch, YARN-333-3.patch, YARN-333.patch Currently, if an app is submitted without a queue, RMAppManager sets the RMApp's queue to default. A scheduler may wish to make its own decision on which queue to place an app in if none is specified. For example, when the fair scheduler user-as-default-queue config option is set to true, and an app is submitted with no queue specified, the fair scheduler should assign the app to a queue with the user's name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
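The placement rule the description asks for amounts to a small scheduler-side decision; a hedged sketch under hypothetical names (this is not the attached patch):
{code}
// Hypothetical sketch of the queue-placement decision described above;
// not the actual YARN-333 patch.
class QueuePlacement {
    private final boolean userAsDefaultQueue;

    QueuePlacement(boolean userAsDefaultQueue) {
        this.userAsDefaultQueue = userAsDefaultQueue;
    }

    // If the app names no queue, let the scheduler decide instead of
    // RMAppManager hard-coding "default".
    String assignQueue(String requestedQueue, String user) {
        if (requestedQueue == null || requestedQueue.isEmpty()) {
            return userAsDefaultQueue ? user : "default";
        }
        return requestedQueue;
    }
}
{code}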
[jira] [Updated] (YARN-333) Schedulers cannot control the queue-name of an application
[ https://issues.apache.org/jira/browse/YARN-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-333: Attachment: YARN-333-3.patch Schedulers cannot control the queue-name of an application -- Key: YARN-333 URL: https://issues.apache.org/jira/browse/YARN-333 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-333-1.patch, YARN-333-2.patch, YARN-333-3.patch, YARN-333.patch Currently, if an app is submitted without a queue, RMAppManager sets the RMApp's queue to default. A scheduler may wish to make its own decision on which queue to place an app in if none is specified. For example, when the fair scheduler user-as-default-queue config option is set to true, and an app is submitted with no queue specified, the fair scheduler should assign the app to a queue with the user's name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-912) Create exceptions package in common/api for yarn and move client facing exceptions to them
[ https://issues.apache.org/jira/browse/YARN-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706273#comment-13706273 ] Sandy Ryza commented on YARN-912: - Does it really make sense to put exceptions in their own package? Is there any precedent for this in other well-known Java libraries? It seems to me that we should just put these in the package that is likely to throw them, i.e. org.apache.hadoop.yarn.client.api. A couple of documentation nits: {code} - * requested memory/vcore is non-negative and not greater than max + * requested memory/vcore is non-negative and not greater than max throws + * exception <code>InvalidResourceRequestException</code> when there is + * invalid request {code} "throws" should be on a separate line, as @throws. {code} + /* + * This method will throw <code>InvalidResourceBlacklistRequestException</code> + * If the resource is not be able to add to black list. + */ {code} "If the resource is not be able to add to black list." should be "if the resource is not able to be added to the blacklist." Create exceptions package in common/api for yarn and move client facing exceptions to them -- Key: YARN-912 URL: https://issues.apache.org/jira/browse/YARN-912 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Mayank Bansal Attachments: YARN-912-trunk-1.patch Exceptions like InvalidResourceBlacklistRequestException, InvalidResourceRequestException, InvalidApplicationMasterRequestException etc. are currently inside ResourceManager and not visible to clients. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
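To spell out the first nit, the javadoc shape being asked for separates the prose from the @throws tag, roughly like this (hypothetical method, shown only to illustrate the placement):
{code}
/**
 * Checks that the requested memory/vcores are non-negative and not
 * greater than the configured maximums.
 *
 * @throws InvalidResourceRequestException if the request is invalid
 */
void validateResourceRequest(ResourceRequest request)
    throws InvalidResourceRequestException {
    // validation elided in this sketch
}
{code}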
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706284#comment-13706284 ] Omkar Vinit Joshi commented on YARN-897: [~dedcode] Thanks for posting the patch... looked at the code. bq. // Can't use childQueues.remove() since the TreeSet might be out of order. Any reason for this even after this patch? If we don't see any other issues, then why not just use childQueues.remove instead of iterating? * reinsertQueue could be marked synchronized? Thoughts? But yeah, even without that it is thread-safe, as we are locking at CapacityScheduler.nodeUpdate(); still, it is better to mark it. * LOG.info("Re-sorting queues since queue got completed: " + childQueue.getQueuePath() + nit: line 80 * At present we send the container-completed event to the leaf queue and then keep propagating it up to the root. Why not send the event to the root, grab the locks from root to leaf, and update it? Any thoughts? CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-916) JobContext cache files api are broken
Omkar Vinit Joshi created YARN-916: -- Summary: JobContext cache files api are broken Key: YARN-916 URL: https://issues.apache.org/jira/browse/YARN-916 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi I just checked; there are issues with the latest distributed cache API. * JobContext.getLocalCacheFiles ... is deprecated but should not have been. * JobContext.getCacheFiles is broken and returns null. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706327#comment-13706327 ] Carlo Curino commented on YARN-897: --- Omkar, thanks for the quick feedback... bq. Any reason for this even after this patch? If we don't see any other issues, then why not just use childQueues.remove instead of iterating? I initially thought the same, but I worried that, since the underlying capacity attribute has been changed, the TreeSet is already inconsistent? [~dedcode] can you check whether this is true or not? Also, can we use some careful operation ordering and get away with Omkar's suggestion? bq. reinsertQueue could be marked synchronized? Thoughts? But yeah, even without that it is thread-safe, as we are locking at CapacityScheduler.nodeUpdate(); still, it is better to mark it. We should probably follow your suggestion (especially if this method will be reused elsewhere), or at least use the lock annotations properly. (Again, this patch wasn't quite ready.) bq. nit: line 80 will do bq. At present we send the container-completed event to the leaf queue and then keep propagating it up to the root. Why not send the event to the root, grab the locks from root to leaf, and update it? Any thoughts? Lock ordering is somewhat delicate (and I worry not very consistent). In general, the idea to lock bottom-up should allow for part of the operations (updating of two leaf queues) to be concurrent until the recursions meet at some common ancestor, at which point we serialize. However, at least for some of the operations this is inside a global scheduler lock, so we lose that benefit in the first place. It might be interesting to review the locks carefully and see whether we can rationalize them further. Although this is delicate, and unless we are lock-bound on the scheduler, in practice it would not buy us much. We didn't have time to test this through to a level where I would be confident PAing this. Omkar, do you have any cycles to test this? [~acmurthy], [~tgraves] do you guys have a moment to review this? BTW we are working on a discrete event simulator, which should allow us to lock-step/debug the entire RM codebase... that would make for easy testing of some of this stuff (more as soon as we get it ready to show around). CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging
[ https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706339#comment-13706339 ] Alejandro Abdelnur commented on YARN-366: - [~vinodkv], you have been following this one; anything else you think should be addressed before committing? I'd like to get this into 2.1-beta if possible. Add a tracing async dispatcher to simplify debugging Key: YARN-366 URL: https://issues.apache.org/jira/browse/YARN-366 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.0.2-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, YARN-366-4.patch, YARN-366-5.patch, YARN-366-6.patch, YARN-366-7.patch, YARN-366.patch Exceptions thrown in YARN/MR code with asynchronous event handling do not contain informative stack traces, as all handle() methods sit directly under the dispatcher thread's loop. This makes errors very difficult to debug for those who are not intimately familiar with the code, as it is difficult to see which chain of events caused a particular outcome. I propose adding an AsyncDispatcher that instruments events with tracing information. Whenever an event is dispatched during the handling of another event, the dispatcher would annotate that event with a pointer to its parent. When the dispatcher catches an exception, it could reconstruct a stack trace of the chain of events that led to it, and be able to log something informative. This would be an experimental feature, off by default, unless extensive testing showed that it did not have a significant performance impact. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
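For readers joining late, the feature under review can be pictured as events carrying a pointer to the event being handled when they were created; an exception handler then walks the chain. A rough sketch with illustrative names, not the actual YARN-366 patch:
{code}
// Illustrative sketch of the tracing idea; not the actual YARN-366 patch.
class TracedEvent {
    final String name;
    final TracedEvent parent; // event being handled when this one was created

    TracedEvent(String name, TracedEvent parent) {
        this.name = name;
        this.parent = parent;
    }

    // Reconstructs the chain of events that led here, so an exception in a
    // handler can be logged with its causal history.
    String trace() {
        StringBuilder sb = new StringBuilder(name);
        for (TracedEvent e = parent; e != null; e = e.parent) {
            sb.append(" <- ").append(e.name);
        }
        return sb.toString();
    }
}
{code}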
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706348#comment-13706348 ] Djellel Eddine Difallah commented on YARN-897: -- Omkar, thanks for the feedback. {quote}Any reason for this even after this patch? If we don't see any other issues, then why not just use childQueues.remove instead of iterating?{quote} The tree is already out of order because of the new usedCapacity, so the remove() won't work. We have to iterate and then add() to fix the order. {quote}reinsertQueue could be marked synchronized? Thoughts? But yeah, even without that it is thread-safe, as we are locking at CapacityScheduler.nodeUpdate(); still, it is better to mark it.{quote} OK, sounds reasonable to put a synchronized there. {quote}LOG.info("Re-sorting queues since queue got completed: " + childQueue.getQueuePath() + nit: line 80{quote} Sure. {quote}At present we send the container-completed event to the leaf queue and then keep propagating it up to the root. Why not send the event to the root, grab the locks from root to leaf, and update it? Any thoughts?{quote} Because the released container is linked to a leaf queue, and we have to walk bottom-up to figure out to which parent to propagate. The assignment phase, however, works the way you described. CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-521) Augment AM - RM client module to be able to request containers only at specific locations
[ https://issues.apache.org/jira/browse/YARN-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706382#comment-13706382 ] Bikas Saha commented on YARN-521: - I have been extremely caught up today. Will try to get to this later tonight or tomorrow. Augment AM - RM client module to be able to request containers only at specific locations - Key: YARN-521 URL: https://issues.apache.org/jira/browse/YARN-521 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-521-1.patch, YARN-521-2.patch, YARN-521-2.patch, YARN-521-3.patch, YARN-521.patch When YARN-392 and YARN-398 are completed, it would be good for AMRMClient to offer an easy way to access their functionality -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Djellel Eddine Difallah updated YARN-897: - Attachment: YARN-897-2.patch Patch reflecting Omkar's comments: 1) add synchronized to reinsertQueue; 2) reduce line length. CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706432#comment-13706432 ] Zhijie Shen commented on YARN-292: -- {code} // Acquire the AM container from the scheduler. Allocation amContainerAllocation = appAttempt.scheduler.allocate( appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST, EMPTY_CONTAINER_RELEASE_LIST, null, null); {code} The above code will eventually pull the newly allocated containers from newlyAllocatedContainers. Logically, AMContainerAllocatedTransition happens after RMAppAttempt receives CONTAINER_ALLOCATED. CONTAINER_ALLOCATED is sent during ContainerStartedTransition, when RMContainer is moving from NEW to ALLOCATED. Therefore, pulling newlyAllocatedContainers happens when RMContainer is at ALLOCATED. In contrast, RMContainer is added to newlyAllocatedContainers when it is still at NEW. In conclusion, one container in the allocation is expected in AMContainerAllocatedTransition. As hinted by [~nemon], the problem may happen at {code} FiCaSchedulerApp application = getApplication(applicationAttemptId); if (application == null) { LOG.error("Calling allocate on removed " + "or non existant application " + applicationAttemptId); return EMPTY_ALLOCATION; } {code} EMPTY_ALLOCATION has 0 containers. Another observation is that there seems to be inconsistent synchronization on accessing the application map. I just became aware that [~djp] has started working on this problem. Please feel free to take it over. Thanks! ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt Key: YARN-292 URL: https://issues.apache.org/jira/browse/YARN-292 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Devaraj K Assignee: Zhijie Shen {code:xml} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
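Per the analysis above, the crash reduces to indexing an empty container list when the scheduler returns EMPTY_ALLOCATION; a defensive sketch of the guard (illustrative, not necessarily the eventual fix):
{code}
// Illustrative guard for the failure analyzed above; not necessarily the
// eventual YARN-292 fix. When the scheduler has already removed the
// attempt, allocate() returns EMPTY_ALLOCATION with an empty container
// list, and an unconditional get(0) throws ArrayIndexOutOfBoundsException.
List<Container> containers = amContainerAllocation.getContainers();
if (containers.isEmpty()) {
    LOG.warn("No AM container allocated for " + applicationAttemptId);
    return; // bail out instead of assuming the AM container is present
}
Container amContainer = containers.get(0);
{code}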
[jira] [Commented] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706488#comment-13706488 ] Omkar Vinit Joshi commented on YARN-897: [~dedcode] please do keep older patches... they help reviewing, since we sometimes diff against older patches and verify older comments... Thanks CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-2.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-744: --- Attachment: YARN-744-20130711.1.patch Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch Looks like the lock taken here is broken. It takes a lock on the lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get the new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Djellel Eddine Difallah updated YARN-897: - Attachment: YARN-897-1.patch CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures that the queue with the least UsedCapacity receives resources next. On containerAssignment we correctly update the order, but we fail to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-592) Container logs lost for the application when NM gets restarted
[ https://issues.apache.org/jira/browse/YARN-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706495#comment-13706495 ] Omkar Vinit Joshi commented on YARN-592: Just to be sure (I might be wrong), I am a bit skeptical about the .tmp file... are you sure it contains all the logs? My understanding is that it was still in progress and didn't finish with all of them. However, even for completed logs... it will enqueue them into the deletion service for future deletion, which may or may not happen even on graceful shutdown, as we kill the NM after some time... right? Thoughts? bq. This patch is trying to upload logs for the applications which run before and after NM restart. If the application gets completed after NM crash and before starting NM, atleast logs for the containers ran on that node can get from NM local logs dirs. This seems problematic. The time difference between the AM finishing and the NM starting can be as low as a second... or as high as hours... We need a definite policy for handling logs, because if we don't handle this, logs will be lying on the NM waiting for an already-finished app to finish... right? Thoughts? Container logs lost for the application when NM gets restarted -- Key: YARN-592 URL: https://issues.apache.org/jira/browse/YARN-592 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.3-alpha Reporter: Devaraj K Assignee: Devaraj K Priority: Critical Attachments: YARN-592.patch While running a big job, if the NM goes down for some reason and comes back, it will do the log aggregation for the newly launched containers and delete all the containers for the application. In this case we don't get the container logs, from HDFS or local disk, for the containers which were launched before the restart and completed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi resolved YARN-541. Resolution: Invalid getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706506#comment-13706506 ] Omkar Vinit Joshi commented on YARN-541: I am closing this as invalid... please reopen if you still see the issue... getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reopened YARN-541: -- [~ojoshi] [~write2kishore] I think [~bikassaha] discovered a race condition in the AMRMClient that may be causing this. getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
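For context, the reporter's workaround amounts to accumulating containers across allocate() calls rather than assuming one response carries all of them. A hedged sketch, written against the AMRMClient-style API (allocate(float) returning AllocateResponse); this is illustrative, not code from the JIRA:
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class ContainerAccumulator {
  /** Poll allocate() until 'numRequested' containers have been collected. */
  public static List<Container> acquire(AMRMClient<?> client, int numRequested)
      throws Exception {
    List<Container> acquired = new ArrayList<Container>();
    while (acquired.size() < numRequested) {
      // Containers for one request may arrive spread over several responses.
      AllocateResponse response = client.allocate(0.0f);
      acquired.addAll(response.getAllocatedContainers());
      Thread.sleep(100); // simple heartbeat pacing; tune as needed
    }
    return acquired;
  }
}
{code}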
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706553#comment-13706553 ] Vinod Kumar Vavilapalli commented on YARN-321: -- Fundamentally, this JIRA is to track the management of data related to finished applications via a new server called ApplicationHistoryService (AHS). Some important design points: h4. Basics - ResourceManager will write per-application data to a (hopefully very) thin {{HistoryStorage}} layer. - ResourceManager will push the data to HistoryStorage after an application finishes, in a separate thread. - HistoryStorage is different from the current RMStateStore, so unlike JobHistory, HistoryStorage isn't used for state-tracking or as a transaction log. ResourceManager will try to publish information about completed apps in a best-effort manner, but there will be edge cases during RM restart where we may not flush some data. Making it consistent and complete across an RM restart will be a future step. - HistoryStorage will have APIs to publish app-info, retrieve app-info and list apps, and can have various implementations -- A file-based implementation where the RM writes per-app files to DFS; HistoryStorage will take care of file management like we do today in the JobHistoryServer (JHS) and serve users by reading the data in the files -- A shared-bus implementation where the RM writes directly to AHS and AHS persists the data in storage that it controls - files/DB etc. - To start with, we will have an implementation with a per-app HDFS file. h4. Miscellaneous - *Running as a service*: By default, ApplicationHistoryService will be embedded inside ResourceManager but will be independent enough to run as a separate service for scaling purposes. - *User interfaces*: Command-line clients and/or web clients will have RPC, web and REST interfaces to interact with ApplicationHistoryService to get info about finished applications. Fundamentally, we'll have two types of interfaces -- Per-app info -- List of all apps -- Querying the list of apps based on user-name, queue-name etc. To start with, we will imitate what JHS does: throw up the list of all apps and do the filtering client side. But we need a better server-side solution. - *Aggregated logs*: Logs will be served, and potentially managed (expiry etc.), by ApplicationHistoryService via an abstract LogService component. - *Retention*: ApplicationHistoryService will have components to take care of retention - expiring very old apps. - *Security*: ApplicationHistoryService will have security from the start and will use tokens similar to JHS. h4. Out of scope - Hosting/serving per-framework data is out of scope for this JIRA. It is related to ApplicationHistoryService, but I am keeping the focus on generic data for now on this JIRA and will file a separate ticket for ApplicationHistoryService or a related service to work with per-framework or app data. I see a transition phase where we would continue to run AHS and JHS at the same time till the other JIRA is resolved. - *Long running services*: We won't have any special support for long-running services yet. We should track this with other long-running services' support. Feedback appreciated. I am kickstarting this right now and am creating a branch for faster progress. 
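A minimal sketch of what the thin HistoryStorage layer above could look like, assuming a hypothetical ApplicationHistoryData record type; it illustrates the three APIs mentioned (publish, retrieve, list), not the actual interface from the YARN-321 branch:
{code:java}
import java.io.IOException;
import java.util.List;

/** Hypothetical per-app record; the fields are illustrative. */
class ApplicationHistoryData {
  String applicationId;
  String user;
  String queue;
  long finishTime;
  String getUser()  { return user; }
  String getQueue() { return queue; }
}

/** Sketch of the thin storage layer: publish, retrieve, list. */
interface HistoryStorage {
  /** Called by the RM, on a separate thread, after an application finishes. */
  void publishApplication(ApplicationHistoryData app) throws IOException;

  /** Per-app retrieval backing the CLI/web/REST "app info" calls. */
  ApplicationHistoryData getApplication(String applicationId) throws IOException;

  /** List all apps; filtering by user/queue starts out client side. */
  List<ApplicationHistoryData> listApplications() throws IOException;
}
{code}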
Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706554#comment-13706554 ] Hadoop QA commented on YARN-744: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591936/YARN-744-20130711.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1465//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1465//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1465//console This message is automatically generated. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
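The locking bug in the YARN-744 description is easy to illustrate: locking on the mutable response *value* and then replacing that value in the map lets a second thread lock the freshly inserted object and enter the critical section concurrently. A hedged sketch of the fix shape suggested above, locking on a stable per-attempt monitor (the lock-map approach here is an illustrative choice, not the committed patch):
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PerAttemptLocking<K, V> {
  private final ConcurrentMap<K, V> responseMap = new ConcurrentHashMap<K, V>();
  private final ConcurrentMap<K, Object> locks = new ConcurrentHashMap<K, Object>();

  public V replaceResponse(K appAttemptId, V newResponse) {
    locks.putIfAbsent(appAttemptId, new Object());
    synchronized (locks.get(appAttemptId)) { // stable monitor per attempt
      // The broken variant synchronizes on responseMap.get(appAttemptId) and
      // then put()s a new object -- the next caller locks a different monitor
      // and both threads end up inside the critical section at once.
      return responseMap.put(appAttemptId, newResponse);
    }
  }
}
{code}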
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706561#comment-13706561 ] Omkar Vinit Joshi commented on YARN-701: I have checked the patch; some comments: * Earlier it was possible, even in a secured environment, to use the AMRMToken for appAttemptId1 and request containers for appAttemptId2. This is now fixed in the authorize call for both cases. * The patch works in secured and unsecured environments. * It makes sense to remove appAttemptId from the request.. thoughts?? Backward compatibility? * However, there is a problem if we restart the node manager on which the AM was running during the application run. Attaching logs. ApplicationTokens should be used irrespective of kerberos - Key: YARN-701 URL: https://issues.apache.org/jira/browse/YARN-701 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, YARN-701-20130710.txt, yarn-ojoshi-resourcemanager-HW10351.local.log - Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
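The first review point corresponds to a simple attempt-id check in the authorize path: an AM holding the token for attempt 1 must not be able to request containers on behalf of attempt 2, in secure and non-secure mode alike. A hedged sketch with invented names (the real patch may structure this differently):
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class AllocateAuthorizer {
  /** Reject an allocate call whose AMRMToken belongs to a different attempt. */
  public static void authorize(ApplicationAttemptId tokenAttemptId,
      ApplicationAttemptId requestAttemptId) throws YarnException {
    if (!tokenAttemptId.equals(requestAttemptId)) {
      // Same check on both the secure and the non-secure code path.
      throw new YarnException("Unauthorized: token for " + tokenAttemptId
          + " used to allocate for " + requestAttemptId);
    }
  }
}
{code}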
[jira] [Updated] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-701: --- Attachment: yarn-ojoshi-resourcemanager-HW10351.local.log ApplicationTokens should be used irrespective of kerberos - Key: YARN-701 URL: https://issues.apache.org/jira/browse/YARN-701 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, YARN-701-20130710.txt, yarn-ojoshi-resourcemanager-HW10351.local.log - Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706562#comment-13706562 ] Hadoop QA commented on YARN-701: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12591950/yarn-ojoshi-resourcemanager-HW10351.local.log against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1466//console This message is automatically generated. ApplicationTokens should be used irrespective of kerberos - Key: YARN-701 URL: https://issues.apache.org/jira/browse/YARN-701 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, YARN-701-20130710.txt, yarn-ojoshi-resourcemanager-HW10351.local.log - Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-292) ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt
[ https://issues.apache.org/jira/browse/YARN-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706579#comment-13706579 ] Junping Du commented on YARN-292: - Hi [~zjshen], I think your work above reveals the root cause of this bug. So please feel free to go ahead and fix it. I will also help to review it. Thx! ResourceManager throws ArrayIndexOutOfBoundsException while handling CONTAINER_ALLOCATED for application attempt Key: YARN-292 URL: https://issues.apache.org/jira/browse/YARN-292 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Devaraj K Assignee: Zhijie Shen {code} 2012-12-26 08:41:15,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler: Calling allocate on removed or non existant application appattempt_1356385141279_49525_01 2012-12-26 08:41:15,031 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type CONTAINER_ALLOCATED for applicationAttempt application_1356385141279_49525 java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.Arrays$ArrayList.get(Arrays.java:3381) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:655) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMContainerAllocatedTransition.transition(RMAppAttemptImpl.java:644) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
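The trace fails at Arrays$ArrayList.get(0): the CONTAINER_ALLOCATED transition assumes the scheduler handed back at least one container, even after the scheduler logged that the attempt was already removed. A hedged sketch of the defensive shape such a fix can take (illustrative names, not the committed patch):
{code:java}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;

public class AmContainerGuard {
  /** Return the AM container, or null when the allocation came back empty. */
  public static Container amContainerOrNull(List<Container> allocated) {
    if (allocated == null || allocated.isEmpty()) {
      // The scheduler had already removed the attempt (see the "Calling
      // allocate on removed or non existant application" error above), so
      // index 0 does not exist; callers must handle this instead of crashing.
      return null;
    }
    return allocated.get(0);
  }
}
{code}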
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706589#comment-13706589 ] Krishna Kishore Bonagiri commented on YARN-541: --- I shall try to get you the logs you need today or as soon as possible, and reopen it. getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706601#comment-13706601 ] Hitesh Shah commented on YARN-541: -- [~write2kishore] if you plan to re-run this to get new logs, could you please run the RM and NM with DEBUG log level. Thanks. getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706602#comment-13706602 ] Hitesh Shah commented on YARN-541: -- Likewise have the AM also run with the debug log level if possible. getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-541: Assignee: Omkar Vinit Joshi (was: Vinod Kumar Vavilapalli) getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-541: Assignee: Vinod Kumar Vavilapalli (was: Omkar Vinit Joshi) getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Vinod Kumar Vavilapalli Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706645#comment-13706645 ] Hitesh Shah commented on YARN-321: -- {quote} To start with, we will have an implementation with a per-app HDFS file. {quote} [~vinodkv] Based on the above, it seems like this will only address analysing one job at a time. With a per-app file, it will be non-trivial to search for applications that match certain criteria: all jobs that ran on a certain day? All jobs of a certain type? All jobs that took longer than 10 mins to run? All jobs that use over 100 containers? Sure, a directory hierarchy based on dates may solve the very basic use-cases, but it looks like anyone needing to do slightly more complex analysis on cluster utilization will need to build an indexing layer on top of the file-based store? Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706646#comment-13706646 ] Krishna Kishore Bonagiri commented on YARN-541: --- Hitesh, How can I do that? getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-541) getAllocatedContainers() is not returning all the allocated containers
[ https://issues.apache.org/jira/browse/YARN-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706650#comment-13706650 ] Hitesh Shah commented on YARN-541: -- export HADOOP_ROOT_LOGGER=DEBUG,RFA export YARN_ROOT_LOGGER=DEBUG,RFA when starting the RM and NM. For the DSShell, you can use --log_properties and pass in a log4j.properties which has a hardcoded DEBUG level for the root logger. However, based on what I can see, the DS Shell AM at DEBUG level may not be necessary. getAllocatedContainers() is not returning all the allocated containers -- Key: YARN-541 URL: https://issues.apache.org/jira/browse/YARN-541 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.3-alpha Environment: Redhat Linux 64-bit Reporter: Krishna Kishore Bonagiri Assignee: Omkar Vinit Joshi Attachments: AppMaster.stdout, yarn-dsadm-nodemanager-isredeng.out, yarn-dsadm-resourcemanager-isredeng.out I am running an application that was written and working well with the hadoop-2.0.0-alpha but when I am running the same against 2.0.3-alpha, the getAllocatedContainers() method called on AMResponse is not returning all the containers allocated sometimes. For example, I request for 10 containers and this method gives me only 9 containers sometimes, and when I looked at the log of Resource Manager, the 10th container is also allocated. It happens only sometimes randomly and works fine all other times. If I send one more request for the remaining container to RM after it failed to give them the first time(and before releasing already acquired ones), it could allocate that container. I am running only one application at a time, but 1000s of them one after another. My main worry is, even though the RM's log is saying that all 10 requested containers are allocated, the getAllocatedContainers() method is not returning me all of them, it returned only 9 surprisingly. I never saw this kind of issue in the previous version, i.e. hadoop-2.0.0-alpha. Thanks, Kishore -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
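Spelled out, the setup Hitesh describes might look like the following. Only the two exports and the --log_properties flag come from the comment above; the log4j.properties contents are an assumption (a minimal root-logger override in the usual container-log style):
{code}
# Shell: start the RM and NM daemons with DEBUG root loggers
export HADOOP_ROOT_LOGGER=DEBUG,RFA
export YARN_ROOT_LOGGER=DEBUG,RFA

# log4j.properties passed to the DS shell via --log_properties
# (a minimal example; adjust the appender to taste)
log4j.rootLogger=DEBUG,CLA
log4j.appender.CLA=org.apache.hadoop.yarn.ContainerLogAppender
log4j.appender.CLA.containerLogDir=${yarn.app.container.log.dir}
log4j.appender.CLA.layout=org.apache.log4j.PatternLayout
log4j.appender.CLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
{code}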
[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell
[ https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706653#comment-13706653 ] Abhishek Kapoor commented on YARN-816: -- Preemption is one of the cases where a container can be killed while the application is still running. We can take inspiration from CPU scheduling algorithms in operating systems. Also, if the application is preempted, we can provide a way to let the app know that it is going to get preempted, and during recovery we can make the app aware that it was preempted. Probably an event fired to the app letting it know what is going to happen (preempt) and what has happened (preempted). Sorry if it sounds confusing; I am open for discussion Implement AM recovery for distributed shell --- Key: YARN-816 URL: https://issues.apache.org/jira/browse/YARN-816 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Simple recovery to just continue from where it left off is a good start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706654#comment-13706654 ] Vinod Kumar Vavilapalli commented on YARN-321: -- Like I mentioned: bq. Querying list of apps based on user-name, queue-name etc. To start with, we will imitate what JHS does, throw up list of all apps and do the filtering client side. But we need a better server side solution. So for both the CLI and web UI, we will start with a client side basic filtering, perhaps coupled with paging on the results. More advanced analytics needs a more robust server side solution. I can already imagine file-based indices, but a more query friendly storage will be needed - a table view via HCat/HBase over HDFS will be a good start. Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
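A hedged sketch of the client-side filtering starting point Vinod describes, reusing the hypothetical ApplicationHistoryData record from the earlier sketch: the server returns everything and the predicates run in the CLI/web client.
{code:java}
import java.util.ArrayList;
import java.util.List;

public class ClientSideFilter {
  /** Filter the full app list locally; null user/queue means "no filter". */
  public static List<ApplicationHistoryData> byUserAndQueue(
      List<ApplicationHistoryData> allApps, String user, String queue) {
    List<ApplicationHistoryData> out = new ArrayList<ApplicationHistoryData>();
    for (ApplicationHistoryData app : allApps) {
      if ((user == null || user.equals(app.getUser()))
          && (queue == null || queue.equals(app.getQueue()))) {
        out.add(app);
      }
    }
    return out;
  }
}
{code}
A server-side query API (or the HCat/HBase-backed table view mentioned above) would replace this once lists get large enough that shipping everything to the client stops scaling.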
[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell
[ https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706662#comment-13706662 ] Vinod Kumar Vavilapalli commented on YARN-816: -- I originally filed this to make the DistributedShell AM recover when the node running the AM crashes. There are two things it can do - Just restart everything from scratch - Or remember how many nodes are already taken care of and only run the remaining. - While we do this, we should generally try to design libraries that help other framework writers implement state recovery on AM crash, or at least create some conventions. Implement AM recovery for distributed shell --- Key: YARN-816 URL: https://issues.apache.org/jira/browse/YARN-816 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Simple recovery to just continue from where it left off is a good start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
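A hedged sketch of the second option (remember progress, run only the remaining work): the DS AM could checkpoint its completed-container count to an HDFS file and read it back on restart. The path, format and class names are invented for illustration, not from any patch.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DsAmCheckpoint {
  private final FileSystem fs;
  private final Path checkpoint; // e.g. <app staging dir>/progress (assumed)

  public DsAmCheckpoint(FileSystem fs, Path checkpoint) {
    this.fs = fs;
    this.checkpoint = checkpoint;
  }

  /** Overwrite the checkpoint with the number of containers finished so far. */
  public void save(int completed) throws IOException {
    FSDataOutputStream out = fs.create(checkpoint, true);
    try { out.writeInt(completed); } finally { out.close(); }
  }

  /** 0 on a fresh start; the last saved count on AM recovery. */
  public int load() throws IOException {
    if (!fs.exists(checkpoint)) return 0;
    FSDataInputStream in = fs.open(checkpoint);
    try { return in.readInt(); } finally { in.close(); }
  }
}
{code}
A reusable library along these lines is what the last point above is asking for: a convention for where the checkpoint lives and when it is written, so every framework does not reinvent it.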
[jira] [Commented] (YARN-816) Implement AM recovery for distributed shell
[ https://issues.apache.org/jira/browse/YARN-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706667#comment-13706667 ] Abhishek Kapoor commented on YARN-816: -- Couldn't agree more, [~vinodkv]. We can have the state of the AM communicated to the RM. When the AM boots up, the RM should communicate that state back to the AM: for example, whether it is a fresh start or a recovery, and if it is a recovery, the state of the nodes the app was running on. The above use case might require a communication protocol change between the AM and the RM. Implement AM recovery for distributed shell --- Key: YARN-816 URL: https://issues.apache.org/jira/browse/YARN-816 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Simple recovery to just continue from where it left off is a good start. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira