[jira] [Created] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2013-10-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1372:


 Summary: Ensure all completed containers are reported to the AMs 
across RM restart
 Key: YARN-1372
 URL: https://issues.apache.org/jira/browse/YARN-1372
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


Currently the NM informs the RM about completed containers and then removes 
those containers from its RM notification list. The RM passes that completed 
container information on to the AM, which pulls the data. If the RM dies before 
the AM pulls this data, the AM may not be able to get this information again. 
To fix this, the NM should maintain a separate list of completed container 
notifications that have been sent to the RM. After the AM has pulled the 
completed containers from the RM, the RM informs the NM and the NM removes 
those containers from the new list. Upon re-registering with the RM (after RM 
restart), the NM should send the entire list of completed containers to the RM, 
along with any other containers that completed while the RM was down. This 
ensures that the RM can inform the AMs about all completed containers. Some 
container completions may be reported more than once, since the AM may have 
pulled the container but the RM may have died before notifying the NM about 
the pull.
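
For illustration, a minimal sketch of the NM-side bookkeeping described above; 
the class and method names here are hypothetical, not the actual NodeManager 
code:

{code}
// Illustrative sketch only: names are hypothetical, not the real NM classes.
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CompletedContainerTracker {
  // Completed containers reported to the RM but not yet acknowledged as
  // pulled by the AM (the "separate list" described above).
  private final Set<String> pendingAck = new LinkedHashSet<String>();

  // Called when the NM reports a completed container to the RM.
  synchronized void reportedToRM(String containerId) {
    pendingAck.add(containerId);
  }

  // Called when the RM tells the NM that the AM has pulled these completions.
  synchronized void ackedByRM(Collection<String> pulledContainerIds) {
    pendingAck.removeAll(pulledContainerIds);
  }

  // On re-register after an RM restart, resend everything still unacknowledged;
  // duplicates are possible and must be tolerated by the RM and the AM.
  synchronized List<String> containersToResendOnReregister() {
    return new ArrayList<String>(pendingAck);
  }
}
{code}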



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps

2013-10-30 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1373:


 Summary: Transition RMApp and RMAppAttempt state to RUNNING after 
restart for recovered running apps
 Key: YARN-1373
 URL: https://issues.apache.org/jira/browse/YARN-1373
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha


Currently the RM moves recovered app attempts to a terminal recovered state 
and starts a new attempt. Instead, it will have to transition the last attempt 
to a running state such that it can proceed as normal once the running attempt 
has resynced with the ApplicationMasterService (YARN-1365 and YARN-1366). If 
the RM had started the application container before dying then the AM would be 
up and trying to contact the RM. The RM may also have died before launching the 
container. For this case, the RM should wait for the AM liveliness period, 
issue a kill for the stored master container, transition this attempt to some 
RECOVER_ERROR state, and proceed to start a new attempt.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808815#comment-13808815
 ] 

Bikas Saha commented on YARN-556:
-

Added some coarse-grained tasks based on the attached proposal. More tasks may 
be added as the details get dissected.

> RM Restart phase 2 - Work preserving restart
> 
>
> Key: YARN-556
> URL: https://issues.apache.org/jira/browse/YARN-556
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: Work Preserving RM Restart.pdf
>
>
> YARN-128 covered storing the state needed for the RM to recover critical 
> information. This umbrella jira will track changes needed to recover the 
> running state of the cluster so that work can be preserved across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808965#comment-13808965
 ] 

Hudson commented on YARN-1068:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
YARN-1068. Add admin support for HA operations (Karthik Kambatla via bikas) 
(bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536888)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/authorize/RMPolicyProvider.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java


> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Fix For: 2.3.0
>
> Attachments: yarn-1068-10.patch, yarn-1068-11.patch, 
> yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-14.patch, 
> yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, 
> yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, 
> yarn-1068-9.patch, YARN-1068.Karthik.patch, yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808962#comment-13808962
 ] 

Hudson commented on YARN-1228:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> Clean up Fair Scheduler configuration loading
> -
>
> Key: YARN-1228
> URL: https://issues.apache.org/jira/browse/YARN-1228
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.1.1-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.2.0
>
> Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch
>
>
> Currently the Fair Scheduler is configured in two ways
> * An allocations file that has a different format than the standard Hadoop 
> configuration file, which makes it easier to specify hierarchical objects 
> like queues and their properties. 
> * With properties like yarn.scheduler.fair.max.assign that are specified in 
> the standard Hadoop configuration format.
> The standard and default way of configuring it is to use fair-scheduler.xml 
> as the allocations file and to put the yarn.scheduler properties in 
> yarn-site.xml.
> It is also possible to specify a different file as the allocations file, and 
> to place the yarn.scheduler properties in fair-scheduler.xml, which will be 
> interpreted as in the standard Hadoop configuration format.  This flexibility 
> is both confusing and unnecessary.
> Additionally, the allocation file is loaded as fair-scheduler.xml from the 
> classpath if it is not specified, but is loaded as a File if it is.  This 
> causes two problems
> 1. We see different behavior when not setting the 
> yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, 
> which is its default.
> 2. Classloaders may choose to cache resources, which can break the reload 
> logic when yarn.scheduler.fair.allocation.file is not specified.
> We should never allow the yarn.scheduler properties to go into 
> fair-scheduler.xml.  And we should always load the allocations file as a 
> file, not as a resource on the classpath.  To preserve existing behavior and 
> allow loading files from the classpath, we can look for files on the 
> classpath, but strip off their scheme and interpret them as Files.
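
A rough sketch of the loading rule proposed above (illustrative only, not the 
actual FairScheduler code): prefer an explicitly configured path, otherwise 
locate fair-scheduler.xml on the classpath and strip its scheme so it is still 
read as a plain File, avoiding classloader resource caching.

{code}
// Sketch only: resolve the allocation file as a File in both cases.
import java.io.File;
import java.net.URL;

class AllocationFileResolver {
  static File resolve(String configuredPath) {
    if (configuredPath != null && !configuredPath.isEmpty()) {
      return new File(configuredPath);
    }
    URL url = Thread.currentThread().getContextClassLoader()
        .getResource("fair-scheduler.xml");
    if (url == null || !"file".equals(url.getProtocol())) {
      return null; // nothing usable found
    }
    // Strip the "file:" scheme and treat the rest as an ordinary file path.
    return new File(url.getPath());
  }
}
{code}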



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808961#comment-13808961
 ] 

Hudson commented on YARN-1306:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #378 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/378/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> Clean up hadoop-sls sample-conf according to YARN-1228
> --
>
> Key: YARN-1306
> URL: https://issues.apache.org/jira/browse/YARN-1306
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Fix For: 2.3.0
>
> Attachments: YARN-1306.patch
>
>
> Move fair scheduler allocations configuration to fair-scheduler.xml, and move 
> all scheduler settings to yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808998#comment-13808998
 ] 

Hudson commented on YARN-1068:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
YARN-1068. Add admin support for HA operations (Karthik Kambatla via bikas) 
(bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536888)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/authorize/RMPolicyProvider.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java


> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Fix For: 2.3.0
>
> Attachments: yarn-1068-10.patch, yarn-1068-11.patch, 
> yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-14.patch, 
> yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, 
> yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, 
> yarn-1068-9.patch, YARN-1068.Karthik.patch, yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808994#comment-13808994
 ] 

Hudson commented on YARN-1306:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> Clean up hadoop-sls sample-conf according to YARN-1228
> --
>
> Key: YARN-1306
> URL: https://issues.apache.org/jira/browse/YARN-1306
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Fix For: 2.3.0
>
> Attachments: YARN-1306.patch
>
>
> Move fair scheduler allocations configuration to fair-scheduler.xml, and move 
> all scheduler settings to yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808995#comment-13808995
 ] 

Hudson commented on YARN-1228:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1568 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1568/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> Clean up Fair Scheduler configuration loading
> -
>
> Key: YARN-1228
> URL: https://issues.apache.org/jira/browse/YARN-1228
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.1.1-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.2.0
>
> Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch
>
>
> Currently the Fair Scheduler is configured in two ways
> * An allocations file that has a different format than the standard Hadoop 
> configuration file, which makes it easier to specify hierarchical objects 
> like queues and their properties. 
> * With properties like yarn.scheduler.fair.max.assign that are specified in 
> the standard Hadoop configuration format.
> The standard and default way of configuring it is to use fair-scheduler.xml 
> as the allocations file and to put the yarn.scheduler properties in 
> yarn-site.xml.
> It is also possible to specify a different file as the allocations file, and 
> to place the yarn.scheduler properties in fair-scheduler.xml, which will be 
> interpreted as in the standard Hadoop configuration format.  This flexibility 
> is both confusing and unnecessary.
> Additionally, the allocation file is loaded as fair-scheduler.xml from the 
> classpath if it is not specified, but is loaded as a File if it is.  This 
> causes two problems
> 1. We see different behavior when not setting the 
> yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, 
> which is its default.
> 2. Classloaders may choose to cache resources, which can break the reload 
> logic when yarn.scheduler.fair.allocation.file is not specified.
> We should never allow the yarn.scheduler properties to go into 
> fair-scheduler.xml.  And we should always load the allocations file as a 
> file, not as a resource on the classpath.  To preserve existing behavior and 
> allow loading files from the classpath, we can look for files on the 
> classpath, but strip off their scheme and interpret them as Files.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Devaraj K (JIRA)
Devaraj K created YARN-1374:
---

 Summary: Resource Manager fails to start due to 
ConcurrentModificationException
 Key: YARN-1374
 URL: https://issues.apache.org/jira/browse/YARN-1374
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K
Priority: Blocker


Resource Manager is failing to start with the below 
ConcurrentModificationException.

{code:xml}
2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing 
hosts (include/exclude) list
2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service 
ResourceManager failed in state INITED; cause: 
java.util.ConcurrentModificationException
java.util.ConcurrentModificationException
at 
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at 
java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,378 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
Transitioning to standby
2013-10-30 20:22:42,378 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned 
to standby
2013-10-30 20:22:42,378 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
ResourceManager
java.util.ConcurrentModificationException
at 
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at 
java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
2013-10-30 20:22:42,379 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
/
{code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809024#comment-13809024
 ] 

Devaraj K commented on YARN-1374:
-

This occurs when the scheduler monitor is enabled via the 
'yarn.resourcemanager.scheduler.monitor.enable' configuration.

The SchedulingMonitor service is added to the RM's services while the RM 
services are being initialized, which causes the ConcurrentModificationException. 
The SchedulingMonitor service needs to be added to RMActiveServices instead of 
to the RM service to avoid this.
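
A hedged sketch of that idea against the CompositeService API; the factory 
method below is hypothetical and the actual ResourceManager code may differ. 
The point is to register the monitor as a child of the active-services 
composite rather than of the ResourceManager itself, so the parent's child list 
is not modified while it is being iterated in CompositeService.serviceInit().

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;
import org.apache.hadoop.service.Service;

class RMActiveServicesSketch extends CompositeService {
  RMActiveServicesSketch() {
    super("RMActiveServices");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    if (conf.getBoolean("yarn.resourcemanager.scheduler.monitor.enable", false)) {
      // Add the monitor to the active services, not to the RM composite.
      addService(createSchedulingMonitor(conf));
    }
    super.serviceInit(conf);
  }

  private Service createSchedulingMonitor(Configuration conf) {
    // Placeholder for constructing the real SchedulingMonitor.
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}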

> Resource Manager fails to start due to ConcurrentModificationException
> --
>
> Key: YARN-1374
> URL: https://issues.apache.org/jira/browse/YARN-1374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Priority: Blocker
>
> Resource Manager is failing to start with the below 
> ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
> Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioning to standby
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring

2013-10-30 Thread Devaraj K (JIRA)
Devaraj K created YARN-1375:
---

 Summary: RM logs get filled with scheduler monitor logs when we 
enable scheduler monitoring
 Key: YARN-1375
 URL: https://issues.apache.org/jira/browse/YARN-1375
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Devaraj K



When the scheduler monitor is enabled, it fills the RM logs with the same queue 
state periodically. We could log only when something differs from the previous 
state instead of repeating the same message. 

{code:xml}
2013-10-30 23:30:08,464 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:11,464 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:14,465 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:17,466 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:20,466 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:23,467 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:26,468 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:29,468 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
2013-10-30 23:30:32,469 INFO 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
  QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
{code}
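
A minimal sketch of the suggestion, not the actual 
ProportionalCapacityPreemptionPolicy code: remember the last logged queue state 
and log only when it changes.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class QueueStateLogger {
  private static final Log LOG = LogFactory.getLog(QueueStateLogger.class);
  private String lastLoggedState;

  // 'state' is the comma-separated queue snapshot, excluding the timestamp so
  // identical states compare equal across monitoring intervals.
  void maybeLog(String state) {
    if (!state.equals(lastLoggedState)) {
      LOG.info("QUEUESTATE: " + state);
      lastLoggedState = state;
    }
  }
}
{code}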




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1306) Clean up hadoop-sls sample-conf according to YARN-1228

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809053#comment-13809053
 ] 

Hudson commented on YARN-1306:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> Clean up hadoop-sls sample-conf according to YARN-1228
> --
>
> Key: YARN-1306
> URL: https://issues.apache.org/jira/browse/YARN-1306
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Fix For: 2.3.0
>
> Attachments: YARN-1306.patch
>
>
> Move fair scheduler allocations configuration to fair-scheduler.xml, and move 
> all scheduler settings to yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809057#comment-13809057
 ] 

Hudson commented on YARN-1068:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
YARN-1068. Add admin support for HA operations (Karthik Kambatla via bikas) 
(bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536888)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineTextInputFormat.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/HAUtil.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMHAServiceTarget.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMHAProtocolService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMServerUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/authorize/RMPolicyProvider.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java


> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Fix For: 2.3.0
>
> Attachments: yarn-1068-10.patch, yarn-1068-11.patch, 
> yarn-1068-12.patch, yarn-1068-13.patch, yarn-1068-14.patch, 
> yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, yarn-1068-4.patch, 
> yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, yarn-1068-8.patch, 
> yarn-1068-9.patch, YARN-1068.Karthik.patch, yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809054#comment-13809054
 ] 

Hudson commented on YARN-1228:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1594 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1594/])
YARN-1306. Clean up hadoop-sls sample-conf according to YARN-1228 (Wei Yan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1536982)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/capacity-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler-allocation.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/fair-scheduler.xml
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/sample-conf/yarn-site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> Clean up Fair Scheduler configuration loading
> -
>
> Key: YARN-1228
> URL: https://issues.apache.org/jira/browse/YARN-1228
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.1.1-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.2.0
>
> Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch
>
>
> Currently the Fair Scheduler is configured in two ways
> * An allocations file that has a different format than the standard Hadoop 
> configuration file, which makes it easier to specify hierarchical objects 
> like queues and their properties. 
> * With properties like yarn.scheduler.fair.max.assign that are specified in 
> the standard Hadoop configuration format.
> The standard and default way of configuring it is to use fair-scheduler.xml 
> as the allocations file and to put the yarn.scheduler properties in 
> yarn-site.xml.
> It is also possible to specify a different file as the allocations file, and 
> to place the yarn.scheduler properties in fair-scheduler.xml, which will be 
> interpreted as in the standard Hadoop configuration format.  This flexibility 
> is both confusing and unnecessary.
> Additionally, the allocation file is loaded as fair-scheduler.xml from the 
> classpath if it is not specified, but is loaded as a File if it is.  This 
> causes two problems
> 1. We see different behavior when not setting the 
> yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, 
> which is its default.
> 2. Classloaders may choose to cache resources, which can break the reload 
> logic when yarn.scheduler.fair.allocation.file is not specified.
> We should never allow the yarn.scheduler properties to go into 
> fair-scheduler.xml.  And we should always load the allocations file as a 
> file, not as a resource on the classpath.  To preserve existing behavior and 
> allow loading files from the classpath, we can look for files on the 
> classpath, but strip off their scheme and interpret them as Files.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809085#comment-13809085
 ] 

Steve Loughran commented on YARN-1374:
--

If this exists, it's my code. I'd put the list into an unmodifiable list to stop 
concurrency problems, because I knew the risk existed of adding children while 
children were being added.

It looks like that isn't enough - we need to take a snapshot of the list and 
then iterate through it.

-steve
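
A minimal sketch of the snapshot-then-iterate idea, assuming a 
CompositeService-like parent that keeps its children in a list (not the actual 
Hadoop code; Runnable stands in for the child Service type):

{code}
import java.util.ArrayList;
import java.util.List;

class SnapshotInitExample {
  private final List<Runnable> services = new ArrayList<Runnable>();

  void addService(Runnable s) {
    synchronized (services) {
      services.add(s);
    }
  }

  void initAll() {
    List<Runnable> snapshot;
    synchronized (services) {
      // Copy first, so a child that registers further children during init
      // cannot invalidate the iterator over the parent's list.
      snapshot = new ArrayList<Runnable>(services);
    }
    for (Runnable s : snapshot) {
      s.run();
    }
  }
}
{code}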

> Resource Manager fails to start due to ConcurrentModificationException
> --
>
> Key: YARN-1374
> URL: https://issues.apache.org/jira/browse/YARN-1374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Priority: Blocker
>
> Resource Manager is failing to start with the below 
> ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
> Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioning to standby
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1031:
-

Attachment: YARN-1031-2-branch-0.23.patch

Updating previous patch to include the corresponding images

> JQuery UI components reference external css in branch-23
> 
>
> Key: YARN-1031
> URL: https://issues.apache.org/jira/browse/YARN-1031
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 0.23.9
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-1031-2-branch-0.23.patch, 
> YARN-1031-branch-0.23.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809171#comment-13809171
 ] 

Hadoop QA commented on YARN-1031:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611075/YARN-1031-2-branch-0.23.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2315//console

This message is automatically generated.

> JQuery UI components reference external css in branch-23
> 
>
> Key: YARN-1031
> URL: https://issues.apache.org/jira/browse/YARN-1031
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 0.23.9
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-1031-2-branch-0.23.patch, 
> YARN-1031-branch-0.23.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService

2013-10-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1318:
---

Attachment: yarn-1318-2.patch

I am able to compile locally. Resubmitting the same patch.

> Promote AdminService to an Always-On service and merge in RMHAProtocolService
> -
>
> Key: YARN-1318
> URL: https://issues.apache.org/jira/browse/YARN-1318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
>  Labels: ha
> Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, 
> yarn-1318-2.patch
>
>
> Per discussion in YARN-1068, we want AdminService to handle HA-admin 
> operations in addition to the regular non-HA admin operations. To facilitate 
> this, we need to move AdminService an Always-On service. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1031:
-

Attachment: YARN-1031-3-branch-0.23.patch

Oops, generated the diff with git diff --binary instead of git format-patch.  
Uploading a new patch.

Patch needs to be applied with git apply, and Jenkins doesn't know how to deal 
with patches against anything but trunk.

I tested this patch locally and the icons are once again working correctly on 
the YARN scheduler page and in tables.

> JQuery UI components reference external css in branch-23
> 
>
> Key: YARN-1031
> URL: https://issues.apache.org/jira/browse/YARN-1031
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 0.23.9
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-1031-2-branch-0.23.patch, 
> YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809244#comment-13809244
 ] 

Hadoop QA commented on YARN-1318:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611085/yarn-1318-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2316//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/2316//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2316//console

This message is automatically generated.

> Promote AdminService to an Always-On service and merge in RMHAProtocolService
> -
>
> Key: YARN-1318
> URL: https://issues.apache.org/jira/browse/YARN-1318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
>  Labels: ha
> Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, 
> yarn-1318-2.patch
>
>
> Per discussion in YARN-1068, we want AdminService to handle HA-admin 
> operations in addition to the regular non-HA admin operations. To facilitate 
> this, we need to move AdminService an Always-On service. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809249#comment-13809249
 ] 

Hadoop QA commented on YARN-1031:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611097/YARN-1031-3-branch-0.23.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2317//console

This message is automatically generated.

> JQuery UI components reference external css in branch-23
> 
>
> Key: YARN-1031
> URL: https://issues.apache.org/jira/browse/YARN-1031
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 0.23.9
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-1031-2-branch-0.23.patch, 
> YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1031) JQuery UI components reference external css in branch-23

2013-10-30 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809307#comment-13809307
 ] 

Jonathan Eagles commented on YARN-1031:
---

+1. Verified Jason's changes. Blocked access to ajax.googleapis.com via 
/etc/hosts before and after the change to visually inspect. Programmatically 
scanned network activity via Firebug to verify that the new jquery-ui.css and 
icons are downloaded locally with no GETs to ajax.googleapis.com.

> JQuery UI components reference external css in branch-23
> 
>
> Key: YARN-1031
> URL: https://issues.apache.org/jira/browse/YARN-1031
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 0.23.9
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-1031-2-branch-0.23.patch, 
> YARN-1031-3-branch-0.23.patch, YARN-1031-branch-0.23.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete

2013-10-30 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809344#comment-13809344
 ] 

Xuan Gong commented on YARN-1279:
-

The basic idea is that the NM notifies the RM about the log aggregation status 
of all its containers through the node heartbeat. When the RMNode gets the log 
aggregation status from the nodeUpdateEvent, it forwards the status to the 
related RMApp. After that, the client can get the log aggregation status by 
calling the related API.
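
A self-contained sketch of that flow; the enum and tracker below are 
illustrative stand-ins, not existing YARN classes:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

enum LogAggregationState { RUNNING, SUCCEEDED, FAILED }

class LogAggregationStatusTracker {
  // appId -> latest aggregation state reported by NMs via node heartbeats.
  private final Map<String, LogAggregationState> statusByApp =
      new ConcurrentHashMap<String, LogAggregationState>();

  // Invoked on the RM side when a node update carries an aggregation report.
  void onNodeHeartbeatReport(String appId, LogAggregationState state) {
    statusByApp.put(appId, state);
  }

  // Backs the client-facing API: has aggregation finished for this app?
  boolean isLogAggregationComplete(String appId) {
    return statusByApp.get(appId) == LogAggregationState.SUCCEEDED;
  }
}
{code}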

> Expose a client API to allow clients to figure if log aggregation is complete
> -
>
> Key: YARN-1279
> URL: https://issues.apache.org/jira/browse/YARN-1279
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Xuan Gong
>
> Expose a client API to allow clients to figure if log aggregation is complete



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly

2013-10-30 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1321:
-

Attachment: YARN-1321.patch

Patch fixing the javadoc warning.

> NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to 
> work correctly
> 
>
> Key: YARN-1321
> URL: https://issues.apache.org/jira/browse/YARN-1321
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Blocker
> Attachments: YARN-1321-20131029.txt, YARN-1321.patch, 
> YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, 
> YARN-1321.patch
>
>
> NMTokenCache is a singleton. Because of this, if running multiple AMs in a 
> single JVM NMTokens for the same node from different AMs step on each other 
> and starting containers fail due to mismatch tokens.
> The error observed in the client side is something like:
> {code}
> ERROR org.apache.hadoop.security.UserGroupInformation: 
> PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) 
> cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request 
> to start container. 
> NMToken for application attempt : appattempt_1382038445650_0002_01 was 
> used for starting container with container token issued for application 
> attempt : appattempt_1382038445650_0001_01
> {code}
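
For illustration, a minimal sketch of the direction a fix could take, with 
hypothetical names rather than the actual NMTokenCache API: give each AM its 
own cache instance keyed by node address so tokens from different application 
attempts no longer collide.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PerAmNmTokenCache {
  // One cache instance per ApplicationMaster, keyed by node address.
  private final Map<String, String> tokenByNode =
      new ConcurrentHashMap<String, String>();

  void setToken(String nodeAddr, String token) {
    tokenByNode.put(nodeAddr, token);
  }

  String getToken(String nodeAddr) {
    return tokenByNode.get(nodeAddr);
  }
}

// Usage idea: each AM running in the shared JVM constructs its own cache and
// hands it to its NM client, instead of all AMs sharing one static singleton.
{code}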



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete

2013-10-30 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809354#comment-13809354
 ] 

Xuan Gong commented on YARN-1279:
-

Will split the work into two parts. This ticket is used to track the work on 
the RM side. It will include all the changes after the RMNode receives the 
STATUS_UPDATE event, the changes on NodeStatus, and the related PB changes.

> Expose a client API to allow clients to figure if log aggregation is complete
> -
>
> Key: YARN-1279
> URL: https://issues.apache.org/jira/browse/YARN-1279
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Xuan Gong
>
> Expose a client API to allow clients to figure if log aggregation is complete



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat

2013-10-30 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-1376:
---

 Summary: NM need to notify the log aggregation status to RM 
through Node heartbeat
 Key: YARN-1376
 URL: https://issues.apache.org/jira/browse/YARN-1376
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong


Expose a client API to allow clients to figure out if log aggregation is 
complete. This ticket is used to track the changes on the NM side.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1279) Expose a client API to allow clients to figure if log aggregation is complete

2013-10-30 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809364#comment-13809364
 ] 

Xuan Gong commented on YARN-1279:
-

Created YARN-1376 to track the changes on the NM side.

> Expose a client API to allow clients to figure if log aggregation is complete
> -
>
> Key: YARN-1279
> URL: https://issues.apache.org/jira/browse/YARN-1279
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Xuan Gong
>
> Expose a client API to allow clients to figure if log aggregation is complete



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2013-10-30 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809382#comment-13809382
 ] 

Rohith Sharma K S commented on YARN-1366:
-

Hi Bikas,
I have gone through your pdf attached to YARN-556 and understood the overall 
idea behind this subtask.
I have some doubts, please clarify:

1. Resync means resetting the allocate RPC sequence number to 0 and the AM 
should send its entire outstanding request to the RM
>> I understood this as: we need to reset lastResponseID to 0 and should not 
>> clear ask, release, blacklistAdditions and blacklistRemovals. Is that correct?

2. During RM restart, the RM gets a new AMRMTokenSecretManager. At that point 
the passwords will differ. Is this handled on the RM side during recovery for 
each individual application? Otherwise the impact is that heartbeats to the 
restarted RM will fail with an authentication error "password does not match".


> ApplicationMasterService should Resync with the AM upon allocate call after 
> restart
> ---
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0 and the AM should send its entire outstanding request to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809416#comment-13809416
 ] 

Hadoop QA commented on YARN-1321:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/1264/YARN-1321.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2318//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2318//console

This message is automatically generated.

> NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to 
> work correctly
> 
>
> Key: YARN-1321
> URL: https://issues.apache.org/jira/browse/YARN-1321
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Blocker
> Attachments: YARN-1321-20131029.txt, YARN-1321.patch, 
> YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, 
> YARN-1321.patch
>
>
> NMTokenCache is a singleton. Because of this, if running multiple AMs in a 
> single JVM, NMTokens for the same node from different AMs step on each other 
> and starting containers fails due to mismatched tokens.
> The error observed in the client side is something like:
> {code}
> ERROR org.apache.hadoop.security.UserGroupInformation: 
> PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) 
> cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request 
> to start container. 
> NMToken for application attempt : appattempt_1382038445650_0002_01 was 
> used for starting container with container token issued for application 
> attempt : appattempt_1382038445650_0001_01
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1375) RM logs get filled with scheduler monitor logs when we enable scheduler monitoring

2013-10-30 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned YARN-1375:
--

Assignee: haosdent

> RM logs get filled with scheduler monitor logs when we enable scheduler 
> monitoring
> --
>
> Key: YARN-1375
> URL: https://issues.apache.org/jira/browse/YARN-1375
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: haosdent
>
> When we enable the scheduler monitor, it fills the RM logs with the same 
> queue states periodically. We could log only when there is a difference from the 
> previous state instead of logging the same message. 
> {code:xml}
> 2013-10-30 23:30:08,464 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
>   QUEUESTATE: 1383156008464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
> 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
> 2013-10-30 23:30:11,464 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
>   QUEUESTATE: 1383156011464, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
> 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
> 2013-10-30 23:30:14,465 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
>   QUEUESTATE: 1383156014465, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
> 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
> 2013-10-30 23:30:17,466 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
>   QUEUESTATE: 1383156017466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
> 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
> 2013-10-30 23:30:20,466 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
>   QUEUESTATE: 1383156020466, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
> 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
> 2013-10-30 23:30:23,467 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
>   QUEUESTATE: 1383156023467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
> 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
> 2013-10-30 23:30:26,468 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
>   QUEUESTATE: 1383156026467, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
> 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
> 2013-10-30 23:30:29,468 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
>   QUEUESTATE: 1383156029468, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
> 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
> 2013-10-30 23:30:32,469 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
>   QUEUESTATE: 1383156032469, a, 5120, 5, 508928, 497, 4096, 4, 5120, 5, 0, 0, 
> 0, 0, b, 3072, 3, 0, 0, 4096, 4, 3072, 3, 0, 0, 0, 0
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService

2013-10-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1318:
---

Attachment: yarn-1318-3.patch

New patch to fix the findbugs warning: use get/set methods to access haState.

> Promote AdminService to an Always-On service and merge in RMHAProtocolService
> -
>
> Key: YARN-1318
> URL: https://issues.apache.org/jira/browse/YARN-1318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
>  Labels: ha
> Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, 
> yarn-1318-2.patch, yarn-1318-3.patch
>
>
> Per discussion in YARN-1068, we want AdminService to handle HA-admin 
> operations in addition to the regular non-HA admin operations. To facilitate 
> this, we need to move AdminService an Always-On service. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809565#comment-13809565
 ] 

Bikas Saha commented on YARN-1343:
--

It looks like in the reconnect with different capacity case we will end up 
sending 2 NODE_USABLE events for the same node.
{code}
}
rmNode.context.getRMNodes().put(newNode.getNodeID(), newNode);
rmNode.context.getDispatcher().getEventHandler().handle(
    new RMNodeEvent(newNode.getNodeID(), RMNodeEventType.STARTED)); // <=== First instance when this triggers the ADD_NODE_Transition
  }
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodesListManagerEvent(
          NodesListManagerEventType.NODE_USABLE, rmNode)); // <=== Second instance
{code}

So we could probably move the second instance to the first if-stmt where it 
also sends the NodeAddedSchedulerEvent. That would handle the case of the same 
node coming back while the STARTED event in the else stmt will cover the case 
of a different node with the same node name coming back (same as a new node 
being added).
{code}
if (rmNode.getTotalCapability().equals(newNode.getTotalCapability())
  && rmNode.getHttpPort() == newNode.getHttpPort()) {
// Reset heartbeat ID since node just restarted.
rmNode.getLastNodeHeartBeatResponse().setResponseId(0);
if (rmNode.getState() != NodeState.UNHEALTHY) {
  // Only add new node if old state is not UNHEALTHY
  rmNode.context.getDispatcher().getEventHandler().handle(
  new NodeAddedSchedulerEvent(rmNode));
}
  }
{code}

I modified the patch testcase to try out reconnect with different capability 
and the above issue showed up.
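
Roughly, the restructuring I am suggesting just rearranges the two excerpts above 
(untested sketch, for illustration only):
{code}
// Sketch only: dispatch NODE_USABLE inside the same-capability branch, and let the
// STARTED event in the else branch cover the "different node with the same name" case.
if (rmNode.getTotalCapability().equals(newNode.getTotalCapability())
    && rmNode.getHttpPort() == newNode.getHttpPort()) {
  // Reset heartbeat ID since node just restarted.
  rmNode.getLastNodeHeartBeatResponse().setResponseId(0);
  if (rmNode.getState() != NodeState.UNHEALTHY) {
    // Only add new node if old state is not UNHEALTHY
    rmNode.context.getDispatcher().getEventHandler().handle(
        new NodeAddedSchedulerEvent(rmNode));
    // Same node came back: report it usable here, exactly once.
    rmNode.context.getDispatcher().getEventHandler().handle(
        new NodesListManagerEvent(
            NodesListManagerEventType.NODE_USABLE, rmNode));
  }
} else {
  // Different capability or http port: treat it like a new node being added.
  rmNode.context.getRMNodes().put(newNode.getNodeID(), newNode);
  rmNode.context.getDispatcher().getEventHandler().handle(
      new RMNodeEvent(newNode.getNodeID(), RMNodeEventType.STARTED));
}
{code}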

> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809592#comment-13809592
 ] 

Hadoop QA commented on YARN-1318:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611154/yarn-1318-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2319//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2319//console

This message is automatically generated.

> Promote AdminService to an Always-On service and merge in RMHAProtocolService
> -
>
> Key: YARN-1318
> URL: https://issues.apache.org/jira/browse/YARN-1318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
>  Labels: ha
> Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, 
> yarn-1318-2.patch, yarn-1318-3.patch
>
>
> Per discussion in YARN-1068, we want AdminService to handle HA-admin 
> operations in addition to the regular non-HA admin operations. To facilitate 
> this, we need to move AdminService an Always-On service. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1357) TestContainerLaunch.testContainerEnvVariables fails on Windows

2013-10-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-1357:


  Component/s: nodemanager
 Target Version/s: 3.0.0, 2.2.1
Affects Version/s: 2.2.0
 Hadoop Flags: Reviewed

+1 for the patch.  I'll commit this.

> TestContainerLaunch.testContainerEnvVariables fails on Windows
> --
>
> Key: YARN-1357
> URL: https://issues.apache.org/jira/browse/YARN-1357
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
>Priority: Minor
> Attachments: YARN-1357.patch
>
>
> This test fails on Windows due to incorrect use of batch script command. 
> Error messages are as follows.
> {noformat}
> junit.framework.AssertionFailedError: expected: lim=19 cap=19]> but was:
>   at junit.framework.Assert.fail(Assert.java:50)
>   at junit.framework.Assert.failNotEquals(Assert.java:287)
>   at junit.framework.Assert.assertEquals(Assert.java:67)
>   at junit.framework.Assert.assertEquals(Assert.java:74)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:508)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1359) AMRMToken should not be sent to Container other than AM.

2013-10-30 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809610#comment-13809610
 ] 

Omkar Vinit Joshi commented on YARN-1359:
-

Today the node manager doesn't do this filtering of tokens. 

Proposal :-
Let the node manager filter out the AMRMToken from the tokens while launching any 
container other than the AM. Thereby we only (truly) allow the AM container to talk 
to the RM on the AMRM protocol.

Enhancements :- today the node manager doesn't know which container is the AM 
container, which causes a lot of problems. So we first need a way to inform the 
node manager that a container is the AM. Since the node manager learns everything 
about a new container from its container token, it would be better to add an isAM 
flag inside the token. Thoughts? 
(Note: we are anyway not encouraging users to talk to the RM from multiple 
containers sharing the same AMRMToken.)
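
For illustration, a rough sketch of the NM-side filtering (the hook point and the 
isAM flag are part of the proposal, not existing code):
{code}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

// Sketch only: strip the AMRMToken from the credentials of any container that is not
// the AM container. How the NM learns isAMContainer (e.g. a flag inside the container
// token) is exactly the open question above.
public class AMRMTokenFilter {
  // Same value as AMRMTokenIdentifier.KIND_NAME.
  private static final Text AMRM_TOKEN_KIND = new Text("YARN_AM_RM_TOKEN");

  public static Credentials filterForLaunch(Credentials creds, boolean isAMContainer) {
    if (isAMContainer) {
      return creds; // only the AM keeps the AMRMToken
    }
    Credentials filtered = new Credentials();
    for (Token<? extends TokenIdentifier> token : creds.getAllTokens()) {
      if (!AMRM_TOKEN_KIND.equals(token.getKind())) {
        filtered.addToken(token.getService(), token);
      }
    }
    return filtered;
  }
}
{code}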


> AMRMToken should not be sent to Container other than AM.
> 
>
> Key: YARN-1359
> URL: https://issues.apache.org/jira/browse/YARN-1359
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1357) TestContainerLaunch.testContainerEnvVariables fails on Windows

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809615#comment-13809615
 ] 

Hudson commented on YARN-1357:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4675 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4675/])
YARN-1357. TestContainerLaunch.testContainerEnvVariables fails on Windows. 
Contributed by Chuan Liu. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1537293)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java


> TestContainerLaunch.testContainerEnvVariables fails on Windows
> --
>
> Key: YARN-1357
> URL: https://issues.apache.org/jira/browse/YARN-1357
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
>Priority: Minor
> Fix For: 3.0.0, 2.2.1
>
> Attachments: YARN-1357.patch
>
>
> This test fails on Windows due to incorrect use of batch script command. 
> Error messages are as follows.
> {noformat}
> junit.framework.AssertionFailedError: expected: lim=19 cap=19]> but was:
>   at junit.framework.Assert.fail(Assert.java:50)
>   at junit.framework.Assert.failNotEquals(Assert.java:287)
>   at junit.framework.Assert.assertEquals(Assert.java:67)
>   at junit.framework.Assert.assertEquals(Assert.java:74)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:508)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1377) Log aggregation via node manager should expose a way to cancel aggregation at application or container level

2013-10-30 Thread Omkar Vinit Joshi (JIRA)
Omkar Vinit Joshi created YARN-1377:
---

 Summary: Log aggregation via node manager should expose a 
way to cancel aggregation at application or container level
 Key: YARN-1377
 URL: https://issues.apache.org/jira/browse/YARN-1377
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi


Today when an application finishes it starts aggregating all the logs, but that may 
slow down the whole process significantly.
There can be situations where certain containers overwrote the logs, say to multiple 
GBs. In these scenarios we need a way to cancel log aggregation for certain 
containers, either at the per-application level or at the per-container level.
Thoughts?
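
Something along these lines is what I have in mind for the cancellation hook 
(purely a discussion sketch; none of these names exist today):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Discussion sketch only. The idea is that the per-application log aggregator on the
// NM consults this right before uploading a container's logs, so huge or unwanted
// logs can be skipped at the application or container level.
interface LogAggregationCancellation {

  /** Skip log aggregation for every container of the given application. */
  void cancelForApplication(ApplicationId appId);

  /** Skip log aggregation for a single container only. */
  void cancelForContainer(ContainerId containerId);

  /** Checked by the aggregator just before it uploads a container's logs. */
  boolean isCancelled(ApplicationId appId, ContainerId containerId);
}
{code}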



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1377) Log aggregation via node manager should expose a way to cancel aggregation at application or container level

2013-10-30 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1377:


Assignee: Xuan Gong

> Log aggregation via node manager should expose a way to cancel 
> aggregation at application or container level
> ---
>
> Key: YARN-1377
> URL: https://issues.apache.org/jira/browse/YARN-1377
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Xuan Gong
>
> Today when an application finishes it starts aggregating all the logs, but that 
> may slow down the whole process significantly.
> There can be situations where certain containers overwrote the logs, say to 
> multiple GBs. In these scenarios we need a way to cancel log aggregation 
> for certain containers, either at the per-application level or at the 
> per-container level.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1377) Log aggregation via node manager should expose a way to cancel log aggregation at application or container level

2013-10-30 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-1377:


Summary: Log aggregation via node manager should expose a way to 
cancel log aggregation at application or container level  (was: Log aggregation 
via node manager should expose a way to cancel aggregation at 
application or container level)

> Log aggregation via node manager should expose a way to cancel log 
> aggregation at application or container level
> ---
>
> Key: YARN-1377
> URL: https://issues.apache.org/jira/browse/YARN-1377
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Xuan Gong
>
> Today when an application finishes it starts aggregating all the logs, but that 
> may slow down the whole process significantly.
> There can be situations where certain containers overwrote the logs, say to 
> multiple GBs. In these scenarios we need a way to cancel log aggregation 
> for certain containers, either at the per-application level or at the 
> per-container level.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1358) TestYarnCLI fails on Windows due to line endings

2013-10-30 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-1358:


  Component/s: client
 Target Version/s: 3.0.0, 2.2.1
Affects Version/s: 2.2.0
 Hadoop Flags: Reviewed

+1 for the patch.  I'll commit this.

> TestYarnCLI fails on Windows due to line endings
> 
>
> Key: YARN-1358
> URL: https://issues.apache.org/jira/browse/YARN-1358
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: client
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
>Priority: Minor
> Attachments: YARN-1358.2.patch, YARN-1358.patch
>
>
> The unit test fails on Windows because incorrect line endings were used when 
> comparing the expected output with the command line output. Error messages are as follows.
> {noformat}
> junit.framework.ComparisonFailure: expected:<...argument for options[]
> usage: application
> ...> but was:<...argument for options[
> ]
> usage: application
> ...>
>   at junit.framework.Assert.assertEquals(Assert.java:85)
>   at junit.framework.Assert.assertEquals(Assert.java:91)
>   at 
> org.apache.hadoop.yarn.client.cli.TestYarnCLI.testMissingArguments(TestYarnCLI.java:878)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1358) TestYarnCLI fails on Windows due to line endings

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809651#comment-13809651
 ] 

Hudson commented on YARN-1358:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4676 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4676/])
YARN-1358. TestYarnCLI fails on Windows due to line endings. Contributed by 
Chuan Liu. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1537305)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java


> TestYarnCLI fails on Windows due to line endings
> 
>
> Key: YARN-1358
> URL: https://issues.apache.org/jira/browse/YARN-1358
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: client
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
>Priority: Minor
> Fix For: 3.0.0, 2.2.1
>
> Attachments: YARN-1358.2.patch, YARN-1358.patch
>
>
> The unit test fails on Windows because incorrect line endings were used when 
> comparing the expected output with the command line output. Error messages are as follows.
> {noformat}
> junit.framework.ComparisonFailure: expected:<...argument for options[]
> usage: application
> ...> but was:<...argument for options[
> ]
> usage: application
> ...>
>   at junit.framework.Assert.assertEquals(Assert.java:85)
>   at junit.framework.Assert.assertEquals(Assert.java:91)
>   at 
> org.apache.hadoop.yarn.client.cli.TestYarnCLI.testMissingArguments(TestYarnCLI.java:878)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1378) Implement a RMStateStore cleaner for deleting application/attempt info

2013-10-30 Thread Jian He (JIRA)
Jian He created YARN-1378:
-

 Summary: Implement a RMStateStore cleaner for deleting 
application/attempt info
 Key: YARN-1378
 URL: https://issues.apache.org/jira/browse/YARN-1378
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He


Now that we are storing the final state of application/attempt instead of 
removing application/attempt info on application/attempt completion (YARN-891), 
we need a separate RMStateStore cleaner for cleaning the application/attempt 
state.
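
As a starting point, the cleaner could be a small periodic task along these lines 
(sketch only; the store interface below is a placeholder for whatever listing/removal 
calls the RMStateStore ends up exposing):
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: periodically remove state for applications that reached a final state
// longer than a retention window ago.
public class RMStateStoreCleaner {

  /** Hypothetical view of the store; not an existing interface. */
  interface CompletedAppStore {
    Iterable<String> getAppsCompletedBefore(long timestampMs);
    void removeApplicationState(String appId); // removes the app and all its attempts
  }

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final CompletedAppStore store;
  private final long retentionMs;

  public RMStateStoreCleaner(CompletedAppStore store, long retentionMs) {
    this.store = store;
    this.retentionMs = retentionMs;
  }

  public void start() {
    scheduler.scheduleWithFixedDelay(new Runnable() {
      @Override
      public void run() {
        long cutoff = System.currentTimeMillis() - retentionMs;
        for (String appId : store.getAppsCompletedBefore(cutoff)) {
          store.removeApplicationState(appId);
        }
      }
    }, 60, 60, TimeUnit.MINUTES);
  }
}
{code}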



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation

2013-10-30 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1123:


Attachment: YARN-1123-3.patch

Adding the patch with the latest rebase; changed Container status to Container 
state and added the container exit status.

Thanks,
Mayank

> [YARN-321] Adding ContainerReport and Protobuf implementation
> -
>
> Key: YARN-1123
> URL: https://issues.apache.org/jira/browse/YARN-1123
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Mayank Bansal
> Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch
>
>
> Like YARN-978, we need some client-oriented class to expose the container 
> history info. Neither Container nor RMContainer is the right one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809664#comment-13809664
 ] 

Karthik Kambatla commented on YARN-1374:


I see the issue. Will upload a patch shortly to add the SchedulingMonitor to 
RMActiveServices.

> Resource Manager fails to start due to ConcurrentModificationException
> --
>
> Key: YARN-1374
> URL: https://issues.apache.org/jira/browse/YARN-1374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: Karthik Kambatla
>Priority: Blocker
>
> Resource Manager is failing to start with the below 
> ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
> Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioning to standby
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla reassigned YARN-1374:
--

Assignee: Karthik Kambatla

> Resource Manager fails to start due to ConcurrentModificationException
> --
>
> Key: YARN-1374
> URL: https://issues.apache.org/jira/browse/YARN-1374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: Karthik Kambatla
>Priority: Blocker
>
> Resource Manager is failing to start with the below 
> ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
> Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioning to standby
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2013-10-30 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1379:
-

 Summary: [YARN-321] AHS protocols need to be in yarn proto package 
name after YARN-1170
 Key: YARN-1379
 URL: https://issues.apache.org/jira/browse/YARN-1379
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Found this while merging YARN-321 to the latest branch-2. Without this, 
compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1343:
-

Attachment: YARN-1343.patch

[~bikassaha], thanks for the review and catching the double dispatching. 
Uploading a patch with the changes you suggested and also adding a test to 
verify the NODE_USABLE event is dispatched when a reconnect happens and the 
node has different capabilities.

> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809676#comment-13809676
 ] 

Hadoop QA commented on YARN-1343:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611185/YARN-1343.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2321//console

This message is automatically generated.

> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809679#comment-13809679
 ] 

Hadoop QA commented on YARN-1123:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611179/YARN-1123-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2320//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2320//console

This message is automatically generated.

> [YARN-321] Adding ContainerReport and Protobuf implementation
> -
>
> Key: YARN-1123
> URL: https://issues.apache.org/jira/browse/YARN-1123
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Mayank Bansal
> Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch
>
>
> Like YARN-978, we need some client-oriented class to expose the container 
> history info. Neither Container nor RMContainer is the right one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1290) Let continuous scheduling achieve more balanced task assignment

2013-10-30 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809678#comment-13809678
 ] 

Sandy Ryza commented on YARN-1290:
--

[~ywskycn], the current patch no longer applies.  Would you mind rebasing?

> Let continuous scheduling achieve more balanced task assignment
> ---
>
> Key: YARN-1290
> URL: https://issues.apache.org/jira/browse/YARN-1290
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, 
> YARN-1290.patch
>
>
> Currently, in continuous scheduling (YARN-1010), in each round, the thread 
> iterates over pre-ordered nodes and assigns tasks. This mechanism may 
> overload the first several nodes, while the latter nodes have no tasks.
> We should sort all nodes according to available resource. In each round, 
> always assign tasks to nodes with larger capacity, which can balance the load 
> distribution among all nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1290) Let continuous scheduling achieve more balanced task assignment

2013-10-30 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809682#comment-13809682
 ] 

Wei Yan commented on YARN-1290:
---

[~sandyr]. I'll fix it.

> Let continuous scheduling achieve more balanced task assignment
> ---
>
> Key: YARN-1290
> URL: https://issues.apache.org/jira/browse/YARN-1290
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, 
> YARN-1290.patch
>
>
> Currently, in continuous scheduling (YARN-1010), in each round, the thread 
> iterates over pre-ordered nodes and assigns tasks. This mechanism may 
> overload the first several nodes, while the latter nodes have no tasks.
> We should sort all nodes according to available resource. In each round, 
> always assign tasks to nodes with larger capacity, which can balance the load 
> distribution among all nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2013-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1379:
--

Attachment: YARN-1379.txt

Simple patch that adds package names.

Compilation passes after this.

> [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
> --
>
> Key: YARN-1379
> URL: https://issues.apache.org/jira/browse/YARN-1379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1379.txt
>
>
> Found this while merging YARN-321 to the latest branch-2. Without this, 
> compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809719#comment-13809719
 ] 

Hadoop QA commented on YARN-1343:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611185/YARN-1343.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2322//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2322//console

This message is automatically generated.

> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809723#comment-13809723
 ] 

Hudson commented on YARN-1321:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4678 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4678/])
YARN-1321. Changed NMTokenCache to support both singleton and an instance 
usage. Contributed by Alejandro Abdelnur. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1537334)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/NMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/NMTokenCache.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/NMClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java


> NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to 
> work correctly
> 
>
> Key: YARN-1321
> URL: https://issues.apache.org/jira/browse/YARN-1321
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Blocker
> Fix For: 2.2.1
>
> Attachments: YARN-1321-20131029.txt, YARN-1321.patch, 
> YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, YARN-1321.patch, 
> YARN-1321.patch
>
>
> NMTokenCache is a singleton. Because of this, if running multiple AMs in a 
> single JVM, NMTokens for the same node from different AMs step on each other 
> and starting containers fails due to mismatched tokens.
> The error observed in the client side is something like:
> {code}
> ERROR org.apache.hadoop.security.UserGroupInformation: 
> PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) 
> cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request 
> to start container. 
> NMToken for application attempt : appattempt_1382038445650_0002_01 was 
> used for starting container with container token issued for application 
> attempt : appattempt_1382038445650_0001_01
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2013-10-30 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809752#comment-13809752
 ] 

Zhijie Shen commented on YARN-1379:
---

+1. Verified it locally. The branch compiled after the patch was applied.

> [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
> --
>
> Key: YARN-1379
> URL: https://issues.apache.org/jira/browse/YARN-1379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1379.txt
>
>
> Found this while merging YARN-321 to the latest branch-2. Without this, 
> compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1374:
---

Attachment: yarn-1374-1.patch

Here is a patch that moves creating the monitor policies to RMActiveServices.
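
For context, the exception comes from a child service being added to the 
ResourceManager's service list while CompositeService.serviceInit() is still 
iterating over that list. A minimal illustration of the pattern (plain Java, not RM code):
{code}
import java.util.ArrayList;
import java.util.List;

// Illustration only: adding to a list while a for-each loop is iterating it makes the
// iterator throw ConcurrentModificationException, which is what happens when a child
// service gets registered in the middle of the composite's init pass.
public class CompositeInitDemo {
  static final List<Runnable> services = new ArrayList<Runnable>();

  public static void main(String[] args) {
    services.add(new Runnable() {
      @Override
      public void run() {
        // A child that registers another service during its own init.
        services.add(new Runnable() { @Override public void run() { } });
      }
    });
    for (Runnable service : services) { // throws ConcurrentModificationException
      service.run();
    }
  }
}
{code}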

> Resource Manager fails to start due to ConcurrentModificationException
> --
>
> Key: YARN-1374
> URL: https://issues.apache.org/jira/browse/YARN-1374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1374-1.patch
>
>
> Resource Manager is failing to start with the below 
> ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
> Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioning to standby
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809753#comment-13809753
 ] 

Bikas Saha commented on YARN-1343:
--

Can you please double check that testReconnectWithDifferentCapacity actually 
results in a reconnection? The test alters the existing node's capacity, and 
thus I would expect the equality check in ReconnectTransition to consider the 
nodes the same as before. We probably need to create a new node with the same name 
and a different capacity. Maybe stepping through in the debugger will show what is 
really happening.
If the reconnect-with-different-capability code is getting executed, then I would 
expect the mock RM context to have to mock the getRMNodes() method and a listener to 
be added for RMNodeEvents. Otherwise the test will have exceptions in the output.

> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1374:
---

Attachment: yarn-1374-1.patch

Forgot to add license headers.

> Resource Manager fails to start due to ConcurrentModificationException
> --
>
> Key: YARN-1374
> URL: https://issues.apache.org/jira/browse/YARN-1374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1374-1.patch, yarn-1374-1.patch
>
>
> Resource Manager is failing to start with the below 
> ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
> Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioning to standby
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809755#comment-13809755
 ] 

Bikas Saha commented on YARN-1366:
--

Yes, the lastResponseId needs to be reset to 0 and all the client-side data like 
asks, blacklists etc. should be sent in full to the RM.

The AMRMToken for the attempt is saved and restored. So the existing attempt 
will be able to reconnect to the restarted RM. This currently works.
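
In AM terms, the handling would be roughly the following (sketch only; it assumes the 
resync signal arrives as AMCommand.AM_RESYNC in the AllocateResponse, and the 
rebuild*() helpers stand in for whatever full state the client keeps internally):
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.AMCommand;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Sketch of the AM-side reaction to a resync, not actual AMRMClient code.
class ResyncSketch {
  private int lastResponseId = 0;

  AllocateResponse allocateWithResync(RMAllocator rm, float progress) throws Exception {
    AllocateRequest request = AllocateRequest.newInstance(
        lastResponseId, progress, rebuildFullAskList(), rebuildFullReleaseList(), null);
    AllocateResponse response = rm.allocate(request);
    if (response.getAMCommand() == AMCommand.AM_RESYNC) {
      // Restart the allocate sequence and resend the entire outstanding request.
      lastResponseId = 0;
      return allocateWithResync(rm, progress);
    }
    lastResponseId = response.getResponseId();
    // Completed containers may be reported more than once across a restart,
    // so downstream handling must be idempotent.
    return response;
  }

  /** Stand-in for the AMRM protocol proxy. */
  interface RMAllocator {
    AllocateResponse allocate(AllocateRequest request) throws Exception;
  }

  // Placeholders: the client would resend its full asks and releases here.
  private List<ResourceRequest> rebuildFullAskList() { return null; }
  private List<ContainerId> rebuildFullReleaseList() { return null; }
}
{code}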

> ApplicationMasterService should Resync with the AM upon allocate call after 
> restart
> ---
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>
> The ApplicationMasterService currently sends a resync response to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM. Resync means resetting the allocate RPC 
> sequence number to 0 and the AM should send its entire outstanding request to 
> the RM. Note that if the AM is making its first allocate call to the RM then 
> things should proceed like normal without needing a resync. The RM will 
> return all containers that have completed since the RM last synced with the 
> AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2013-10-30 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809763#comment-13809763
 ] 

Mayank Bansal commented on YARN-1379:
-

+1, verified.

Thanks,
Mayank

> [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
> --
>
> Key: YARN-1379
> URL: https://issues.apache.org/jira/browse/YARN-1379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1379.txt
>
>
> Found this while merging YARN-321 to the latest branch-2. Without this, 
> compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-10-30 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-311:


Attachment: YARN-311-v12.patch

In the v12 patch, fixed a tiny issue by adding the volatile tag to the 
ResourceOption field in o.a.h.y.sls.nodemanager.NodeInfo (consumed by NMSimulator), 
after discussing with Luke offline.
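
For reference, a minimal sketch of the pattern involved (not the actual NodeInfo 
code): the resource can be replaced by one thread while the simulator thread reads 
it, so the field is marked volatile to make the latest reference visible to readers.

{code}
private volatile ResourceOption resourceOption;

public void setResourceOption(ResourceOption option) {
  this.resourceOption = option;   // single reference write, visible to reader threads
}

public ResourceOption getResourceOption() {
  return this.resourceOption;
}
{code}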

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-311-v10.patch, YARN-311-v11.patch, 
> YARN-311-v12.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, 
> YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
> YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, 
> YARN-311-v9.patch
>
>
> As the first step, we go for resource change on RM side and expose admin APIs 
> (admin protocol, CLI, REST and JMX API) later. In this jira, we will only 
> contain changes in scheduler. 
> The flow to update node's resource and awareness in resource scheduling is: 
> 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
> 2. When next NM heartbeat for updating status comes, the RMNode's resource 
> change will be aware and the delta resource is added to schedulerNode's 
> availableResource before actual scheduling happens.
> 3. Scheduler do resource allocation according to new availableResource in 
> SchedulerNode.
> For more design details, please refer proposal and discussions in parent 
> JIRA: YARN-291.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1379) [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170

2013-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-1379.
---

   Resolution: Fixed
Fix Version/s: YARN-321
 Hadoop Flags: Reviewed

Tx for the quick verification! I just committed this to branch YARN-321.

> [YARN-321] AHS protocols need to be in yarn proto package name after YARN-1170
> --
>
> Key: YARN-1379
> URL: https://issues.apache.org/jira/browse/YARN-1379
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Fix For: YARN-321
>
> Attachments: YARN-1379.txt
>
>
> Found this while merging YARN-321 to the latest branch-2. Without this, 
> compilation fails.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809781#comment-13809781
 ] 

Hadoop QA commented on YARN-311:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611210/YARN-311-v12.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2324//console

This message is automatically generated.

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-311-v10.patch, YARN-311-v11.patch, 
> YARN-311-v12.patch, YARN-311-v1.patch, YARN-311-v2.patch, YARN-311-v3.patch, 
> YARN-311-v4.patch, YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, 
> YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, 
> YARN-311-v9.patch
>
>
> As the first step, we go for resource change on RM side and expose admin APIs 
> (admin protocol, CLI, REST and JMX API) later. In this jira, we will only 
> contain changes in scheduler. 
> The flow to update node's resource and awareness in resource scheduling is: 
> 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
> 2. When next NM heartbeat for updating status comes, the RMNode's resource 
> change will be aware and the delta resource is added to schedulerNode's 
> availableResource before actual scheduling happens.
> 3. Scheduler do resource allocation according to new availableResource in 
> SchedulerNode.
> For more design details, please refer proposal and discussions in parent 
> JIRA: YARN-291.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809782#comment-13809782
 ] 

Hadoop QA commented on YARN-1374:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611209/yarn-1374-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2323//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2323//console

This message is automatically generated.

> Resource Manager fails to start due to ConcurrentModificationException
> --
>
> Key: YARN-1374
> URL: https://issues.apache.org/jira/browse/YARN-1374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1374-1.patch, yarn-1374-1.patch
>
>
> Resource Manager is failing to start with the below 
> ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
> Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioning to standby
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation

2013-10-30 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1123:


Attachment: YARN-1123-4.patch

Adding toString optimization.

Thanks,
Mayank

> [YARN-321] Adding ContainerReport and Protobuf implementation
> -
>
> Key: YARN-1123
> URL: https://issues.apache.org/jira/browse/YARN-1123
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Mayank Bansal
> Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch, 
> YARN-1123-4.patch
>
>
> Like YARN-978, we need some client-oriented class to expose the container 
> history info. Neither Container nor RMContainer is the right one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809793#comment-13809793
 ] 

Karthik Kambatla commented on YARN-1374:


The test fails without the fix and passes with it. 

> Resource Manager fails to start due to ConcurrentModificationException
> --
>
> Key: YARN-1374
> URL: https://issues.apache.org/jira/browse/YARN-1374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1374-1.patch, yarn-1374-1.patch
>
>
> Resource Manager is failing to start with the below 
> ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
> Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioning to standby
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated YARN-1343:
-

Attachment: YARN-1343.patch

The TestRMNodeTransitions tests only verify that the expected follow-up events for 
the {{NodeListManager}} are dispatched. 

To test that reconnect happens with different capabilities, we need to add a test 
for {{ResourceTrackerService.registerNodeManager()}}.

Uploading a patch that tests RECONNECTED event dispatching with both the same and 
different capabilities.
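
Roughly, the new test checks something along these lines (registerNode() and 
lastEventFor() are illustrative helpers here, not the real test utilities): 
registering the same NodeId a second time, with the same or a different capability, 
should dispatch a RECONNECTED event rather than a second STARTED.

{code}
NodeId nodeId = NodeId.newInstance("host1", 1234);

registerNode(nodeId, Resource.newInstance(4096, 4));   // first registration
assertEquals(RMNodeEventType.STARTED, lastEventFor(nodeId));

registerNode(nodeId, Resource.newInstance(8192, 8));   // same node, new capability
assertEquals(RMNodeEventType.RECONNECTED, lastEventFor(nodeId));
{code}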

> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
> YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2013-10-30 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-978:
---

Attachment: YARN-978.9.patch

After YARN-947, I had to make changes and remove some code from this patch.
Also added the toString optimization.

Thanks,
Mayank

> [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
> --
>
> Key: YARN-978
> URL: https://issues.apache.org/jira/browse/YARN-978
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Fix For: YARN-321
>
> Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, 
> YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch, 
> YARN-978.8.patch, YARN-978.9.patch
>
>
> We dont have ApplicationAttemptReport and Protobuf implementation.
> Adding that.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-10-30 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-311:


Attachment: YARN-311-v12b.patch

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-311-v10.patch, YARN-311-v11.patch, 
> YARN-311-v12b.patch, YARN-311-v12.patch, YARN-311-v1.patch, 
> YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, 
> YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, 
> YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch
>
>
> As the first step, we go for resource change on RM side and expose admin APIs 
> (admin protocol, CLI, REST and JMX API) later. In this jira, we will only 
> contain changes in scheduler. 
> The flow to update node's resource and awareness in resource scheduling is: 
> 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
> 2. When next NM heartbeat for updating status comes, the RMNode's resource 
> change will be aware and the delta resource is added to schedulerNode's 
> availableResource before actual scheduling happens.
> 3. Scheduler do resource allocation according to new availableResource in 
> SchedulerNode.
> For more design details, please refer proposal and discussions in parent 
> JIRA: YARN-291.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-10-30 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809797#comment-13809797
 ] 

Junping Du commented on YARN-311:
-

The log doesn't show an actual build failure (the patch builds fine locally), so 
the Jenkins failure above is not related to the patch but an accident. Renaming it 
to v12b (exactly the same patch) and submitting it again.

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-311-v10.patch, YARN-311-v11.patch, 
> YARN-311-v12b.patch, YARN-311-v12.patch, YARN-311-v1.patch, 
> YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, 
> YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, 
> YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch
>
>
> As the first step, we go for resource change on RM side and expose admin APIs 
> (admin protocol, CLI, REST and JMX API) later. In this jira, we will only 
> contain changes in scheduler. 
> The flow to update node's resource and awareness in resource scheduling is: 
> 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
> 2. When next NM heartbeat for updating status comes, the RMNode's resource 
> change will be aware and the delta resource is added to schedulerNode's 
> availableResource before actual scheduling happens.
> 3. Scheduler do resource allocation according to new availableResource in 
> SchedulerNode.
> For more design details, please refer proposal and discussions in parent 
> JIRA: YARN-291.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.

2013-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1320:
--

Summary: Custom log4j properties in Distributed shell does not work 
properly.  (was: Custom log4j properties does not work properly.)

> Custom log4j properties in Distributed shell does not work properly.
> 
>
> Key: YARN-1320
> URL: https://issues.apache.org/jira/browse/YARN-1320
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.2.1
>
> Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch
>
>
> Distributed shell cannot pick up custom log4j properties (specified with 
> -log_properties). It always uses default log4j properties.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-978) [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809801#comment-13809801
 ] 

Hadoop QA commented on YARN-978:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611217/YARN-978.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2326//console

This message is automatically generated.

> [YARN-321] Adding ApplicationAttemptReport and Protobuf implementation
> --
>
> Key: YARN-978
> URL: https://issues.apache.org/jira/browse/YARN-978
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Fix For: YARN-321
>
> Attachments: YARN-978-1.patch, YARN-978.2.patch, YARN-978.3.patch, 
> YARN-978.4.patch, YARN-978.5.patch, YARN-978.6.patch, YARN-978.7.patch, 
> YARN-978.8.patch, YARN-978.9.patch
>
>
> We dont have ApplicationAttemptReport and Protobuf implementation.
> Adding that.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1123) [YARN-321] Adding ContainerReport and Protobuf implementation

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809803#comment-13809803
 ] 

Hadoop QA commented on YARN-1123:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611214/YARN-1123-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2327//console

This message is automatically generated.

> [YARN-321] Adding ContainerReport and Protobuf implementation
> -
>
> Key: YARN-1123
> URL: https://issues.apache.org/jira/browse/YARN-1123
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Mayank Bansal
> Attachments: YARN-1123-1.patch, YARN-1123-2.patch, YARN-1123-3.patch, 
> YARN-1123-4.patch
>
>
> Like YARN-978, we need some client-oriented class to expose the container 
> history info. Neither Container nor RMContainer is the right one.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1320) Custom log4j properties in Distributed shell does not work properly.

2013-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809808#comment-13809808
 ] 

Vinod Kumar Vavilapalli commented on YARN-1320:
---

Patch looks good to me. Can you describe what tests you've done?

Also, maybe we can write a test by making use of log-aggregation :)

> Custom log4j properties in Distributed shell does not work properly.
> 
>
> Key: YARN-1320
> URL: https://issues.apache.org/jira/browse/YARN-1320
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications/distributed-shell
>Reporter: Tassapol Athiapinya
>Assignee: Xuan Gong
> Fix For: 2.2.1
>
> Attachments: YARN-1320.1.patch, YARN-1320.2.patch, YARN-1320.3.patch
>
>
> Distributed shell cannot pick up custom log4j properties (specified with 
> -log_properties). It always uses default log4j properties.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809814#comment-13809814
 ] 

Bikas Saha commented on YARN-1343:
--

lgtm.
In the new testReconnect() we should check that the number of RMNodes in 
rmContext.getRMNodes() is still 1, e.g. that the second node actually replaced the 
previous node (the desired behavior) as opposed to both ending up in the map.
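
Something like the following would do (assuming the test already holds the 
RMContext as rmContext; a sketch, not the exact assertion to add):

{code}
// After the second registration of the same NodeId, the nodes map should still
// contain exactly one entry, i.e. the reconnect replaced the old RMNode.
assertEquals(1, rmContext.getRMNodes().size());
assertTrue(rmContext.getRMNodes().containsKey(nodeId));
{code}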

> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
> YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809815#comment-13809815
 ] 

Bikas Saha commented on YARN-1343:
--

+1 for committing. Thanks!

> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
> YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-311) Dynamic node resource configuration: core scheduler changes

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809820#comment-13809820
 ] 

Hadoop QA commented on YARN-311:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611218/YARN-311-v12b.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2325//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2325//console

This message is automatically generated.

> Dynamic node resource configuration: core scheduler changes
> ---
>
> Key: YARN-311
> URL: https://issues.apache.org/jira/browse/YARN-311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-311-v10.patch, YARN-311-v11.patch, 
> YARN-311-v12b.patch, YARN-311-v12.patch, YARN-311-v1.patch, 
> YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, YARN-311-v4.patch, 
> YARN-311-v5.patch, YARN-311-v6.1.patch, YARN-311-v6.2.patch, 
> YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, YARN-311-v9.patch
>
>
> As the first step, we go for resource change on RM side and expose admin APIs 
> (admin protocol, CLI, REST and JMX API) later. In this jira, we will only 
> contain changes in scheduler. 
> The flow to update node's resource and awareness in resource scheduling is: 
> 1. Resource update is through admin API to RM and take effect on RMNodeImpl.
> 2. When next NM heartbeat for updating status comes, the RMNode's resource 
> change will be aware and the delta resource is added to schedulerNode's 
> availableResource before actual scheduling happens.
> 3. Scheduler do resource allocation according to new availableResource in 
> SchedulerNode.
> For more design details, please refer proposal and discussions in parent 
> JIRA: YARN-291.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809821#comment-13809821
 ] 

Alejandro Abdelnur commented on YARN-1343:
--

[~bikassaha], on your question about checking the node count, I don't think it is 
necessary: if a reconnect is triggered, the node was already present in the map, per 
{{ResourceTrackerService.registerNodeManager()}}:

{code}
// putIfAbsent returns null only when this nodeId was not already registered.
RMNode oldNode = this.rmContext.getRMNodes().putIfAbsent(nodeId, rmNode);
if (oldNode == null) {
  // Brand-new node: dispatch STARTED.
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeEvent(nodeId, RMNodeEventType.STARTED));
} else {
  // Existing entry: the NM is reconnecting, so dispatch RECONNECTED instead.
  LOG.info("Reconnect from the node at: " + host);
  this.nmLivelinessMonitor.unregister(nodeId);
  this.rmContext.getDispatcher().getEventHandler().handle(
      new RMNodeReconnectEvent(nodeId, rmNode));
}
{code}

thx

> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
> YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.8.patch

Thanks Vinod/Bikas for the reviews.

- The new patch addresses the comments above.
- Made a new change in RMAppManager.recover() that recovers the applications 
synchronously, as otherwise a client could see an application as not yet recovered 
because ClientRMService has already started by that time.
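
Illustrative sketch only (recoverApplication() is a placeholder name, not the 
actual RMAppManager code), showing why recovering inline closes that race:

{code}
// Recover each stored application synchronously, before client-facing services
// answer queries, so a client cannot observe an application that has not been
// re-created yet.
for (ApplicationState appState : state.getApplicationState().values()) {
  recoverApplication(appState);   // creates and registers the RMApp before returning
}
{code}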

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, 
> YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, 
> YARN-891.7.patch, YARN-891.8.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch
>
>
> Store completed application/attempt info in RMStateStore when 
> application/attempt completes. This solves some problems like finished 
> application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1290) Let continuous scheduling achieve more balanced task assignment

2013-10-30 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-1290:
--

Attachment: YARN-1290.patch

> Let continuous scheduling achieve more balanced task assignment
> ---
>
> Key: YARN-1290
> URL: https://issues.apache.org/jira/browse/YARN-1290
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, 
> YARN-1290.patch, YARN-1290.patch
>
>
> Currently, in continuous scheduling (YARN-1010), in each round, the thread 
> iterates over pre-ordered nodes and assigns tasks. This mechanism may 
> overload the first several nodes, while the latter nodes have no tasks.
> We should sort all nodes according to available resource. In each round, 
> always assign tasks to nodes with larger capacity, which can balance the load 
> distribution among all nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809840#comment-13809840
 ] 

Hadoop QA commented on YARN-891:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611222/YARN-891.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2328//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2328//console

This message is automatically generated.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, 
> YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, 
> YARN-891.7.patch, YARN-891.8.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch
>
>
> Store completed application/attempt info in RMStateStore when 
> application/attempt completes. This solves some problems like finished 
> application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.9.patch

Fixed the test failure

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, 
> YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, 
> YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch
>
>
> Store completed application/attempt info in RMStateStore when 
> application/attempt completes. This solves some problems like finished 
> application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809849#comment-13809849
 ] 

Hadoop QA commented on YARN-891:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611232/YARN-891.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2330//console

This message is automatically generated.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.1.patch, YARN-891.2.patch, YARN-891.3.patch, 
> YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, YARN-891.7.patch, 
> YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch
>
>
> Store completed application/attempt info in RMStateStore when 
> application/attempt completes. This solves some problems like finished 
> application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1290) Let continuous scheduling achieve more balanced task assignment

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809851#comment-13809851
 ] 

Hadoop QA commented on YARN-1290:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611228/YARN-1290.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2329//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2329//console

This message is automatically generated.

> Let continuous scheduling achieve more balanced task assignment
> ---
>
> Key: YARN-1290
> URL: https://issues.apache.org/jira/browse/YARN-1290
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: main.pdf, YARN-1290.patch, YARN-1290.patch, 
> YARN-1290.patch, YARN-1290.patch
>
>
> Currently, in continuous scheduling (YARN-1010), in each round, the thread 
> iterates over pre-ordered nodes and assigns tasks. This mechanism may 
> overload the first several nodes, while the latter nodes have no tasks.
> We should sort all nodes according to available resource. In each round, 
> always assign tasks to nodes with larger capacity, which can balance the load 
> distribution among all nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-998) Persistent resource change during NM restart

2013-10-30 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reassigned YARN-998:
---

Assignee: Junping Du

> Persistent resource change during NM restart
> 
>
> Key: YARN-998
> URL: https://issues.apache.org/jira/browse/YARN-998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Junping Du
>
> When NM is restarted by plan or from a failure, previous dynamic resource 
> setting should be kept for consistency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1374) Resource Manager fails to start due to ConcurrentModificationException

2013-10-30 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809874#comment-13809874
 ] 

Chris Douglas commented on YARN-1374:
-

+1 lgtm

> Resource Manager fails to start due to ConcurrentModificationException
> --
>
> Key: YARN-1374
> URL: https://issues.apache.org/jira/browse/YARN-1374
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Devaraj K
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-1374-1.patch, yarn-1374-1.patch
>
>
> Resource Manager is failing to start with the below 
> ConcurrentModificationException.
> {code:xml}
> 2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: 
> Refreshing hosts (include/exclude) list
> 2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: 
> Service ResourceManager failed in state INITED; cause: 
> java.util.ConcurrentModificationException
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioning to standby
> 2013-10-30 20:22:42,378 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: 
> Transitioned to standby
> 2013-10-30 20:22:42,378 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting 
> ResourceManager
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
> 2013-10-30 20:22:42,379 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/10.18.40.24
> /
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur resolved YARN-1343.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
> YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1343) NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs

2013-10-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809883#comment-13809883
 ] 

Hudson commented on YARN-1343:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4680 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4680/])
YARN-1343. NodeManagers additions/restarts are not reported as node updates in 
AllocateResponse responses to AMs. (tucu) (tucu: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1537368)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/NodesListManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/resourcetracker/TestNMReconnect.java


> NodeManagers additions/restarts are not reported as node updates in 
> AllocateResponse responses to AMs
> -
>
> Key: YARN-1343
> URL: https://issues.apache.org/jira/browse/YARN-1343
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 2.2.1
>
> Attachments: YARN-1343.patch, YARN-1343.patch, YARN-1343.patch, 
> YARN-1343.patch
>
>
> If a NodeManager joins the cluster or gets restarted, running AMs never 
> receive the node update indicating the Node is running.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1324) NodeManager potentially causes unnecessary operations on all its disks

2013-10-30 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809885#comment-13809885
 ] 

Chris Douglas commented on YARN-1324:
-

bq. When does MR use multiple disks in the same task/container? Isn't the map 
output written to a single indexed partition file?

Spills are spread across all volumes, but merged into a single file at the end.

Would randomizing the order of disks be a reasonable short-term workaround for 
(1)? Future changes could weight or elide directories based on other criteria, but 
randomizing is a simple change. So would changing the "random" selection to bias 
its search order using a hash of the task id (instead of disk usage when creating 
the spill), so the ShuffleHandler could search fewer directories on average. I 
agree with Vinod that it would be hard to prevent the search altogether...

bq. Requiring apps to specify the number of disks for a container is also a 
viable solution and can be done in a back-compatible manner by changing MR to 
specify multiple disks and leaving the default to 1 for apps that don't care.

This makes sense as a hint, but some users might interpret it as a constraint 
and be confused when they are scheduled on a node that reports fewer local dirs 
(due to failure or a heterogeneous config).
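
To make the bias-by-hash idea concrete, a purely illustrative sketch (not proposed 
code): order the local dirs deterministically from a hash of the task id, so the 
writer and any later reader (e.g. the shuffle service) probe them in the same order 
and usually hit the right dir first.

{code}
List<String> orderDirs(List<String> localDirs, String taskId) {
  List<String> ordered = new ArrayList<String>(localDirs);
  int start = (taskId.hashCode() & Integer.MAX_VALUE) % ordered.size();
  Collections.rotate(ordered, -start);   // deterministic per-task rotation
  return ordered;
}
{code}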

> NodeManager potentially causes unnecessary operations on all its disks
> --
>
> Key: YARN-1324
> URL: https://issues.apache.org/jira/browse/YARN-1324
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Bikas Saha
>
> Currently, for every container, the NM creates a directory on every disk and 
> expects the container-task to choose 1 of them and load balance the use of 
> the disks across all containers. 
> 1) This may have worked fine in the MR world where MR tasks would randomly 
> choose dirs but in general we cannot expect every app/task writer to 
> understand these nuances and randomly pick disks. So we could end up 
> overloading the first disk if most people decide to use the first disk.
> 2) This makes a number of NM operations to scan every disk (thus randomizing 
> that disk) to locate the dir which the task has actually chosen to use for 
> its files. Makes all these operations expensive for the NM as well as 
> disruptive for users of disks that did not have the real task working dirs.
> I propose that NM should up-front decide the disk it is assigning to tasks. 
> It could choose to do so randomly or weighted-randomly by looking at space 
> and load on each disk. So it could do a better job of load balancing. Then, 
> it would associate the chosen working directory with the container context so 
> that subsequent operations on the NM can directly seek to the correct 
> location instead of having to seek on every disk.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-891:
-

Attachment: YARN-891.10.patch

Resubmitting the patch.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.10.patch, YARN-891.1.patch, YARN-891.2.patch, 
> YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, 
> YARN-891.7.patch, YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, 
> YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch
>
>
> Store completed application/attempt info in RMStateStore when 
> application/attempt completes. This solves some problems like finished 
> application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-891) Store completed application information in RM state store

2013-10-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809934#comment-13809934
 ] 

Hadoop QA commented on YARN-891:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611248/YARN-891.10.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2331//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2331//console

This message is automatically generated.

> Store completed application information in RM state store
> -
>
> Key: YARN-891
> URL: https://issues.apache.org/jira/browse/YARN-891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Jian He
> Attachments: YARN-891.10.patch, YARN-891.1.patch, YARN-891.2.patch, 
> YARN-891.3.patch, YARN-891.4.patch, YARN-891.5.patch, YARN-891.6.patch, 
> YARN-891.7.patch, YARN-891.7.patch, YARN-891.8.patch, YARN-891.9.patch, 
> YARN-891.patch, YARN-891.patch, YARN-891.patch, YARN-891.patch, 
> YARN-891.patch, YARN-891.patch
>
>
> Store completed application/attempt info in RMStateStore when 
> application/attempt completes. This solves some problems like finished 
> application get lost after RM restart and some other races like YARN-1195



--
This message was sent by Atlassian JIRA
(v6.1#6144)