[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.

2013-04-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629837#comment-13629837
 ] 

Xuan Gong commented on YARN-561:


org.apache.hadoop.yarn.api.records.Container has a ContainerId and a NodeId (from 
which the address and port can be obtained), which is enough for a container to 
talk to its local NM. And by YARN-486, we have already added 
org.apache.hadoop.yarn.api.records.Container to ContainerImpl, so it can get 
that information now.
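
For illustration only, a minimal sketch of what a launched container could do once 
the NM exports such values. The environment variable names used below 
(CONTAINER_ID, NM_HOST, NM_PORT) are hypothetical placeholders, not keys this JIRA 
has settled on:

{code}
// Sketch only: read NM/container details from the environment inside a launched
// container. The variable names are hypothetical placeholders.
public class ContainerEnvExample {
  public static void main(String[] args) {
    String containerId = System.getenv("CONTAINER_ID"); // hypothetical key
    String nmHost = System.getenv("NM_HOST");           // hypothetical key
    String nmPort = System.getenv("NM_PORT");           // hypothetical key
    System.out.println("containerId=" + containerId
        + ", nodemanager=" + nmHost + ":" + nmPort);
  }
}
{code}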

 Nodemanager should set some key information into the environment of every 
 container that it launches.
 -

 Key: YARN-561
 URL: https://issues.apache.org/jira/browse/YARN-561
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Xuan Gong
  Labels: usability

 Information such as the containerId, nodemanager hostname, and nodemanager port 
 is not set in the environment when a container is launched. 
 For an AM, the RM does all of this for it, but for a container launched by an 
 application, all of the above needs to be set by the ApplicationMaster. 
 At the minimum, the container id would be a useful piece of information. If the 
 container wishes to talk to its local NM, the nodemanager-related information 
 would also come in handy. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs

2013-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629839#comment-13629839
 ] 

Hadoop QA commented on YARN-441:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578368/YARN-441.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 3 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/725//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/725//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/725//console

This message is automatically generated.

 Clean up unused collection methods in various APIs
 --

 Key: YARN-441
 URL: https://issues.apache.org/jira/browse/YARN-441
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, 
 YARN-441.4.patch


 There's a bunch of unused methods, like getAskCount() and getAsk(index), in 
 AllocateRequest and other interfaces. These should be removed.
 In YARN, they were found in the interfaces below; MR will have its own set.
 AllocateRequest
 StartContainerResponse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-441) Clean up unused collection methods in various APIs

2013-04-12 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-441:
---

Attachment: YARN-441.5.patch

1. Fix the -1 on javadoc.
2. Fix the -1 on findbugs by removing the synchronized keyword from 
StartContainerResponsePBImpl.java.

 Clean up unused collection methods in various APIs
 --

 Key: YARN-441
 URL: https://issues.apache.org/jira/browse/YARN-441
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, 
 YARN-441.4.patch, YARN-441.5.patch


 There's a bunch of unused methods, like getAskCount() and getAsk(index), in 
 AllocateRequest and other interfaces. These should be removed.
 In YARN, they were found in the interfaces below; MR will have its own set.
 AllocateRequest
 StartContainerResponse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-570) Time strings are formated in different timezone

2013-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629880#comment-13629880
 ] 

Hadoop QA commented on YARN-570:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12577997/MAPREDUCE-5141.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/726//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/726//console

This message is automatically generated.

 Time strings are formated in different timezone
 ---

 Key: YARN-570
 URL: https://issues.apache.org/jira/browse/YARN-570
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: PengZhang
 Attachments: MAPREDUCE-5141.patch


 Time strings on different pages are displayed in different timezones.
 If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
 Wed, 10 Apr 2013 08:29:56 GMT
 If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 
 16:29:56
 Same value, but different timezones.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs

2013-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629887#comment-13629887
 ] 

Hadoop QA commented on YARN-441:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578389/YARN-441.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/727//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/727//console

This message is automatically generated.

 Clean up unused collection methods in various APIs
 --

 Key: YARN-441
 URL: https://issues.apache.org/jira/browse/YARN-441
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, 
 YARN-441.4.patch, YARN-441.5.patch


 There's a bunch of unused methods, like getAskCount() and getAsk(index), in 
 AllocateRequest and other interfaces. These should be removed.
 In YARN, they were found in the interfaces below; MR will have its own set.
 AllocateRequest
 StartContainerResponse

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-570) Time strings are formated in different timezone

2013-04-12 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629907#comment-13629907
 ] 

Harsh J commented on YARN-570:
--

Thanks for the report and the patch!

With this patch it now renders it this way:

renderHadoopDate() - Wed, 10 Apr 2013 08:29:56 GMT+05:30
format() - 10-Apr-2013 08:29:56

This is still inconsistent, I think. Ideally we'd want the former everywhere for 
consistency. Can you update format() as well to print in the same style, if you 
agree?
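
For what it's worth, a minimal sketch (not part of the attached patch) of producing 
the GMT style above with plain {{java.text.SimpleDateFormat}}, which is roughly 
what an updated {{Times.format()}} would need to do:

{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class GmtFormatExample {
  public static void main(String[] args) {
    // Pin the format to GMT so every page agrees regardless of server timezone.
    SimpleDateFormat fmt =
        new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss z", Locale.US);
    fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
    long ts = 1365582596000L; // Wed, 10 Apr 2013 08:29:56 GMT
    System.out.println(fmt.format(new Date(ts)));
  }
}
{code}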

 Time strings are formated in different timezone
 ---

 Key: YARN-570
 URL: https://issues.apache.org/jira/browse/YARN-570
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: PengZhang
 Attachments: MAPREDUCE-5141.patch


 Time strings on different pages are displayed in different timezones.
 If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
 Wed, 10 Apr 2013 08:29:56 GMT
 If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 
 16:29:56
 Same value, but different timezones.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl

2013-04-12 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-457:
-

Attachment: YARN-457-4.patch

Attached a patch for trunk. Thank you.

 Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
 

 Key: YARN-457
 URL: https://issues.apache.org/jira/browse/YARN-457
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Kenji Kikushima
Priority: Minor
  Labels: Newbie
 Attachments: YARN-457-2.patch, YARN-457-3.patch, YARN-457-4.patch, 
 YARN-457.patch


 {code}
 if (updatedNodes == null) {
   this.updatedNodes.clear();
   return;
 }
 {code}
 If updatedNodes is already null, a NullPointerException is thrown.
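
A minimal null-safe sketch of how the setter could guard the clear() call so that 
a null-to-null update becomes a no-op (this is not the attached patch, and the 
{{List<NodeReport>}} signature is assumed):

{code}
// Sketch only (not the attached patch); signature assumed from AllocateResponse.
public synchronized void setUpdatedNodes(final List<NodeReport> updatedNodes) {
  if (updatedNodes == null) {
    if (this.updatedNodes != null) {
      this.updatedNodes.clear();   // only clear when there is something to clear
    }
    return;
  }
  // ... otherwise copy the incoming list as the PBImpl normally would ...
}
{code}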

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629987#comment-13629987
 ] 

Hudson commented on YARN-486:
-

Integrated in Hadoop-Yarn-trunk #181 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/181/])
YARN-486. Changed NM's startContainer API to accept Container record given 
by RM as a direct parameter instead of as part of the ContainerLaunchContext 
record. Contributed by Xuan Gong.
MAPREDUCE-5139. Update MR AM to use the modified startContainer API after 
YARN-486. Contributed by Xuan Gong. (Revision 1467063)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467063
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerRemoteLaunchEvent.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 

[jira] [Commented] (YARN-319) Submit a job to a queue that not allowed in fairScheduler, client will hold forever.

2013-04-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629991#comment-13629991
 ] 

Hudson commented on YARN-319:
-

Integrated in Hadoop-Yarn-trunk #181 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/181/])
Fixing CHANGES.txt entry for YARN-319. (Revision 1467133)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467133
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Submit a job to a queue that not allowed in fairScheduler, client will hold 
 forever.
 

 Key: YARN-319
 URL: https://issues.apache.org/jira/browse/YARN-319
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.2-alpha
Reporter: shenhong
Assignee: shenhong
 Fix For: 2.0.5-beta

 Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, 
 YARN-319.patch


 The RM uses fairScheduler; when a client submits a job to a queue that does not 
 allow the user to submit jobs to it, the client will hang forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630008#comment-13630008
 ] 

Hudson commented on YARN-412:
-

Integrated in Hadoop-trunk-Commit #3607 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3607/])
YARN-412. Fixed FifoScheduler to check hostname of a NodeManager rather 
than its host:port during scheduling which caused incorrect locality for 
containers. Contributed by Roger Hoover. (Revision 1467244)

 Result = SUCCESS
acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467244
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


 FifoScheduler incorrectly checking for node locality
 

 Key: YARN-412
 URL: https://issues.apache.org/jira/browse/YARN-412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Roger Hoover
Assignee: Roger Hoover
Priority: Minor
  Labels: patch
 Attachments: YARN-412.patch


 In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
 data is local to a node by searching for the nodeAddress of the node in the 
 set of outstanding requests for the app.  This seems to be incorrect as it 
 should be checking hostname instead.  The offending line of code is 455:
 application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
 Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
 are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
 In the CapacityScheduler, it's done using hostname.  See 
 LeafQueue.assignNodeLocalContainers, line 1129
 application.getResourceRequest(priority, node.getHostName());
 Note that this bug does not affect the actual scheduling decisions made by 
 the FifoScheduler because even though it incorrectly determines that a request 
 is not local to the node, it will still schedule the request immediately 
 because it's rack-local.  However, this bug may be adversely affecting the 
 reporting of job status by underreporting the number of tasks that were node 
 local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630040#comment-13630040
 ] 

Hudson commented on YARN-486:
-

Integrated in Hadoop-Hdfs-trunk #1370 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1370/])
YARN-486. Changed NM's startContainer API to accept Container record given 
by RM as a direct parameter instead of as part of the ContainerLaunchContext 
record. Contributed by Xuan Gong.
MAPREDUCE-5139. Update MR AM to use the modified startContainer API after 
YARN-486. Contributed by Xuan Gong. (Revision 1467063)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467063
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerRemoteLaunchEvent.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 

[jira] [Commented] (YARN-319) Submit a job to a queue that not allowed in fairScheduler, client will hold forever.

2013-04-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630044#comment-13630044
 ] 

Hudson commented on YARN-319:
-

Integrated in Hadoop-Hdfs-trunk #1370 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1370/])
Fixing CHANGES.txt entry for YARN-319. (Revision 1467133)

 Result = FAILURE
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467133
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Submit a job to a queue that not allowed in fairScheduler, client will hold 
 forever.
 

 Key: YARN-319
 URL: https://issues.apache.org/jira/browse/YARN-319
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.2-alpha
Reporter: shenhong
Assignee: shenhong
 Fix For: 2.0.5-beta

 Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, 
 YARN-319.patch


 The RM uses fairScheduler; when a client submits a job to a queue that does not 
 allow the user to submit jobs to it, the client will hang forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS

2013-04-12 Thread Timothy St. Clair (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630059#comment-13630059
 ] 

Timothy St. Clair commented on YARN-160:


If it's possible to tag along with development on this one, I would be interested 
in the approach. IMHO, referencing existing solutions gauges the baseline:

Ref:
http://www.open-mpi.org/projects/hwloc/
http://www.rce-cast.com/Podcast/rce-33-hwloc-portable-hardware-locality.html
http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html

 nodemanagers should obtain cpu/memory values from underlying OS
 ---

 Key: YARN-160
 URL: https://issues.apache.org/jira/browse/YARN-160
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.0.5-beta


 As mentioned in YARN-2
 *NM memory and CPU configs*
 Currently these values come from the config of the NM; we should be 
 able to obtain them from the OS (i.e., in the case of Linux, from 
 /proc/meminfo & /proc/cpuinfo). As this is highly OS dependent, we should have 
 an interface that obtains this information. In addition, implementations of 
 this interface should be able to specify a mem/cpu offset (an amount of mem/cpu 
 not to be available as a YARN resource); this would allow reserving mem/cpu for 
 the OS and other services outside of YARN containers.
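
As a rough illustration of the idea (not the eventual YARN interface; the class and 
method names below are made up for the sketch), a Linux probe could read MemTotal 
from /proc/meminfo and subtract a configured offset:

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Sketch only: one possible shape for an OS-specific memory probe with an offset
// reserved for the OS and other daemons. Names are illustrative, not YARN's API.
public class LinuxMemoryProbe {
  private final long memoryOffsetMB; // memory not to be exposed as YARN resource

  public LinuxMemoryProbe(long memoryOffsetMB) {
    this.memoryOffsetMB = memoryOffsetMB;
  }

  /** Returns MemTotal from /proc/meminfo minus the configured offset, in MB. */
  public long getAvailableMemoryMB() throws IOException {
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {
          long kb = Long.parseLong(line.replaceAll("[^0-9]", ""));
          return Math.max(0, kb / 1024 - memoryOffsetMB);
        }
      }
    }
    throw new IOException("MemTotal not found in /proc/meminfo");
  }
}
{code}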

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630075#comment-13630075
 ] 

Hudson commented on YARN-412:
-

Integrated in Hadoop-Mapreduce-trunk #1397 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1397/])
YARN-412. Fixed FifoScheduler to check hostname of a NodeManager rather 
than its host:port during scheduling which caused incorrect locality for 
containers. Contributed by Roger Hoover. (Revision 1467244)

 Result = SUCCESS
acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467244
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java


 FifoScheduler incorrectly checking for node locality
 

 Key: YARN-412
 URL: https://issues.apache.org/jira/browse/YARN-412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Roger Hoover
Assignee: Roger Hoover
Priority: Minor
  Labels: patch
 Fix For: 2.0.4-alpha

 Attachments: YARN-412.patch


 In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
 data is local to a node by searching for the nodeAddress of the node in the 
 set of outstanding requests for the app.  This seems to be incorrect as it 
 should be checking hostname instead.  The offending line of code is 455:
 application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
 Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
 are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
 In the CapacityScheduler, it's done using hostname.  See 
 LeafQueue.assignNodeLocalContainers, line 1129
 application.getResourceRequest(priority, node.getHostName());
 Note that this bug does not affect the actual scheduling decisions made by 
 the FifoScheduler because even though it incorrectly determines that a request 
 is not local to the node, it will still schedule the request immediately 
 because it's rack-local.  However, this bug may be adversely affecting the 
 reporting of job status by underreporting the number of tasks that were node 
 local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land

2013-04-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630079#comment-13630079
 ] 

Hudson commented on YARN-486:
-

Integrated in Hadoop-Mapreduce-trunk #1397 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1397/])
YARN-486. Changed NM's startContainer API to accept Container record given 
by RM as a direct parameter instead of as part of the ContainerLaunchContext 
record. Contributed by Xuan Gong.
MAPREDUCE-5139. Update MR AM to use the modified startContainer API after 
YARN-486. Contributed by Xuan Gong. (Revision 1467063)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467063
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerRemoteLaunchEvent.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 

[jira] [Commented] (YARN-319) Submit a job to a queue that not allowed in fairScheduler, client will hold forever.

2013-04-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630083#comment-13630083
 ] 

Hudson commented on YARN-319:
-

Integrated in Hadoop-Mapreduce-trunk #1397 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1397/])
Fixing CHANGES.txt entry for YARN-319. (Revision 1467133)

 Result = SUCCESS
vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467133
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Submit a job to a queue that not allowed in fairScheduler, client will hold 
 forever.
 

 Key: YARN-319
 URL: https://issues.apache.org/jira/browse/YARN-319
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.0.2-alpha
Reporter: shenhong
Assignee: shenhong
 Fix For: 2.0.5-beta

 Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, 
 YARN-319.patch


 The RM uses fairScheduler; when a client submits a job to a queue that does not 
 allow the user to submit jobs to it, the client will hang forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission

2013-04-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630320#comment-13630320
 ] 

Bikas Saha commented on YARN-514:
-

You only need to add the new field to the enum. I don't think we should change 
the values of all the existing enums.
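
To illustrate the compatibility point generically (this is not the actual enum 
touched by the patch, and it assumes the constants carry explicit numeric values, 
as protobuf-backed records do): the new field is appended and the existing values 
stay untouched.

{code}
// Illustration only, not the enum from the patch: append the new constant and
// keep the existing numeric values exactly as they were.
public enum ExampleStoreEventType {
  STORE_APP(1),
  REMOVE_APP(2),
  STORE_APP_ATTEMPT(3); // new field appended; values 1 and 2 are unchanged

  private final int value;

  ExampleStoreEventType(int value) {
    this.value = value;
  }

  public int getValue() {
    return value;
  }
}
{code}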

 Delayed store operations should not result in RM unavailability for app 
 submission
 --

 Key: YARN-514
 URL: https://issues.apache.org/jira/browse/YARN-514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Zhijie Shen
 Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, 
 YARN-514.4.patch


 Currently, app submission is the only store operation performed synchronously 
 because the app must be stored before the request returns with success. This 
 makes the RM susceptible to blocking all client threads on slow store 
 operations, resulting in RM being perceived as unavailable by clients.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-571) User should not be part of ContainerLaunchContext

2013-04-12 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-571:
-

Issue Type: Sub-task  (was: Bug)
Parent: YARN-386

 User should not be part of ContainerLaunchContext
 -

 Key: YARN-571
 URL: https://issues.apache.org/jira/browse/YARN-571
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah

 Today, a user is expected to set the user name in the CLC when either 
 submitting an application or launching a container from the AM. This does not 
 make sense as the user can/has been identified by the RM as part of the RPC 
 layer.
 The solution would be to move the user information into either the Container 
 object or directly into the ContainerToken, which can then be used by the NM 
 to launch the container. This user information would be set into the container 
 by the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-572) Remove duplication of data in Container

2013-04-12 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-572:


 Summary: Remove duplication of data in Container 
 Key: YARN-572
 URL: https://issues.apache.org/jira/browse/YARN-572
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah


Most of the information needed to launch a container is duplicated in both the 
Container class as well as in the ContainerToken object that the Container 
object already contains. It would be good to remove this level of duplication. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl

2013-04-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630425#comment-13630425
 ] 

Xuan Gong commented on YARN-457:


+1, Looks good

 Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
 

 Key: YARN-457
 URL: https://issues.apache.org/jira/browse/YARN-457
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Kenji Kikushima
Priority: Minor
  Labels: Newbie
 Attachments: YARN-457-2.patch, YARN-457-3.patch, YARN-457-4.patch, 
 YARN-457.patch


 {code}
 if (updatedNodes == null) {
   this.updatedNodes.clear();
   return;
 }
 {code}
 If updatedNodes is already null, a NullPointerException is thrown.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-513) Verify all clients will wait for RM to restart

2013-04-12 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-513:
--

Assignee: Xuan Gong  (was: Jian He)

 Verify all clients will wait for RM to restart
 --

 Key: YARN-513
 URL: https://issues.apache.org/jira/browse/YARN-513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Xuan Gong

 When the RM is restarting, the NM, AM and Clients should wait for some time 
 for the RM to come back up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality

2013-04-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630440#comment-13630440
 ] 

Sandy Ryza commented on YARN-392:
-

Any further thoughts on this?

 Make it possible to schedule to specific nodes without dropping locality
 

 Key: YARN-392
 URL: https://issues.apache.org/jira/browse/YARN-392
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Sandy Ryza
 Attachments: YARN-392-1.patch, YARN-392.patch


 Currently it's not possible to specify scheduling requests for specific nodes 
 and nowhere else. The RM automatically relaxes locality to rack and * and 
 assigns non-specified machines to the app.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-12 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-530:


Attachment: YARN-530.4.patch

No tangible change from the previous version; publishing to keep in sync with the 
updated YARN-117 everything patch.

 Define Service model strictly, implement AbstractService for robust 
 subclassing, migrate yarn-common services
 -

 Key: YARN-530
 URL: https://issues.apache.org/jira/browse/YARN-530
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530-3.patch, 
 YARN-530.4.patch, YARN-530.patch


 # Extend the YARN {{Service}} interface as discussed in YARN-117
 # Implement the changes in {{AbstractService}} and {{FilterService}}.
 # Migrate all services in yarn-common to the more robust service model, test.
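
For orientation, a heavily compressed sketch of the lifecycle pattern discussed in 
YARN-117 and item 2 above: final entry points do the state checks and delegate to 
protected inner methods, with {{stop()}} valid from any state. This is only a 
sketch under those assumptions, not the attached patch, and it omits listeners and 
real state bookkeeping.

{code}
// Compressed sketch of the YARN-117 lifecycle pattern; not the actual
// AbstractService from the patch.
public abstract class SketchService {
  public enum State { NOTINITED, INITED, STARTED, STOPPED }

  private volatile State state = State.NOTINITED;

  public final synchronized void start() {
    if (state == State.STARTED) {
      return; // ignore duplicate start requests
    }
    if (state != State.INITED) {
      throw new IllegalStateException("Cannot start from " + state);
    }
    try {
      innerStart();
      state = State.STARTED;
    } catch (RuntimeException e) {
      stop(); // roll back: stop() must be safe to call from any state
      throw e;
    }
  }

  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return; // stop() is idempotent and valid from every state
    }
    state = State.STOPPED;
    innerStop();
  }

  /** Subclasses do the real work here; fields may still be null on rollback. */
  protected void innerStart() {}

  protected void innerStop() {}
}
{code}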

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-117) Enhance YARN service model

2013-04-12 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-117:


Attachment: YARN-117.4.patch

The changes here since the last patch relate to the test 
{{TestNodeStatusUpdater}}, which was failing on Jenkins but not locally.

# Adding timeouts in the {{syncBarrier.await()}} calls to better handle the 
situation where the rollback of a failing {{start()}} doesn't block, as the 
barrier in the test case isn't reached as it would be on the same thread.

# Lots of extra assertions and debugging to see why {{testNMConnectionToRM()}} 
fails most of the time on a Linux test VM. It looks like the time-based 
assertions are brittle there (not fixed).
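
For reference, a minimal sketch of what a barrier await with a timeout looks like 
(assuming the test's {{syncBarrier}} is a {{java.util.concurrent.CyclicBarrier}}; 
the actual timeout value used in the patch is not shown here):

{code}
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only: awaiting a barrier with a timeout so a test fails fast instead of
// hanging when the other party never reaches the barrier.
public class BarrierTimeoutExample {
  public static void main(String[] args) throws Exception {
    CyclicBarrier syncBarrier = new CyclicBarrier(2);
    try {
      syncBarrier.await(5, TimeUnit.SECONDS); // illustrative timeout
    } catch (TimeoutException e) {
      System.out.println("Barrier not reached within the timeout: " + e);
    } catch (BrokenBarrierException e) {
      System.out.println("Barrier broken: " + e);
    }
  }
}
{code}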

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, 
 YARN-117.patch


 Having played with the YARN service model, there are some issues
 that I've identified based on past work and initial use.
 This JIRA issue is an overall one to cover the issues, with solutions pushed 
 out to separate JIRAs.
 h2. state model prevents stopped state being entered if you could not 
 successfully start the service.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require 
 the implementations to be able to stop safely without requiring all fields to 
 be non null.
 Before anyone points out that the {{stop()}} operations assume that all 
 fields are valid; and if called before a {{start()}} they will NPE; 
 MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix 
 for this. It is independent of the rest of the issues in this doc but it will 
 aid making {{stop()}} execute from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
 review and take up; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks to verify whether or not a state transition is 
 allowed from the current state are performed in the base {{AbstractService}} 
 class -yet subclasses tend to call this *after* their own {{init()}}, 
 {{start()}} & {{stop()}} operations. This means that these operations can be 
 performed out of order, and even if the outcome of the call is an exception, 
 all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
 demonstrates this.
 This is a tricky one to address. In HADOOP-3128 I used a base class instead 
 of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
 {{final}}. These methods would do the checks, and then invoke protected inner 
 methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
 retrofit the same behaviour to everything that extends {{AbstractService}} 
 -something that must be done before the class is considered stable (because 
 once the lifecycle methods are declared final, all subclasses that are out of 
 the source tree will need fixing by the respective developers).
 h2. AbstractService state change doesn't defend against race conditions.
 There's no concurrency locks on the state transitions. Whatever fix for wrong 
 state calls is added should correct this to prevent re-entrancy, such as 
 {{stop()}} being called from two threads.
 h2. Static methods to choreograph lifecycle operations
 Helper methods to move things through lifecycles. init-start is common, 
 stop-if-service!=null another. Some static methods can execute these, and 
 even call {{stop()}} if {{init()}} raises an exception. These could go into a 
 class {{ServiceOps}} in the same package. These can be used by those services 
 that wrap other services, and help manage more robust shutdowns.
 h2. state transition failures are something that registered service listeners 
 may wish to be informed of.
 When a state transition fails a {{RuntimeException}} can be thrown -and the 
 service listeners are not informed as the notification point isn't reached. 
 They may wish to know this, especially for management and diagnostics.
 *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
 {{stateChangeFailed(Service service,Service.State targeted-state, 
 RuntimeException e)}} that is invoked from the (final) state change methods 
 in the {{AbstractService}} class (once they delegate to their inner 
 

[jira] [Commented] (YARN-117) Enhance YARN service model

2013-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630478#comment-13630478
 ] 

Hadoop QA commented on YARN-117:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578484/YARN-117.4.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/729//console

This message is automatically generated.

 Enhance YARN service model
 --

 Key: YARN-117
 URL: https://issues.apache.org/jira/browse/YARN-117
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, 
 YARN-117.patch


 Having played with the YARN service model, there are some issues
 that I've identified based on past work and initial use.
 This JIRA issue is an overall one to cover the issues, with solutions pushed 
 out to separate JIRAs.
 h2. state model prevents stopped state being entered if you could not 
 successfully start the service.
 In the current lifecycle you cannot stop a service unless it was successfully 
 started, but
 * {{init()}} may acquire resources that need to be explicitly released
 * if the {{start()}} operation fails partway through, the {{stop()}} 
 operation may be needed to release resources.
 *Fix:* make {{stop()}} a valid state transition from all states and require 
 the implementations to be able to stop safely without requiring all fields to 
 be non null.
 Before anyone points out that the {{stop()}} operations assume that all 
 fields are valid; and if called before a {{start()}} they will NPE; 
 MAPREDUCE-3431 shows that this problem arises today, MAPREDUCE-3502 is a fix 
 for this. It is independent of the rest of the issues in this doc but it will 
 aid making {{stop()}} execute from all states other than stopped.
 MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
 review and take up; this can be done with issues linked to this one.
 h2. AbstractService doesn't prevent duplicate state change requests.
 The {{ensureState()}} checks to verify whether or not a state transition is 
 allowed from the current state are performed in the base {{AbstractService}} 
 class -yet subclasses tend to call this *after* their own {{init()}}, 
 {{start()}} & {{stop()}} operations. This means that these operations can be 
 performed out of order, and even if the outcome of the call is an exception, 
 all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
 demonstrates this.
 This is a tricky one to address. In HADOOP-3128 I used a base class instead 
 of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
 {{final}}. These methods would do the checks, and then invoke protected inner 
 methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
 retrofit the same behaviour to everything that extends {{AbstractService}} 
 -something that must be done before the class is considered stable (because 
 once the lifecycle methods are declared final, all subclasses that are out of 
 the source tree will need fixing by the respective developers).
 h2. AbstractService state change doesn't defend against race conditions.
 There are no concurrency locks on the state transitions. Whatever fix is added 
 for wrong-state calls should also correct this to prevent re-entrancy, such as 
 {{stop()}} being called from two threads.
 h2. Static methods to choreograph lifecycle operations
 Helper methods to move things through their lifecycles: init-then-start is 
 common, stop-if-service-is-not-null is another. Some static methods can 
 execute these, and even call {{stop()}} if {{init()}} raises an exception. 
 These could go into a class {{ServiceOps}} in the same package, and could be 
 used by services that wrap other services to help manage more robust 
 shutdowns.
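 A rough sketch of what such helpers could look like, assuming the yarn-common 
 {{Service}} interface of that era with {{init(Configuration)}}, {{start()}} 
 and {{stop()}}; the class and method names below are illustrative only:
 {noformat}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.yarn.service.Service;
 
 // Illustrative sketch only, not the proposed final API.
 public final class ServiceOps {
   private ServiceOps() {}
 
   /** init + start, stopping the service again if start() fails. */
   public static void deploy(Service service, Configuration conf) {
     service.init(conf);
     try {
       service.start();
     } catch (RuntimeException e) {
       stopIfNotNull(service);   // release whatever init()/start() acquired
       throw e;
     }
   }
 
   /** stop-if-service-is-not-null helper. */
   public static void stopIfNotNull(Service service) {
     if (service != null) {
       service.stop();
     }
   }
 }
 {noformat}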
 h2. state transition failures are something that registered service listeners 
 may wish to be informed of.
 When a state transition fails, a {{RuntimeException}} can be thrown, and the 
 service listeners are not informed because the notification point isn't 
 reached. They may wish to know this, especially for management and diagnostics.
 *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
 {{stateChangeFailed(Service service, Service.State targetedState, 
 RuntimeException e)}} that is invoked from the (final) state change methods 
 in the {{AbstractService}} class (once they delegate to their inner 
 {{innerStart()}}, {{innerStop()}} methods); make it a no-op in the existing 
 implementations of the interface.
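 A hedged sketch of the proposed extension; the exact signature and the use of 
 the {{Service.STATE}} enum are assumptions based on the description above:
 {noformat}
 import org.apache.hadoop.yarn.service.Service;
 
 // Sketch of the extended listener; the signature is illustrative only.
 public interface ServiceStateChangeListenerSketch {
   /** Existing-style callback: the service entered a new state. */
   void stateChanged(Service service);
 
   /**
    * Proposed callback: a transition towards targetState failed with 'cause'.
    * Existing implementations would implement this as a no-op.
    */
   void stateChangeFailed(Service service, Service.STATE targetState,
                          RuntimeException cause);
 }
 {noformat}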
 h2. Service listener failures not handled
 Is this 

[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630489#comment-13630489
 ] 

Hadoop QA commented on YARN-530:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578482/YARN-530.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/728//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/728//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/728//console

This message is automatically generated.

 Define Service model strictly, implement AbstractService for robust 
 subclassing, migrate yarn-common services
 -

 Key: YARN-530
 URL: https://issues.apache.org/jira/browse/YARN-530
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530-3.patch, 
 YARN-530.4.patch, YARN-530.patch


 # Extend the YARN {{Service}} interface as discussed in YARN-117
 # Implement the changes in {{AbstractService}} and {{FilterService}}.
 # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.

2013-04-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630493#comment-13630493
 ] 

Vinod Kumar Vavilapalli commented on YARN-561:
--

Xuan, what Hitesh is saying is that when a container starts as a process, it 
doesn't know its containerId. We should make the NM export it as part of the 
env.

 Nodemanager should set some key information into the environment of every 
 container that it launches.
 -

 Key: YARN-561
 URL: https://issues.apache.org/jira/browse/YARN-561
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Xuan Gong
  Labels: usability

 Information such as containerId, nodemanager hostname, nodemanager port is 
 not set in the environment when any container is launched. 
 For an AM, the RM does all of this for it but for a container launched by an 
 application, all of the above need to be set by the ApplicationMaster. 
 At the minimum, container id would be a useful piece of information. If the 
 container wishes to talk to its local NM, the nodemanager related information 
 would also come in handy. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-571) User should not be part of ContainerLaunchContext

2013-04-12 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reassigned YARN-571:


Assignee: Vinod Kumar Vavilapalli

Taking a shot at this..

 User should not be part of ContainerLaunchContext
 -

 Key: YARN-571
 URL: https://issues.apache.org/jira/browse/YARN-571
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Vinod Kumar Vavilapalli

 Today, a user is expected to set the user name in the CLC when either 
 submitting an application or launching a container from the AM. This does not 
 make sense, as the user can be (and already has been) identified by the RM as 
 part of the RPC layer.
 The solution would be to move the user information into either the Container 
 object or directly into the ContainerToken, which can then be used by the NM 
 to launch the container. This user information would be set into the container 
 by the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-572) Remove duplication of data in Container

2013-04-12 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned YARN-572:


Assignee: Hitesh Shah

 Remove duplication of data in Container 
 

 Key: YARN-572
 URL: https://issues.apache.org/jira/browse/YARN-572
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Hitesh Shah

 Most of the information needed to launch a container is duplicated in both 
 the Container class as well as in the ContainerToken object that the 
 Container object already contains. It would be good to remove this level of 
 duplication. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (YARN-435) Make it easier to access cluster topology information in an AM

2013-04-12 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah reassigned YARN-435:


Assignee: Hitesh Shah

 Make it easier to access cluster topology information in an AM
 --

 Key: YARN-435
 URL: https://issues.apache.org/jira/browse/YARN-435
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Hitesh Shah

 ClientRMProtocol exposes a getClusterNodes api that provides a report on all 
 nodes in the cluster including their rack information. 
 However, this requires the AM to open and establish a separate connection to 
 the RM in addition to one for the AMRMProtocol. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false

2013-04-12 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630619#comment-13630619
 ] 

Daryn Sharp commented on YARN-503:
--

bq. There's likely a race between the RenewalTask and AbortTask.  [...] I think 
it's possible for a schedulesTask to be executing - in which case the 
abortScheduleTask() may have no effect and can result in the wrong task being 
cancelled / scheduled.
It's ok if another task is executing, because it's just trying to abort any 
pending task.  Since there's only one possible pending task per token at any 
given time, the wrong task can't be cancelled.  Did I miss an edge case?

bq. ManagedApp.add - instead of adding to the app and the token here, this 
composite add can be kept outside of ManagedApp/ManagedToken
Not sure I follow.  Are you suggesting to move {{managedToken.add(appId)}} into 
the loop in {{addApplication}}?  I was trying to encapsulate the implementation 
details of adding/removing the appId within ManagedApp.  Is it ok to leave it 
as-is?

bq. ManagedApp.expunge() - is synchronization on 'appTokens' required ?
Strictly speaking, probably not.  It's a throwback to an earlier implementation 
that was doing trickier stuff.  It was there to avoid concurrent modification 
exceptions while iterating, but appTokens isn't mutated from multiple threads.  
And the {{remove}} is essentially guarding it too.  For that matter, I don't 
think {{appTokens}} needs to be a synch-ed set.  I'll change it.

bq. ManagedToken.expunge() - tokenApps.clear() required ?
Probably not.  Seemed like good housekeeping, but I'll remove it.

bq. In the unit test - the 1 second sleep seems rather low. Instead of the 
sleep, this can be changed to a timed wait on one of the fields being verified.
I don't like sleeps either.  1s is an eternity in this case because the initial 
renew and cancel timer tasks fire immediately on mocked objects, so it should 
run in a few ms.  I assume you are suggesting using notify in a mock'ed answer 
method?  Multiple timers are expected to fire in some cases, so it would 
probably require something like a CountDownLatch, and it would get tricky to 
keep swapping in a new one by re-adding mocked responses with the new latch.  
Let me know if you feel it's worth it to change it.



 DelegationTokens will be renewed forever if multiple jobs share tokens and 
 the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
 --

 Key: YARN-503
 URL: https://issues.apache.org/jira/browse/YARN-503
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha
Reporter: Siddharth Seth
Assignee: Daryn Sharp
 Attachments: YARN-503.patch, YARN-503.patch


 The first Job/App to register a token is the one which DelegationTokenRenewer 
 associates with a specific Token. An attempt to remove/cancel these shared 
 tokens by subsequent jobs doesn't work - since the JobId will not match.
 As a result, even if subsequent jobs have 
 MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true - tokens will not be 
 cancelled when those jobs complete.
 Tokens will eventually be removed from the RM / JT when the service that 
 issued them considers them to have expired or via an explicit 
 cancelDelegationTokens call (not implemented yet in 23).
 A side effect of this is that the same delegation token will end up being 
 renewed multiple times (a separate TimerTask for each job which uses the 
 token).
 DelegationTokenRenewer could maintain a reference count/list of jobIds for 
 shared tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.

2013-04-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630626#comment-13630626
 ] 

Xuan Gong commented on YARN-561:


When a container starts as a process, it does not know its containerId. Does 
that mean the script we execute to launch the container does not include this 
containerId?
If I understand it correctly, we can solve this issue as follows:
1. We need to add some entries to the enum Environment, such as the ContainerId 
(String) (which can be converted back by using 
ConverterUtils.toContainerId(String containerId)), the NM hostName (String) and 
the NMPort (int).
2. The container launch script is written out in ContainerLaunch::call(), and 
the environment is also set there. In ContainerLaunch we already have 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container, so the 
containerId is easy to get. The NM hostName and NMPort can be taken from the 
NM_NodeId, which is in the NMContext, and ContainerLaunch is initialized from 
ContainerLauncher, which already has the NMContext. So we can make changes 
here: when we initialize the ContainerLaunch, we either pass in the NMContext 
as a parameter, or simply pass the NM_NodeId, or just the NM_hostName and 
NMPort, and then we can get all the information we need.
Any other suggestions?
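A rough sketch of the idea (the env key names, the helper class and the exact 
spot inside ContainerLaunch::call() are assumptions for illustration, not the 
actual patch):
{noformat}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.NodeId;

// Sketch only: values the NM would put into the container environment just
// before the launch script is written out. Key names are placeholders.
public final class NMEnvSketch {
  public static void addNMProvidedEnv(Map<String, String> env,
                                      ContainerId containerId, NodeId nodeId) {
    // ConverterUtils.toContainerId(String) can turn this back into an object.
    env.put("CONTAINER_ID", containerId.toString());
    env.put("NM_HOST", nodeId.getHost());
    env.put("NM_PORT", String.valueOf(nodeId.getPort()));
  }
}
{noformat}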

 Nodemanager should set some key information into the environment of every 
 container that it launches.
 -

 Key: YARN-561
 URL: https://issues.apache.org/jira/browse/YARN-561
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Xuan Gong
  Labels: usability

 Information such as containerId, nodemanager hostname, nodemanager port is 
 not set in the environment when any container is launched. 
 For an AM, the RM does all of this for it but for a container launched by an 
 application, all of the above need to be set by the ApplicationMaster. 
 At the minimum, container id would be a useful piece of information. If the 
 container wishes to talk to its local NM, the nodemanager related information 
 would also come in handy. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.

2013-04-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630635#comment-13630635
 ] 

Hitesh Shah commented on YARN-561:
--

@Xuan, one thing to be careful of: certain env settings should only be set by 
the NodeManager when it launches the container, and not by an application. So 
you would need a notion of certain whitelisted environment variables that 
should be set only by the NM and not overridden by the env in the CLC provided 
by the application. 
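A minimal sketch of such a whitelist check; the helper and the variable names 
are hypothetical, not part of any existing patch:
{noformat}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: NM-owned variables are removed from the application-supplied
// CLC environment before the NM writes its own values, so they cannot be
// overridden. The key names are placeholders.
public final class NMOnlyEnvSketch {
  private static final Set<String> NM_ONLY_ENV = new HashSet<String>(
      Arrays.asList("CONTAINER_ID", "NM_HOST", "NM_PORT"));

  public static void stripNMOnlyVars(Map<String, String> appEnv) {
    for (String key : NM_ONLY_ENV) {
      appEnv.remove(key);   // the application cannot set these
    }
  }
}
{noformat}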

 Nodemanager should set some key information into the environment of every 
 container that it launches.
 -

 Key: YARN-561
 URL: https://issues.apache.org/jira/browse/YARN-561
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Xuan Gong
  Labels: usability

 Information such as containerId, nodemanager hostname, nodemanager port is 
 not set in the environment when any container is launched. 
 For an AM, the RM does all of this for it but for a container launched by an 
 application, all of the above need to be set by the ApplicationMaster. 
 At the minimum, container id would be a useful piece of information. If the 
 container wishes to talk to its local NM, the nodemanager related information 
 would also come in handy. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-12 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-412:
---

Fix Version/s: (was: 2.0.4-alpha)
   2.0.5-beta

 FifoScheduler incorrectly checking for node locality
 

 Key: YARN-412
 URL: https://issues.apache.org/jira/browse/YARN-412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Roger Hoover
Assignee: Roger Hoover
Priority: Minor
  Labels: patch
 Fix For: 2.0.5-beta

 Attachments: YARN-412.patch


 In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
 data is local to a node by searching for the nodeAddress of the node in the 
 set of outstanding requests for the app.  This seems to be incorrect as it 
 should be checking hostname instead.  The offending line of code is 455:
 application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
 Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
 are a concatenation of the hostname and command port (e.g. host1.foo.com:1234).
 In the CapacityScheduler, it's done using the hostname.  See 
 LeafQueue.assignNodeLocalContainers, line 1129:
 application.getResourceRequest(priority, node.getHostName());
 Note that this bug does not affect the actual scheduling decisions made by 
 the FifoScheduler because, even though it incorrectly determines that a request 
 is not local to the node, it will still schedule the request immediately 
 because it's rack-local.  However, this bug may be adversely affecting the 
 reporting of job status by underreporting the number of tasks that were 
 node-local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-04-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630649#comment-13630649
 ] 

Hudson commented on YARN-412:
-

Integrated in Hadoop-trunk-Commit #3610 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3610/])
YARN-412. Pushing to 2.0.5-beta only. (Revision 1467470)

 Result = SUCCESS
acmurthy : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1467470
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 FifoScheduler incorrectly checking for node locality
 

 Key: YARN-412
 URL: https://issues.apache.org/jira/browse/YARN-412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Roger Hoover
Assignee: Roger Hoover
Priority: Minor
  Labels: patch
 Fix For: 2.0.5-beta

 Attachments: YARN-412.patch


 In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
 data is local to a node by searching for the nodeAddress of the node in the 
 set of outstanding requests for the app.  This seems to be incorrect as it 
 should be checking hostname instead.  The offending line of code is 455:
 application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
 Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
 are a concatenation of the hostname and command port (e.g. host1.foo.com:1234).
 In the CapacityScheduler, it's done using the hostname.  See 
 LeafQueue.assignNodeLocalContainers, line 1129:
 application.getResourceRequest(priority, node.getHostName());
 Note that this bug does not affect the actual scheduling decisions made by 
 the FifoScheduler because, even though it incorrectly determines that a request 
 is not local to the node, it will still schedule the request immediately 
 because it's rack-local.  However, this bug may be adversely affecting the 
 reporting of job status by underreporting the number of tasks that were 
 node-local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.

2013-04-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630652#comment-13630652
 ] 

Xuan Gong commented on YARN-561:


Just like ApplicationConstants, which includes some variables that can only be 
set in the AppMaster environment? At the beginning (in the code in 
ContainerLaunch::call()), the env originally comes from CLC.getEnvironment(); 
we can then set the ContainerId, Node_hostName and Node_portNumber after that.

 Nodemanager should set some key information into the environment of every 
 container that it launches.
 -

 Key: YARN-561
 URL: https://issues.apache.org/jira/browse/YARN-561
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Xuan Gong
  Labels: usability

 Information such as containerId, nodemanager hostname, nodemanager port is 
 not set in the environment when any container is launched. 
 For an AM, the RM does all of this for it but for a container launched by an 
 application, all of the above need to be set by the ApplicationMaster. 
 At the minimum, container id would be a useful piece of information. If the 
 container wishes to talk to its local NM, the nodemanager related information 
 would also come in handy. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-562) NM should reject containers allocated by previous RM

2013-04-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-562:
-

Attachment: YARN-562.1.patch

This patch does the following:
1. Add the RM's cluster timestamp in the NM to reject old containers (see the 
sketch below).
2. Block container requests while the NM is resyncing with the RM.
3. Add test cases for both cases.
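A hedged sketch of the first point; the field and method names and the exact 
comparison are assumptions about the approach, not the contents of the patch:
{noformat}
// Sketch only: the NM remembers the cluster timestamp of the RM it is
// currently registered with and rejects start-container requests whose
// allocation carries a different (older) RM cluster timestamp.
public class ContainerOriginCheckSketch {
  private volatile long currentRMClusterTimestamp;

  public void onRegisterOrResync(long rmClusterTimestamp) {
    this.currentRMClusterTimestamp = rmClusterTimestamp;
  }

  public void checkContainer(long allocatingRMClusterTimestamp) {
    if (allocatingRMClusterTimestamp != currentRMClusterTimestamp) {
      throw new IllegalArgumentException(
          "Container was allocated by a previous ResourceManager; rejecting");
    }
  }
}
{noformat}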


 NM should reject containers allocated by previous RM
 

 Key: YARN-562
 URL: https://issues.apache.org/jira/browse/YARN-562
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-562.1.patch


 It's possible that after an RM shutdown, before the AM goes down, the AM still 
 calls startContainer on the NM with containers allocated by the previous RM. 
 When the RM comes back, the NM doesn't know whether this container launch 
 request comes from the previous RM or the current RM. We should reject 
 containers allocated by the previous RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart

2013-04-12 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630672#comment-13630672
 ] 

Xuan Gong commented on YARN-513:


From the ApplicationMaster perspective:
1. The very first communication it has with the RM is to register itself with 
the RM, via AMRMClientImpl::registerApplicationMaster(), so we can add waiting 
logic here to retry several times until the call is accepted or the exception 
is finally thrown.

From the Client perspective:
1. The very first communication it has with the RM is getNewApplication(), 
which is in YarnClientImpl::getNewApplication(request), so we can add waiting 
logic here as well.

In order to do that, we need to add several constants and variables to 
YarnConfiguration, such as AM_RM_CONNECTION_RETRY_INTERVAL_SECS, 
AM_RM_CONNECT_WAIT_SECS, CLIENT_RM_CONNECTION_RETRY_INTERVAL_SECS and 
CLIENT_RM_CONNECTION_WAIT_SECS.
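A generic sketch of that waiting logic; the helper class is hypothetical, and 
the wait / retry-interval values would come from the proposed *_WAIT_SECS and 
*_RETRY_INTERVAL_SECS keys rather than from existing YarnConfiguration 
constants:
{noformat}
import java.util.concurrent.Callable;

// Illustrative retry loop, not the actual AMRMClientImpl/YarnClientImpl code.
public final class RMConnectRetrySketch {
  public static <T> T callWithRetry(Callable<T> rmCall,
                                    long waitSecs, long retryIntervalSecs)
      throws Exception {
    long deadline = System.currentTimeMillis() + waitSecs * 1000L;
    while (true) {
      try {
        return rmCall.call();          // e.g. registerApplicationMaster()
      } catch (Exception e) {          // connection failure while RM is down
        if (System.currentTimeMillis() >= deadline) {
          throw e;                     // give up after the configured window
        }
        Thread.sleep(retryIntervalSecs * 1000L);
      }
    }
  }
}
{noformat}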

 Verify all clients will wait for RM to restart
 --

 Key: YARN-513
 URL: https://issues.apache.org/jira/browse/YARN-513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Xuan Gong

 When the RM is restarting, the NM, AM and Clients should wait for some time 
 for the RM to come back up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart

2013-04-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630677#comment-13630677
 ] 

Bikas Saha commented on YARN-513:
-

What about other interactions with the RM, such as allocate() or 
finishApplicationMaster()?

 Verify all clients will wait for RM to restart
 --

 Key: YARN-513
 URL: https://issues.apache.org/jira/browse/YARN-513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Xuan Gong

 When the RM is restarting, the NM, AM and Clients should wait for some time 
 for the RM to come back up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-562) NM should reject containers allocated by previous RM

2013-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630678#comment-13630678
 ] 

Hadoop QA commented on YARN-562:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578512/YARN-562.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/730//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/730//console

This message is automatically generated.

 NM should reject containers allocated by previous RM
 

 Key: YARN-562
 URL: https://issues.apache.org/jira/browse/YARN-562
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-562.1.patch


 It's possible that after an RM shutdown, before the AM goes down, the AM still 
 calls startContainer on the NM with containers allocated by the previous RM. 
 When the RM comes back, the NM doesn't know whether this container launch 
 request comes from the previous RM or the current RM. We should reject 
 containers allocated by the previous RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.

2013-04-12 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630681#comment-13630681
 ] 

Hitesh Shah commented on YARN-561:
--

Take a look at ContainerLaunch#sanitizeEnv() and how it handles non-modifiable 
environment variables. The above-mentioned env variables should also fall into 
the non-modifiable category.

 Nodemanager should set some key information into the environment of every 
 container that it launches.
 -

 Key: YARN-561
 URL: https://issues.apache.org/jira/browse/YARN-561
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah
Assignee: Xuan Gong
  Labels: usability

 Information such as containerId, nodemanager hostname, nodemanager port is 
 not set in the environment when any container is launched. 
 For an AM, the RM does all of this for it but for a container launched by an 
 application, all of the above need to be set by the ApplicationMaster. 
 At the minimum, container id would be a useful piece of information. If the 
 container wishes to talk to its local NM, the nodemanager related information 
 would also come in handy. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630734#comment-13630734
 ] 

Sandy Ryza commented on YARN-45:


Carlo,
I'm glad that this is being proposed.  Have you considered including how long 
the grace period is in the response?

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-12 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630741#comment-13630741
 ] 

Omkar Vinit Joshi commented on YARN-547:


Canceling the patch: it was fixing the existing problems but was removing 
parallelization (based on the number of containers, not resources). Making sure 
this parallelization still exists:
* Removing the invalid transitions for INIT and LOCALIZED, however not 
modifying the DOWNLOADING state transition.
* Making sure that now in the PublicLocalizer as well we acquire the lock 
before downloading. This will fix the broken signaling. Multiple containers 
will still try to download, but a download will be started/enqueued only if
** we can acquire the lock on the LocalizedResource, and
** the LocalizedResource is still in the DOWNLOADING state.
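A minimal sketch of that rule; the class below is a stand-in, not the real 
LocalizedResource:
{noformat}
import java.util.concurrent.Semaphore;

// Sketch only: a download is started/enqueued only if the resource's lock can
// be acquired and the resource is still DOWNLOADING.
class LocalizedResourceSketch {
  enum State { DOWNLOADING, LOCALIZED, FAILED }

  private final Semaphore lock = new Semaphore(1);
  private volatile State state = State.DOWNLOADING;

  boolean tryStartDownload() {
    if (!lock.tryAcquire()) {
      return false;               // another localizer already owns it
    }
    if (state != State.DOWNLOADING) {
      lock.release();             // already LOCALIZED/FAILED; nothing to do
      return false;
    }
    return true;                  // caller enqueues the actual download
  }
}
{noformat}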



 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch


 At present, when multiple containers try to request a localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in the DOWNLOADING state).
 2) Now if, in this state, multiple ResourceRequestEvents come in, then 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present there. 
 Location: ResourceLocalizationService.addResource .. the addition of the 
 request into attempts in case an event already exists.
 The root cause for this is the presence of FetchResourceTransition on 
 receiving a ResourceRequestEvent in the DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false

2013-04-12 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630749#comment-13630749
 ] 

Siddharth Seth commented on YARN-503:
-

bq. I don't like sleeps either. 1s is an eternity in this case because the 
initial renew and cancel timer tasks fire immediately on mocked objects, so it 
should run in a few ms. I assume you are suggesting using notify in a mock'ed 
answer method? Multiple timers are expected to fire in some cases, so it would 
probably require something like a CountdownLatch, which will get tricky to keep 
swapping in a new one by re-adding mocked responses with the new latch. Let me 
know if you feel it's worth it to change it.
I was actually suggesting doing the post-sleep verification in a check-sleep 
loop instead of just sleeping; passing this step indicates the required 
execution has completed. I would prefer keeping a sleep out of the tests if we 
can; otherwise, a longer sleep for sure.
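A small sketch of such a check-sleep loop; the renewCalled flag is a 
hypothetical field flipped by the mocked renew answer:
{noformat}
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: poll for the expected side effect instead of one fixed
// sleep, failing if it never shows up within the timeout.
final class TimedWaitSketch {
  static void waitForRenewal(AtomicBoolean renewCalled)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + 10000L;
    while (!renewCalled.get() && System.currentTimeMillis() < deadline) {
      Thread.sleep(50);   // short poll instead of one long sleep
    }
    if (!renewCalled.get()) {
      throw new AssertionError("renewal timer task never fired");
    }
  }
}
{noformat}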

bq. It's ok if another task is executing, because it's just trying to abort any 
pending task. Since there's only one possible pending task per token at any 
given time, the wrong task can't be cancelled. Did I miss an edge case?
I think there's an edge case. Sequence:
1. [t1] timerTask is a RenewalTask
2. [t1] the timer kicks in and starts executing
3. [t2] scheduleCancelled gets called in a parallel thread [via AppRemovalTask]
4. [t2] scheduleCancelled.abortScheduled is called - synchronized, but it does 
nothing useful since the current task is already running.
5. [t2] scheduleCancelled runs to completion and creates a cancelTask
6. [t1] completes execution - and calls scheduleTask(new TokenRenewTask(), 
renewIn) - which effectively destroys the scheduled cancelTask

bq. Are you suggesting to move managedToken.add(appId) into the loop in 
addApplication? I was trying to encapsulate the implementation details of 
adding/removing the appId within ManagedApp. Is it ok to leave it as-is?
I thought it'd be cleaner leaving this outside of ManagedApp - ManagedApp 
should not be managing ManagedTokens. IAC, don't feel strongly about this; 
whatever you decide.

 DelegationTokens will be renewed forever if multiple jobs share tokens and 
 the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
 --

 Key: YARN-503
 URL: https://issues.apache.org/jira/browse/YARN-503
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha
Reporter: Siddharth Seth
Assignee: Daryn Sharp
 Attachments: YARN-503.patch, YARN-503.patch


 The first Job/App to register a token is the one which DelegationTokenRenewer 
 associates with a specific Token. An attempt to remove/cancel these shared 
 tokens by subsequent jobs doesn't work - since the JobId will not match.
 As a result, even if subsequent jobs have 
 MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true - tokens will not be 
 cancelled when those jobs complete.
 Tokens will eventually be removed from the RM / JT when the service that 
 issued them considers them to have expired or via an explicit 
 cancelDelegationTokens call (not implemented yet in 23).
 A side effect of this is that the same delegation token will end up being 
 renewed multiple times (a separate TimerTask for each job which uses the 
 token).
 DelegationTokenRenewer could maintain a reference count/list of jobIds for 
 shared tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-12 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-547:
---

Attachment: yarn-547-20130412.patch

 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch, 
 yarn-547-20130412.patch


 At present, when multiple containers try to request a localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in the DOWNLOADING state).
 2) Now if, in this state, multiple ResourceRequestEvents come in, then 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present there. 
 Location: ResourceLocalizationService.addResource .. the addition of the 
 request into attempts in case an event already exists.
 The root cause for this is the presence of FetchResourceTransition on 
 receiving a ResourceRequestEvent in the DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-476) ProcfsBasedProcessTree info message confuses users

2013-04-12 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-476:


Attachment: YARN-476.patch

 ProcfsBasedProcessTree info message confuses users
 --

 Key: YARN-476
 URL: https://issues.apache.org/jira/browse/YARN-476
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.6
Reporter: Jason Lowe
 Attachments: YARN-476.patch


 ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such 
 as the following:
 {noformat}
 2013-03-13 12:41:51,957 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may 
 have finished in the interim.
 2013-03-13 12:41:51,958 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may 
 have finished in the interim.
 2013-03-13 12:41:51,958 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may 
 have finished in the interim.
 {noformat}
 As described in MAPREDUCE-4570, this is something that naturally occurs in 
 the process of monitoring processes via procfs.  It's uninteresting at best 
 and can confuse users who think it's a reason their job isn't running as 
 expected when it appears in their logs.
 We should either make this DEBUG or remove it entirely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-476) ProcfsBasedProcessTree info message confuses users

2013-04-12 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630759#comment-13630759
 ] 

Sandy Ryza commented on YARN-476:
-

Attached a patch that removes the log statement entirely.

 ProcfsBasedProcessTree info message confuses users
 --

 Key: YARN-476
 URL: https://issues.apache.org/jira/browse/YARN-476
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.6
Reporter: Jason Lowe
 Attachments: YARN-476.patch


 ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such 
 as the following:
 {noformat}
 2013-03-13 12:41:51,957 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may 
 have finished in the interim.
 2013-03-13 12:41:51,958 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may 
 have finished in the interim.
 2013-03-13 12:41:51,958 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may 
 have finished in the interim.
 {noformat}
 As described in MAPREDUCE-4570, this is something that naturally occurs in 
 the process of monitoring processes via procfs.  It's uninteresting at best 
 and can confuse users who think it's a reason their job isn't running as 
 expected when it appears in their logs.
 We should either make this DEBUG or remove it entirely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-476) ProcfsBasedProcessTree info message confuses users

2013-04-12 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-476:


Priority: Minor  (was: Major)

 ProcfsBasedProcessTree info message confuses users
 --

 Key: YARN-476
 URL: https://issues.apache.org/jira/browse/YARN-476
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.6
Reporter: Jason Lowe
Priority: Minor
 Attachments: YARN-476.patch


 ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such 
 as the following:
 {noformat}
 2013-03-13 12:41:51,957 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may 
 have finished in the interim.
 2013-03-13 12:41:51,958 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may 
 have finished in the interim.
 2013-03-13 12:41:51,958 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may 
 have finished in the interim.
 {noformat}
 As described in MAPREDUCE-4570, this is something that naturally occurs in 
 the process of monitoring processes via procfs.  It's uninteresting at best 
 and can confuse users who think it's a reason their job isn't running as 
 expected when it appears in their logs.
 We should either make this DEBUG or remove it entirely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state

2013-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630766#comment-13630766
 ] 

Hadoop QA commented on YARN-547:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12578544/yarn-547-20130412.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalizedResource

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/731//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/731//console

This message is automatically generated.

 New resource localization is tried even when Localized Resource is in 
 DOWNLOADING state
 ---

 Key: YARN-547
 URL: https://issues.apache.org/jira/browse/YARN-547
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch, 
 yarn-547-20130412.patch


 At present, when multiple containers try to request a localized resource:
 1) If the resource is not present, it is first created and resource 
 localization starts (the LocalizedResource is in the DOWNLOADING state).
 2) Now if, in this state, multiple ResourceRequestEvents come in, then 
 ResourceLocalizationEvents are fired for all of them.
 Most of the time this does not result in a duplicate resource download, but 
 there is a race condition present there. 
 Location: ResourceLocalizationService.addResource .. the addition of the 
 request into attempts in case an event already exists.
 The root cause for this is the presence of FetchResourceTransition on 
 receiving a ResourceRequestEvent in the DOWNLOADING state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-12 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630769#comment-13630769
 ] 

Bikas Saha commented on YARN-45:


I like the idea of the RM giving information to the AM about actions it might 
take which will affect the AM. However, I am wary of having the action taken in 
different places, e.g. the KILL of the containers should come from the RM or 
the AM exclusively, but not from both. Otherwise we open ourselves up to race 
conditions, unnecessary kills and complex logic in the RM.

Preemption is something that, IMO, the RM needs to do at the very last moment, 
when there is no other alternative for resources being freed up. If we decide 
to preempt at time T1 and then actually preempt at time T2, the cluster 
conditions may have changed between T1 and T2, which may invalidate the 
decisions taken at T1. New resources may have freed up that reduce the number 
of containers to be killed. This sub-optimality is directly proportional to the 
length of time between T1 and T2, so ideally we want to keep T1=T2. One can 
argue that things can change after the preemption which may have made the 
preemption unnecessary, so the above argument for T1=T2 is fallacious. However, 
preemption policies are usually based on deadlines, such as the allocation of 
queue1 must be met within X seconds. So the RM does not have the luxury of 
waiting for X+1 seconds. The best it can do is to wait up to X seconds in the 
hope that things will work out, and at X redistribute resources to meet the 
deficit.

At the same time, I can see that there is an argument that the AM knows best 
how to free up its resources. It will be good to remember that the AM has 
already informed the RM about the importance of all its containers when it made 
the requests at different priorities. So the RM knows the order of importance 
of the containers and the RM also knows the amount of time each container has 
been allocated. Assuming container runtime as a proxy for container work done, 
this data can be used by the RM to preempt in a work preserving manner without 
having to talk to the AM.

Notifying the AM has the usefulness of allowing the AM to take actions that 
preserve work such as checkpointing. However, IMO, the AM should only do 
checkpointing operations but not kill the containers. That should still happen 
at the RM as the very last option at the last moment. If the situation changes 
in the grace period and the containers do not need to be killed then there is 
no point in the AM killing them right now. This also lets us increase the grace 
period to a longer time because checkpointing and preserving work usually means 
persisting data in a stable store and may be slow in practical scenarios.

To summarize, I would propose an API in which the RM tells the AM about exactly 
which containers it might imminently preempt with the contract being that the 
AM could take actions to preserve the work done in those containers. The AM can 
continue to run those containers until the RM actually preempts them if needed. 
If we really think that the choice of containers needs to be made at the AM 
then the AM needs to checkpoint those containers and inform the RM about the 
containers it has chosen. But the final decision to send the kill must be sent 
by the RM.

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-476) ProcfsBasedProcessTree info message confuses users

2013-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630771#comment-13630771
 ] 

Hadoop QA commented on YARN-476:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12578547/YARN-476.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/732//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/732//console

This message is automatically generated.

 ProcfsBasedProcessTree info message confuses users
 --

 Key: YARN-476
 URL: https://issues.apache.org/jira/browse/YARN-476
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Sandy Ryza
Priority: Minor
 Attachments: YARN-476.patch


 ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such 
 as the following:
 {noformat}
 2013-03-13 12:41:51,957 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may 
 have finished in the interim.
 2013-03-13 12:41:51,958 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may 
 have finished in the interim.
 2013-03-13 12:41:51,958 INFO [communication thread] 
 org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may 
 have finished in the interim.
 {noformat}
 As described in MAPREDUCE-4570, this is something that naturally occurs in 
 the process of monitoring processes via procfs.  It's uninteresting at best 
 and can confuse users who think it's a reason their job isn't running as 
 expected when it appears in their logs.
 We should either make this DEBUG or remove it entirely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.

2013-04-12 Thread Omkar Vinit Joshi (JIRA)
Omkar Vinit Joshi created YARN-573:
--

 Summary: Shared data structures in Public Localizer and Private 
Localizer are not Thread safe.
 Key: YARN-573
 URL: https://issues.apache.org/jira/browse/YARN-573
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi


PublicLocalizer
1) pending is accessed by addResource (part of event handling) and by the run 
method (as part of PublicLocalizer.run()).

PrivateLocalizer
1) pending is accessed by addResource (part of event handling) and by 
findNextResource (i.remove()).
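A minimal sketch of one possible fix, guarding a shared pending list with a 
single lock; the types and method names below are placeholders, not the actual 
localizer code:
{noformat}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: all access to 'pending' goes through the same lock, including
// the compound check-then-remove in pollNext().
class PendingListSketch<T> {
  private final List<T> pending =
      Collections.synchronizedList(new ArrayList<T>());

  void addResource(T event) {        // called from the event-handling thread
    pending.add(event);
  }

  T pollNext() {                     // called from the localizer's own thread
    synchronized (pending) {         // compound operations need explicit sync
      return pending.isEmpty() ? null : pending.remove(0);
    }
  }
}
{noformat}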


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2013-04-12 Thread Omkar Vinit Joshi (JIRA)
Omkar Vinit Joshi created YARN-574:
--

 Summary: PrivateLocalizer does not support parallel resource 
download via ContainerLocalizer
 Key: YARN-574
 URL: https://issues.apache.org/jira/browse/YARN-574
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi


At present, private resources are downloaded in parallel only when multiple 
containers request the same resource; otherwise downloads are serial. The 
protocol between PrivateLocalizer and ContainerLocalizer supports multiple 
downloads, but this is not used: only one resource is sent for download at a 
time.

I think we can increase and better guarantee parallelism for 
private/application resources (even for a single container requesting 
resources) by allowing multiple downloads per ContainerLocalizer; a small 
worked example follows below.
Total parallelism before
= number of threads allotted to PublicLocalizer [public resources] + number of 
containers [private and application resources]
Total parallelism after
= number of threads allotted to PublicLocalizer [public resources] + number of 
containers * max downloads per container [private and application resources]
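
To make the two formulas concrete, here is a tiny sketch with hypothetical 
thread and container counts; none of these numbers come from a real 
configuration.
{code:java}
// Hypothetical numbers only, chosen to illustrate the before/after formulas.
public class LocalizerParallelismExample {
  public static void main(String[] args) {
    int publicLocalizerThreads = 4;   // threads allotted to PublicLocalizer
    int containers = 8;               // containers needing private/app resources
    int maxDownloadsPerContainer = 4; // proposed per-ContainerLocalizer limit

    int before = publicLocalizerThreads + containers;
    int after = publicLocalizerThreads + containers * maxDownloadsPerContainer;

    System.out.println("Total parallelism before: " + before); // 12
    System.out.println("Total parallelism after:  " + after);  // 36
  }
}
{code}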

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630893#comment-13630893
 ] 

Chris Douglas commented on YARN-45:
---

[~sandyr]: Yes, but the correct format/semantics for time are a complex 
discussion in themselves. To keep this easy to review and the discussion 
focused, we were going to file that separately. But I totally agree: for the AM 
to respond intelligently, the time before it's forced to give up the container 
is valuable input.

[~bikash]: Agree almost completely. In YARN-569, the hysteresis you cite 
motivated several design points, including multiple dampers on actions taken by 
the preemption policy, out-of-band observation/enforcement, and no effort to 
fine-tune particular allocations. The role of preemption (to summarize what 
[~curino] discussed in detail in the prenominate JIRA) is to make coarse 
corrections around the core scheduler invariants (e.g., capacity, fairness). 
Rather than introducing new races or complexity, one could argue that 
preemption is a dual of allocation in an inconsistent environment.

Your proposal matches case (1) in the above 
[comment|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950],
 where the RM specifies the set of containers in jeopardy and a contract (as 
{{ResourceRequest}}) for avoiding the kills, should the AM have cause to pick 
different containers. Further, your observation that the RM has enough 
information in priorities, etc. to make an educated guess at those containers 
is spot-on. IIRC, the policy uses allocation order when selecting containers, 
but that should be a secondary key after priority.
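
For concreteness, the message shape described here might look roughly like the 
sketch below; the class and accessor names are hypothetical, and the nested 
interfaces are mere placeholders for the real org.apache.hadoop.yarn.api.records 
types, so this is not necessarily the shape used in the attached patch.
{code:java}
import java.util.Set;

// Hypothetical sketch: the RM names the containers in jeopardy and offers a
// ResourceRequest-style contract the AM can satisfy instead by releasing
// equivalent resources of its own choosing.
public class PreemptionMessageSketch {
  public interface ContainerId {}      // placeholder for the real record type
  public interface ResourceRequest {}  // placeholder for the real record type

  private Set<ContainerId> containersInJeopardy; // the RM's educated guess
  private ResourceRequest contractToAvoidKills;  // alternative the AM may satisfy

  public Set<ContainerId> getContainersInJeopardy() { return containersInJeopardy; }
  public ResourceRequest getContractToAvoidKills() { return contractToAvoidKills; }
}
{code}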

The disputed point, and I'm not sure we actually disagree, is the claim that 
the AM should never kill things in response to this message. To be fair, that 
can be implemented by just ignoring the requests, so it's orthogonal to this 
particular protocol, but it's certainly an important best practice to discuss 
to ensure we're capturing the right thing. Certainly there are many cases where 
ignoring the message is correct; most CDFs of map task execution time show that 
over 80% finish in less than a minute, so the AM has few reasons to 
pessimistically kill them.

There are a few scenarios where this isn't optimal. Take the case of YARN-415, 
where the AM is billed cumulatively for cluster time. Assume an AM knows (a) 
the container will not finish (reinforcing [~sandyr]'s point about including 
time in the preemption message) and (b) the work done is not worth 
checkpointing. It can conclude that killing the container is in its best 
interest, because squatting on the resource could affect its ability to get 
containers in the future (or simply cost more). Moreover, for long-lived 
services and speculative container allocation/retention, the AM may actually be 
holding the container only as an optimization or for a future execution, so it 
could release it at low cost to itself. Finally, the time allowed before the RM 
starts killing containers can be extended if AMs typically return resources 
before the deadline.

It's also a mechanism for the RM to advise the AM about constraints that 
prevent it from granting its pending requests. The AM currently kills reducers 
if it can't get containers to regenerate lost map output. If the scheduler 
values some containers more than others, the AM's response to starvation can be 
improved from random killing. This is a case where the current implementation 
acknowledges the fact that it already runs in an inconsistent environment.

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-12 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630898#comment-13630898
 ] 

Carlo Curino commented on YARN-45:
--

As you pointed out, any decision made in the RM needs to deal with an 
inconsistent and evolving view of the world, and the preemption actions suffer 
from an inherent and significant lag. In designing policies around this, one 
must embrace such chaos, operate conservatively, and try to affect only 
macroscopic properties (hence the many built-in dampers Chris mentioned). 

As for what to do with the preemption requests, I think our current 
implementation for the MapReduce AM/Task is quite aligned with your comments. 

Here's what we do:
1) Maps are typically short-lived, so it is often worth ignoring the 
preemption request and trying to make a run for it, since checkpointing time 
is likely to be comparable to completion time and re-execution costs are low. 

2) For reducers, since their state is valuable and their runtimes are often 
longer, the AM asks the task to checkpoint. In our current implementation, 
once the reducer's state has been saved to a checkpoint we exit, as continuing 
execution is non-trivial (in particular, managing the partial output of 
reducers). I can envision a future version that tries to continue running 
after having taken a checkpoint.
Note that this (the task exiting) does not introduce any new race condition or 
complexity in either the RM or the AM, as both already handle failing/killed 
tasks, and the AM even has logic to kill its own reducers to free up space for 
maps.
More importantly, this setup (in which containers exit as soon as they are 
done checkpointing) allows us to set rather generous wait-before-kill 
parameters, since containers are reclaimed as soon as the task finishes 
checkpointing anyway.
The alternative would have the RM pick a static waiting policy, which risks 
being either too long (delaying the rebalancing too much) or too short 
(interrupting containers while they finish checkpointing, thus wasting work). 
I expect that no static solution would fare well across a broad range of AMs 
and job sizes. 

3) When the preemption takes the form of a ResourceRequest, we pick reducers 
over maps (keeping reducers running while their maps are killed would simply 
waste slot time). Looking ahead to YARN's future, this is a key feature: other 
applications may have evolving container priorities that are not exposed to 
the RM, so we cannot rely on the RM to guess which container is best to 
preempt, and delegating the choice to the AM could be invaluable. A rough 
AM-side sketch of this policy follows below.
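
The sketch below renders points 1 and 2 in code; the types and method names 
are hypothetical stand-ins, not the MapReduce AM's actual classes.
{code:java}
import java.util.List;

// Hypothetical sketch of the AM-side response to a preemption message.
public class PreemptionResponseSketch {

  enum TaskType { MAP, REDUCE }

  interface RunningTask {
    TaskType type();
    void checkpointAndExit(); // save reducer state, then let the container exit
  }

  // For each container named in the preemption message: maps ignore the
  // request and try to finish (re-execution is cheap), while reducers
  // checkpoint their state and release the container early.
  public void onPreemptionRequest(List<RunningTask> targetedTasks) {
    for (RunningTask task : targetedTasks) {
      if (task.type() == TaskType.REDUCE) {
        task.checkpointAndExit();
      }
      // MAP: deliberately do nothing and make a run for completion.
    }
  }
}
{code}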


 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-575) ContainerManager APIs should be user accessible

2013-04-12 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-575:
---

 Summary: ContainerManager APIs should be user accessible
 Key: YARN-575
 URL: https://issues.apache.org/jira/browse/YARN-575
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth
Priority: Critical


Auth for ContainerManager is based on the containerId being accessed, since 
that is what is used to launch containers (there is likely another JIRA 
somewhere to change this so that it is not containerId-based).
This also means the API is effectively not usable with Kerberos credentials.
It should also be possible to use this API with some generic tokens 
(RMDelegation?) instead of with container-specific tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers

2013-04-12 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630925#comment-13630925
 ] 

Alejandro Abdelnur commented on YARN-45:


Comments on the patch.

* Reusing ResourceRequest means we have a bunch of properties that are not 
applicable to the preemption message. Wouldn't it be enough to return just the 
ContainerIds and a flag indicating whether the set is strict? The AM can 
reconstruct all the resource information if it needs to.

* Do we need the get*Count() methods? You can get the size from the set 
itself, or am I missing something?

 Scheduler feedback to AM to release containers
 --

 Key: YARN-45
 URL: https://issues.apache.org/jira/browse/YARN-45
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chris Douglas
Assignee: Carlo Curino
 Attachments: YARN-45.patch, YARN-45.patch


 The ResourceManager strikes a balance between cluster utilization and strict 
 enforcement of resource invariants in the cluster. Individual allocations of 
 containers must be reclaimed- or reserved- to restore the global invariants 
 when cluster load shifts. In some cases, the ApplicationMaster can respond to 
 fluctuations in resource availability without losing the work already 
 completed by that task (MAPREDUCE-4584). Supplying it with this information 
 would be helpful for overall cluster utilization [1]. To this end, we want to 
 establish a protocol for the RM to ask the AM to release containers.
 [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2013-04-12 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630939#comment-13630939
 ] 

nemon lou commented on YARN-276:


[~tgraves]
Here are my initial thoughts on checking the cluster-level AM resource percent 
in each leaf queue:
Each leaf queue's AM limit is computed based on absoluteMaxCapacity. If we 
have 10 leaf queues, each configured with 100% absoluteMaxCapacity and 10% 
maxAMResourcePerQueuePercent, there is still a chance that all of a leaf 
queue's resources are taken up by AMs before the 10% 
maxAMResourcePerQueuePercent limit is reached.

Note that the cluster-wide AM resource percent only applies to a leaf queue if 
no AM resource percent is configured for that queue.

As Thomas Graves mentioned, cluster-level checking would cause one queue to 
restrict another, so I will remove the cluster-level check. A worked example 
with hypothetical numbers follows below.
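
To make the concern concrete, here is a small sketch with hypothetical numbers 
(the cluster size and queue count are invented, not taken from any real 
configuration); it shows how an AM limit derived from absoluteMaxCapacity can 
equal a queue's entire normal share.
{code:java}
// Hypothetical numbers only, for illustration.
public class AmResourceLimitExample {
  public static void main(String[] args) {
    double clusterMemoryGB = 1000.0;
    int numLeafQueues = 10;
    double queueCapacity = 1.0 / numLeafQueues;  // each queue's own share: 10%
    double absoluteMaxCapacity = 1.0;            // but each may grow to 100% of cluster
    double maxAMResourcePerQueuePercent = 0.10;

    // AM limit derived from absoluteMaxCapacity, i.e. from the whole cluster...
    double amLimitGB =
        clusterMemoryGB * absoluteMaxCapacity * maxAMResourcePerQueuePercent;
    // ...which equals the queue's entire normal share:
    double queueShareGB = clusterMemoryGB * queueCapacity;

    System.out.println("Per-queue AM limit: " + amLimitGB + " GB");    // 100.0
    System.out.println("Per-queue share:    " + queueShareGB + " GB"); // 100.0
    // AMs can therefore fill the queue's whole share before the
    // maxAMResourcePerQueuePercent check ever rejects an application.
  }
}
{code}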






 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
Assignee: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler 
 can hang with most resources taken up by AMs and not enough resources left 
 for tasks; all applications then hang.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, this property is only used to compute 
 maxActiveApplications, and maxActiveApplications is computed from 
 minimumAllocation (not from what the AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently

2013-04-12 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-276:
---

Attachment: YARN-276.patch

Uploading an interim patch.

 Capacity Scheduler can hang when submit many jobs concurrently
 --

 Key: YARN-276
 URL: https://issues.apache.org/jira/browse/YARN-276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0, 2.0.1-alpha
Reporter: nemon lou
Assignee: nemon lou
 Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, 
 YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler 
 can hang with most resources taken up by AMs and not enough resources left 
 for tasks; all applications then hang.
 The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not 
 checked directly. Instead, this property is only used to compute 
 maxActiveApplications, and maxActiveApplications is computed from 
 minimumAllocation (not from what the AMs actually use).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira