[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629837#comment-13629837 ] Xuan Gong commented on YARN-561: org.apache.hadoop.yarn.api.records.Container has a containerId and a NodeId (which provides the address and port), which are enough for a container to talk to its local NM. And by YARN-486, we have already added org.apache.hadoop.yarn.api.records.Container to ContainerImpl, so it will get that information now. Nodemanager should set some key information into the environment of every container that it launches. - Key: YARN-561 URL: https://issues.apache.org/jira/browse/YARN-561 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Xuan Gong Labels: usability Information such as the containerId, nodemanager hostname, and nodemanager port is not set in the environment when a container is launched. For an AM, the RM does all of this for it, but for a container launched by an application, all of the above need to be set by the ApplicationMaster. At a minimum, the container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager-related information would also come in handy.
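For illustration, a minimal sketch of how a container process could consume such information once the NM exports it. The variable names (CONTAINER_ID, NM_HOST, NM_PORT) are assumptions for the sketch; YARN-561 had not settled on names at this point.
{code}
// Hypothetical: a container's main class reading identity information
// from environment variables exported by the NodeManager.
public class ContainerEnvExample {
  public static void main(String[] args) {
    String containerId = System.getenv("CONTAINER_ID"); // assumed name
    String nmHost = System.getenv("NM_HOST");           // assumed name
    String nmPort = System.getenv("NM_PORT");           // assumed name
    if (containerId == null) {
      System.err.println("Container id was not exported by the NM");
      return;
    }
    System.out.println("Running as " + containerId
        + "; local NM at " + nmHost + ":" + nmPort);
  }
}
{code}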
[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629839#comment-13629839 ] Hadoop QA commented on YARN-441: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578368/YARN-441.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warning. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/725//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/725//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-api.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/725//console This message is automatically generated. Clean up unused collection methods in various APIs -- Key: YARN-441 URL: https://issues.apache.org/jira/browse/YARN-441 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Xuan Gong Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, YARN-441.4.patch There's a bunch of unused methods like getAskCount() and getAsk(index) in AllocateRequest and other interfaces. These should be removed. In YARN, found them in AllocateRequest and StartContainerResponse; MR will have its own set.
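As a sketch of the cleanup being proposed, the redundant index-based accessors go away and callers use the list-returning method directly. ResourceRequest below is a placeholder standing in for org.apache.hadoop.yarn.api.records.ResourceRequest, and the interface name is hypothetical; this is not the actual patch.
{code}
import java.util.List;

interface ResourceRequest {} // placeholder for the YARN record

interface AllocateRequestSketch {
  // Removed: ResourceRequest getAsk(int index);
  // Removed: int getAskCount();
  List<ResourceRequest> getAskList(); // kept: callers work with the list
}
{code}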
[jira] [Updated] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-441: --- Attachment: YARN-441.5.patch 1. Fix the -1 on javadoc. 2. Fix the -1 on findbugs by removing the synchronized keyword from StartContainerResponsePBImpl.java. Clean up unused collection methods in various APIs -- Key: YARN-441 URL: https://issues.apache.org/jira/browse/YARN-441 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Xuan Gong Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, YARN-441.4.patch, YARN-441.5.patch There's a bunch of unused methods like getAskCount() and getAsk(index) in AllocateRequest and other interfaces. These should be removed. In YARN, found them in AllocateRequest and StartContainerResponse; MR will have its own set.
[jira] [Commented] (YARN-570) Time strings are formatted in different timezones
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629880#comment-13629880 ] Hadoop QA commented on YARN-570: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12577997/MAPREDUCE-5141.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/726//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/726//console This message is automatically generated. Time strings are formatted in different timezones --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Reporter: PengZhang Attachments: MAPREDUCE-5141.patch Time strings on different pages are displayed in different timezones. If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT; if it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56. Same value, but different timezone.
[jira] [Commented] (YARN-441) Clean up unused collection methods in various APIs
[ https://issues.apache.org/jira/browse/YARN-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629887#comment-13629887 ] Hadoop QA commented on YARN-441: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578389/YARN-441.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/727//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/727//console This message is automatically generated. Clean up unused collection methods in various APIs -- Key: YARN-441 URL: https://issues.apache.org/jira/browse/YARN-441 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Xuan Gong Attachments: YARN-441.1.patch, YARN-441.2.patch, YARN-441.3.patch, YARN-441.4.patch, YARN-441.5.patch There's a bunch of unused methods like getAskCount() and getAsk(index) in AllocateRequest and other interfaces. These should be removed. In YARN, found them in AllocateRequest and StartContainerResponse; MR will have its own set.
[jira] [Commented] (YARN-570) Time strings are formatted in different timezones
[ https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629907#comment-13629907 ] Harsh J commented on YARN-570: -- Thanks for the report and the patch! With this patch it now renders this way: renderHadoopDate() - Wed, 10 Apr 2013 08:29:56 GMT+05:30 format() - 10-Apr-2013 08:29:56 Which I think is still inconsistent. Ideally, I think, we'd want the former everywhere for consistency. Can you update format() as well to print in the same style, if you agree? Time strings are formatted in different timezones --- Key: YARN-570 URL: https://issues.apache.org/jira/browse/YARN-570 Project: Hadoop YARN Issue Type: Bug Reporter: PengZhang Attachments: MAPREDUCE-5141.patch Time strings on different pages are displayed in different timezones. If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as Wed, 10 Apr 2013 08:29:56 GMT; if it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 16:29:56. Same value, but different timezone.
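The suggestion above amounts to pinning both renderers to one explicit timezone and pattern. A minimal sketch of that idea, assuming GMT and the renderHadoopDate() style; this is illustrative, not the actual yarn.util.Times change:
{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ConsistentTimeFormat {
  public static void main(String[] args) {
    long ts = 1365582596000L; // Wed, 10 Apr 2013 08:29:56 GMT
    SimpleDateFormat fmt =
        new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz");
    // Pin the zone explicitly so output no longer depends on the
    // server's default timezone.
    fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
    System.out.println(fmt.format(new Date(ts)));
    // -> Wed, 10 Apr 2013 08:29:56 GMT, on any host
  }
}
{code}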
[jira] [Updated] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
[ https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-457: - Attachment: YARN-457-4.patch Attached a patch for trunk. Thank you. Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl Key: YARN-457 URL: https://issues.apache.org/jira/browse/YARN-457 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Kenji Kikushima Priority: Minor Labels: Newbie Attachments: YARN-457-2.patch, YARN-457-3.patch, YARN-457-4.patch, YARN-457.patch
{code}
if (updatedNodes == null) {
  this.updatedNodes.clear();
  return;
}
{code}
If updatedNodes is already null, a NullPointerException is thrown.
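A minimal sketch of the null-guard fix implied by the report, assuming the field layout shown in the snippet; the class and method names are simplified stand-ins for AllocateResponsePBImpl, and NodeReport is a type parameter standing in for the YARN record:
{code}
import java.util.List;

class UpdatedNodesGuardSketch<NodeReport> {
  private List<NodeReport> updatedNodes;

  void setUpdatedNodes(final List<NodeReport> updatedNodes) {
    if (updatedNodes == null) {
      if (this.updatedNodes != null) {
        this.updatedNodes.clear(); // guard added: was an unconditional clear()
      }
      return; // null-to-null set is now a no-op instead of an NPE
    }
    this.updatedNodes = updatedNodes;
  }
}
{code}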
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629987#comment-13629987 ] Hudson commented on YARN-486: - Integrated in Hadoop-Yarn-trunk #181 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/181/]) YARN-486. Changed NM's startContainer API to accept Container record given by RM as a direct parameter instead of as part of the ContainerLaunchContext record. Contributed by Xuan Gong. MAPREDUCE-5139. Update MR AM to use the modified startContainer API after YARN-486. Contributed by Xuan Gong. (Revision 1467063) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467063 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerRemoteLaunchEvent.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java *
[jira] [Commented] (YARN-319) Submit a job to a queue that is not allowed in fairScheduler, client will hang forever.
[ https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629991#comment-13629991 ] Hudson commented on YARN-319: - Integrated in Hadoop-Yarn-trunk #181 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/181/]) Fixing CHANGES.txt entry for YARN-319. (Revision 1467133) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467133 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Submit a job to a queue that is not allowed in fairScheduler, client will hang forever. Key: YARN-319 URL: https://issues.apache.org/jira/browse/YARN-319 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.2-alpha Reporter: shenhong Assignee: shenhong Fix For: 2.0.5-beta Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, YARN-319.patch The RM uses fairScheduler; when a client submits a job to a queue that does not allow the user to submit jobs to it, the client will hang forever.
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630008#comment-13630008 ] Hudson commented on YARN-412: - Integrated in Hadoop-trunk-Commit #3607 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3607/]) YARN-412. Fixed FifoScheduler to check hostname of a NodeManager rather than its host:port during scheduling which caused incorrect locality for containers. Contributed by Roger Hoover. (Revision 1467244) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467244 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method checks whether the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect, as it should be checking the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local.
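The mismatch is easy to see with a map keyed the way requests are. A self-contained illustration, with hypothetical names rather than FifoScheduler's actual structures:
{code}
import java.util.HashMap;
import java.util.Map;

public class LocalityKeyMismatch {
  public static void main(String[] args) {
    // Outstanding requests are keyed by hostname.
    Map<String, Integer> requestsByHost = new HashMap<>();
    requestsByHost.put("host1.foo.com", 3);

    // Looking up by host:port (the bug) finds nothing.
    System.out.println(requestsByHost.get("host1.foo.com:1234")); // null
    // Looking up by hostname (the fix) finds the requests.
    System.out.println(requestsByHost.get("host1.foo.com"));      // 3
  }
}
{code}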
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630040#comment-13630040 ] Hudson commented on YARN-486: - Integrated in Hadoop-Hdfs-trunk #1370 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1370/]) YARN-486. Changed NM's startContainer API to accept Container record given by RM as a direct parameter instead of as part of the ContainerLaunchContext record. Contributed by Xuan Gong. MAPREDUCE-5139. Update MR AM to use the modified startContainer API after YARN-486. Contributed by Xuan Gong. (Revision 1467063) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467063 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerRemoteLaunchEvent.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java *
[jira] [Commented] (YARN-319) Submit a job to a queue that is not allowed in fairScheduler, client will hang forever.
[ https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630044#comment-13630044 ] Hudson commented on YARN-319: - Integrated in Hadoop-Hdfs-trunk #1370 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1370/]) Fixing CHANGES.txt entry for YARN-319. (Revision 1467133) Result = FAILURE vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467133 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Submit a job to a queue that is not allowed in fairScheduler, client will hang forever. Key: YARN-319 URL: https://issues.apache.org/jira/browse/YARN-319 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.2-alpha Reporter: shenhong Assignee: shenhong Fix For: 2.0.5-beta Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, YARN-319.patch The RM uses fairScheduler; when a client submits a job to a queue that does not allow the user to submit jobs to it, the client will hang forever.
[jira] [Commented] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS
[ https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630059#comment-13630059 ] Timothy St. Clair commented on YARN-160: If it's possible to tag along development on this one, I would be interested in the approach. IMHO, referencing existing solutions helps gauge a baseline. Ref: http://www.open-mpi.org/projects/hwloc/ http://www.rce-cast.com/Podcast/rce-33-hwloc-portable-hardware-locality.html http://gridscheduler.sourceforge.net/projects/hwloc/GridEnginehwloc.html nodemanagers should obtain cpu/memory values from underlying OS --- Key: YARN-160 URL: https://issues.apache.org/jira/browse/YARN-160 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Fix For: 2.0.5-beta As mentioned in YARN-2 *NM memory and CPU configs*, these values currently come from the NM's config; we should be able to obtain them from the OS (i.e., in the case of Linux, from /proc/meminfo and /proc/cpuinfo). As this is highly OS dependent, we should have an interface that obtains this information. In addition, implementations of this interface should be able to specify a mem/cpu offset (the amount of mem/cpu not to be made available as YARN resources); this would allow reserving mem/cpu for the OS and other services outside of YARN containers.
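A minimal sketch of the probing idea under discussion, assuming Linux and a hypothetical reserved-memory offset; it is not the interface YARN-160 would define:
{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LinuxResourceProbe {
  public static void main(String[] args) throws IOException {
    long memTotalKb = -1;
    // Read total physical memory from /proc/meminfo (Linux-only).
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("MemTotal:")) {
          memTotalKb = Long.parseLong(line.replaceAll("\\D+", ""));
          break;
        }
      }
    }
    int cpus = Runtime.getRuntime().availableProcessors();
    long reservedKb = 1024 * 1024; // hypothetical 1 GB offset kept for the OS
    System.out.println("YARN-usable memory (kB): " + (memTotalKb - reservedKb));
    System.out.println("YARN-usable vcores: " + cpus);
  }
}
{code}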
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630075#comment-13630075 ] Hudson commented on YARN-412: - Integrated in Hadoop-Mapreduce-trunk #1397 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1397/]) YARN-412. Fixed FifoScheduler to check hostname of a NodeManager rather than its host:port during scheduling which caused incorrect locality for containers. Contributed by Roger Hoover. (Revision 1467244) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467244 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNodes.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Fix For: 2.0.4-alpha Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method checks whether the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect, as it should be checking the hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234). In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129: application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local.
[jira] [Commented] (YARN-486) Change startContainer NM API to accept Container as a parameter and make ContainerLaunchContext user land
[ https://issues.apache.org/jira/browse/YARN-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630079#comment-13630079 ] Hudson commented on YARN-486: - Integrated in Hadoop-Mapreduce-trunk #1397 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1397/]) YARN-486. Changed NM's startContainer API to accept Container record given by RM as a direct parameter instead of as part of the ContainerLaunchContext record. Contributed by Xuan Gong. MAPREDUCE-5139. Update MR AM to use the modified startContainer API after YARN-486. Contributed by Xuan Gong. (Revision 1467063) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467063 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerRemoteLaunchEvent.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskAttemptContainerRequest.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/launcher/TestContainerLauncher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/StartContainerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/StartContainerRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationSubmissionContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerLaunchContext.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/BuilderUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLaunchRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestRPC.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java *
[jira] [Commented] (YARN-319) Submit a job to a queue that is not allowed in fairScheduler, client will hang forever.
[ https://issues.apache.org/jira/browse/YARN-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630083#comment-13630083 ] Hudson commented on YARN-319: - Integrated in Hadoop-Mapreduce-trunk #1397 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1397/]) Fixing CHANGES.txt entry for YARN-319. (Revision 1467133) Result = SUCCESS vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1467133 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt Submit a job to a queue that is not allowed in fairScheduler, client will hang forever. Key: YARN-319 URL: https://issues.apache.org/jira/browse/YARN-319 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.0.2-alpha Reporter: shenhong Assignee: shenhong Fix For: 2.0.5-beta Attachments: YARN-319-1.patch, YARN-319-2.patch, YARN-319-3.patch, YARN-319.patch The RM uses fairScheduler; when a client submits a job to a queue that does not allow the user to submit jobs to it, the client will hang forever.
[jira] [Commented] (YARN-514) Delayed store operations should not result in RM unavailability for app submission
[ https://issues.apache.org/jira/browse/YARN-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630320#comment-13630320 ] Bikas Saha commented on YARN-514: - You only need to add the new field in the enum. I don't think we should change the values of all existing enums. Delayed store operations should not result in RM unavailability for app submission -- Key: YARN-514 URL: https://issues.apache.org/jira/browse/YARN-514 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Zhijie Shen Attachments: YARN-514.1.patch, YARN-514.2.patch, YARN-514.3.patch, YARN-514.4.patch Currently, app submission is the only store operation performed synchronously because the app must be stored before the request returns with success. This makes the RM susceptible to blocking all client threads on slow store operations, resulting in the RM being perceived as unavailable by clients.
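A tiny illustration of the review comment, with a hypothetical enum and constant names: the new field is appended so the existing constants keep their positions and meanings.
{code}
// Hypothetical state enum; only SAVING is new.
enum RMAppStateSketch {
  NEW,        // existing constants stay where they are...
  SUBMITTED,
  RUNNING,
  FINISHED,
  SAVING      // ...and the new field is simply added, nothing renumbered
}
{code}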
[jira] [Updated] (YARN-571) User should not be part of ContainerLaunchContext
[ https://issues.apache.org/jira/browse/YARN-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-571: - Issue Type: Sub-task (was: Bug) Parent: YARN-386 User should not be part of ContainerLaunchContext - Key: YARN-571 URL: https://issues.apache.org/jira/browse/YARN-571 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Today, a user is expected to set the user name in the CLC when either submitting an application or launching a container from the AM. This does not make sense, as the user can be/has been identified by the RM as part of the RPC layer. The solution would be to move the user information into either the Container object or directly into the ContainerToken, which can then be used by the NM to launch the container. This user information would be set into the container by the RM.
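As a sketch of the proposed move, with simplified stand-ins for the real records (field and class names are hypothetical): the user field leaves the user-settable launch context and lives in the RM-issued token instead.
{code}
import java.util.HashMap;
import java.util.Map;

// Stand-in for ContainerLaunchContext after the change: no user field.
class ContainerLaunchContextSketch {
  // String user;  <-- removed: user-supplied, redundant with RPC identity
  Map<String, String> environment = new HashMap<>();
}

// Stand-in for the RM-issued token: the user is set by the RM and
// trusted by the NM when launching the container.
class ContainerTokenSketch {
  final String user;
  ContainerTokenSketch(String user) { this.user = user; }
}
{code}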
[jira] [Created] (YARN-572) Remove duplication of data in Container
Hitesh Shah created YARN-572: Summary: Remove duplication of data in Container Key: YARN-572 URL: https://issues.apache.org/jira/browse/YARN-572 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Most of the information needed to launch a container is duplicated in both the Container class as well as in the ContainerToken object that the Container object already contains. It would be good to remove this level of duplication.
[jira] [Commented] (YARN-457) Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl
[ https://issues.apache.org/jira/browse/YARN-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630425#comment-13630425 ] Xuan Gong commented on YARN-457: +1, Looks good Setting updated nodes from null to null causes NPE in AllocateResponsePBImpl Key: YARN-457 URL: https://issues.apache.org/jira/browse/YARN-457 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Kenji Kikushima Priority: Minor Labels: Newbie Attachments: YARN-457-2.patch, YARN-457-3.patch, YARN-457-4.patch, YARN-457.patch
{code}
if (updatedNodes == null) {
  this.updatedNodes.clear();
  return;
}
{code}
If updatedNodes is already null, a NullPointerException is thrown.
[jira] [Assigned] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-513: -- Assignee: Xuan Gong (was: Jian He) Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up.
[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality
[ https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630440#comment-13630440 ] Sandy Ryza commented on YARN-392: - Any further thoughts on this? Make it possible to schedule to specific nodes without dropping locality Key: YARN-392 URL: https://issues.apache.org/jira/browse/YARN-392 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Sandy Ryza Attachments: YARN-392-1.patch, YARN-392.patch Currently it's not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app.
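One hypothetical way to express such a requirement is a per-request flag that disables the automatic relaxation; the field and method names below are illustrative, not a settled API from this discussion.
{code}
// Hypothetical request record with a strict-locality switch.
class ResourceRequestSketch {
  String resourceName = "host1.foo.com"; // a specific node
  boolean relaxLocality = true;          // default: RM may fall back to rack/*

  ResourceRequestSketch strictToNode() {
    this.relaxLocality = false;          // schedule on this node or not at all
    return this;
  }
}
{code}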
[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-530: Attachment: YARN-530.4.patch No tangible change from the previous patch; publishing to keep in sync with the updated YARN-117 "everything" patch. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test.
[jira] [Updated] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-117: Attachment: YARN-117.4.patch The changes here since the last patch relate to the test {{TestNodeStatusUpdater}}, which was failing on Jenkins but not locally. # Adding timeouts in the {{syncBarrier.await()}} clause to better handle the situation where the rollback of a failing {{start()}} doesn't block, as the barrier in the test case isn't reached as it would be on the same thread. # Lots of extra assertions and debugging to see why {{testNMConnectionToRM()}} fails most of the time on a Linux test VM. It looks like the time-based assertions are brittle there (not fixed). Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.patch Having played with the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. The state model prevents the stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non-null. Before anyone points out that the {{stop()}} operations assume that all fields are valid, and if called before a {{start()}} they will NPE: MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a fix for it. It is independent of the rest of the issues in this doc, but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take-up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} and {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} and {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers). h2. AbstractService state change doesn't defend against race conditions. There are no concurrency locks on the state transitions.
Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. h2. Static methods to choreograph lifecycle operations Helper methods to move things through lifecycles. init-start is common; stop-if-service!=null is another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. h2. State transition failures are something that registered service listeners may wish to be informed of. When a state transition fails, a {{RuntimeException}} can be thrown -and the service listeners are not informed, as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. *Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service, Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner
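To make the retrofit pattern concrete, a minimal sketch assuming a simplified state set; the real {{Service}} API and state names differ, so treat the class and hooks below as illustrative only.
{code}
// Final lifecycle methods own the state checks and transitions;
// subclasses override only the protected inner hooks.
public abstract class GuardedService {
  public enum State { NOTINITED, INITED, STARTED, STOPPED }

  private State state = State.NOTINITED;

  public final synchronized void init() {
    if (state != State.NOTINITED) {
      throw new IllegalStateException("Cannot init from " + state);
    }
    innerInit();            // subclass work runs only after the check passes
    state = State.INITED;
  }

  public final synchronized void start() {
    if (state != State.INITED) {
      throw new IllegalStateException("Cannot start from " + state);
    }
    innerStart();
    state = State.STARTED;
  }

  public final synchronized void stop() {
    if (state == State.STOPPED) {
      return;               // stop() is valid (and idempotent) from any state
    }
    innerStop();            // must cope with fields left null by a failed start
    state = State.STOPPED;
  }

  protected abstract void innerInit();
  protected abstract void innerStart();
  protected abstract void innerStop();
}
{code}
The synchronized final methods also address the re-entrancy concern: two threads calling {{stop()}} serialize on the lock, and the second call becomes a no-op.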
[jira] [Commented] (YARN-117) Enhance YARN service model
[ https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630478#comment-13630478 ] Hadoop QA commented on YARN-117: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578484/YARN-117.4.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/729//console This message is automatically generated. Enhance YARN service model -- Key: YARN-117 URL: https://issues.apache.org/jira/browse/YARN-117 Project: Hadoop YARN Issue Type: Improvement Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.patch Having played with the YARN service model, there are some issues that I've identified based on past work and initial use. This JIRA issue is an overall one to cover the issues, with solutions pushed out to separate JIRAs. h2. The state model prevents the stopped state being entered if you could not successfully start the service. In the current lifecycle you cannot stop a service unless it was successfully started, but * {{init()}} may acquire resources that need to be explicitly released * if the {{start()}} operation fails partway through, the {{stop()}} operation may be needed to release resources. *Fix:* make {{stop()}} a valid state transition from all states and require the implementations to be able to stop safely without requiring all fields to be non-null. Before anyone points out that the {{stop()}} operations assume that all fields are valid, and if called before a {{start()}} they will NPE: MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a fix for it. It is independent of the rest of the issues in this doc, but it will aid making {{stop()}} execute from all states other than stopped. MAPREDUCE-3502 is too big a patch and needs to be broken down for easier review and take-up; this can be done with issues linked to this one. h2. AbstractService doesn't prevent duplicate state change requests. The {{ensureState()}} checks to verify whether or not a state transition is allowed from the current state are performed in the base {{AbstractService}} class -yet subclasses tend to call this *after* their own {{init()}}, {{start()}} and {{stop()}} operations. This means that these operations can be performed out of order, and even if the outcome of the call is an exception, all actions performed by the subclasses will have taken place. MAPREDUCE-3877 demonstrates this. This is a tricky one to address. In HADOOP-3128 I used a base class instead of an interface and made the {{init()}}, {{start()}} and {{stop()}} methods {{final}}. These methods would do the checks, and then invoke protected inner methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to retrofit the same behaviour to everything that extends {{AbstractService}} -something that must be done before the class is considered stable (because once the lifecycle methods are declared final, all subclasses that are out of the source tree will need fixing by the respective developers). h2. AbstractService state change doesn't defend against race conditions. There are no concurrency locks on the state transitions. Whatever fix for wrong state calls is added should correct this to prevent re-entrancy, such as {{stop()}} being called from two threads. h2.
Static methods to choreograph lifecycle operations Helper methods to move things through lifecycles. init-start is common; stop-if-service!=null is another. Some static methods can execute these, and even call {{stop()}} if {{init()}} raises an exception. These could go into a class {{ServiceOps}} in the same package. These can be used by those services that wrap other services, and help manage more robust shutdowns. h2. State transition failures are something that registered service listeners may wish to be informed of. When a state transition fails, a {{RuntimeException}} can be thrown -and the service listeners are not informed, as the notification point isn't reached. They may wish to know this, especially for management and diagnostics. *Fix:* extend {{ServiceStateChangeListener}} with a callback such as {{stateChangeFailed(Service service, Service.State targeted-state, RuntimeException e)}} that is invoked from the (final) state change methods in the {{AbstractService}} class (once they delegate to their inner {{innerStart()}}, {{innerStop()}} methods; make a no-op on the existing implementations of the interface. h2. Service listener failures not handled Is this
[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services
[ https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630489#comment-13630489 ] Hadoop QA commented on YARN-530: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578482/YARN-530.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/728//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/728//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/728//console This message is automatically generated. Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services - Key: YARN-530 URL: https://issues.apache.org/jira/browse/YARN-530 Project: Hadoop YARN Issue Type: Sub-task Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-117changes.pdf, YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch # Extend the YARN {{Service}} interface as discussed in YARN-117 # Implement the changes in {{AbstractService}} and {{FilterService}}. # Migrate all services in yarn-common to the more robust service model, test.
[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630493#comment-13630493 ] Vinod Kumar Vavilapalli commented on YARN-561: -- Xuan, what Hitesh is saying is that when a container starts as a process, it doesn't know its containerId. We should make NM export it as part of the env. Nodemanager should set some key information into the environment of every container that it launches. - Key: YARN-561 URL: https://issues.apache.org/jira/browse/YARN-561 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Xuan Gong Labels: usability Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-571) User should not be part of ContainerLaunchContext
[ https://issues.apache.org/jira/browse/YARN-571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-571: Assignee: Vinod Kumar Vavilapalli Taking a shot at this.. User should not be part of ContainerLaunchContext - Key: YARN-571 URL: https://issues.apache.org/jira/browse/YARN-571 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Vinod Kumar Vavilapalli Today, a user is expected to set the user name in the CLC when either submitting an application or launching a container from the AM. This does not make sense, as the user can be/has been identified by the RM as part of the RPC layer. The solution would be to move the user information into either the Container object or directly into the ContainerToken, which can then be used by the NM to launch the container. This user information would be set into the container by the RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-572) Remove duplication of data in Container
[ https://issues.apache.org/jira/browse/YARN-572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned YARN-572: Assignee: Hitesh Shah Remove duplication of data in Container Key: YARN-572 URL: https://issues.apache.org/jira/browse/YARN-572 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Hitesh Shah Most of the information needed to launch a container is duplicated in both the Container class as well as in the ContainerToken object that the Container object already contains. It would be good to remove this level of duplication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-435) Make it easier to access cluster topology information in an AM
[ https://issues.apache.org/jira/browse/YARN-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reassigned YARN-435: Assignee: Hitesh Shah Make it easier to access cluster topology information in an AM -- Key: YARN-435 URL: https://issues.apache.org/jira/browse/YARN-435 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Hitesh Shah ClientRMProtocol exposes a getClusterNodes api that provides a report on all nodes in the cluster including their rack information. However, this requires the AM to open and establish a separate connection to the RM in addition to one for the AMRMProtocol. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
[ https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630619#comment-13630619 ] Daryn Sharp commented on YARN-503: -- bq. There's likely a race between the RenewalTask and AbortTask. [...] I think it's possible for a schedulesTask to be executing - in which case the abortScheduleTask() may have no effect and can result in the wrong task being cancelled / scheduled. It's ok if another task is executing, because it's just trying to abort any pending task. Since there's only one possible pending task per token at any given time, the wrong task can't be cancelled. Did I miss an edge case? bq. ManagedApp.add - instead of adding to the app and the token here, this composite add can be kept outside of ManagedApp/ManagedToken Not sure I follow. Are you suggesting to move {{managedToken.add(appId)}} into the loop in {{addApplication}}? I was trying to encapsulate the implementation details of adding/removing the appId within ManagedApp. Is it ok to leave it as-is? bq. ManagedApp.expunge() - is synchronization on 'appTokens' required ? Strictly speaking, probably not. It's a throwback to an earlier implementation that was doing trickier stuff. It was to avoid concurrent modification exceptions while iterating, but appTokens isn't mutating in multiple threads. And the {{remove}} is essentially guarding it too. For that matter, I don't think {{appTokens}} needs to be a synch-ed set. I'll change it. bq. ManagedToken.expunge() - tokenApps.clear() required ? Probably not. Seemed like good housekeeping, but I'll remove it. bq. In the unit test - the 1 second sleep seems rather low. Instead of the sleep, this can be changed to a timed wait on one of the fields being verified. I don't like sleeps either. 1s is an eternity in this case because the initial renew and cancel timer tasks fire immediately on mocked objects, so it should run in a few ms. I assume you are suggesting using notify in a mock'ed answer method? Multiple timers are expected to fire in some cases, so it would probably require something like a CountDownLatch, which will get tricky to keep swapping in a new one by re-adding mocked responses with the new latch. Let me know if you feel it's worth it to change it. DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false -- Key: YARN-503 URL: https://issues.apache.org/jira/browse/YARN-503 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha Reporter: Siddharth Seth Assignee: Daryn Sharp Attachments: YARN-503.patch, YARN-503.patch The first Job/App to register a token is the one which DelegationTokenRenewer associates with a specific Token. An attempt to remove/cancel these shared tokens by subsequent jobs doesn't work - since the JobId will not match. As a result, even if subsequent jobs have MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true - tokens will not be cancelled when those jobs complete. Tokens will eventually be removed from the RM / JT when the service that issued them considers them to have expired or via an explicit cancelDelegationTokens call (not implemented yet in 23). A side effect of this is that the same delegation token will end up being renewed multiple times (a separate TimerTask for each job which uses the token). DelegationTokenRenewer could maintain a reference count/list of jobIds for shared tokens. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630626#comment-13630626 ] Xuan Gong commented on YARN-561: When a container starts as a process, it does not know its containerId. Does it mean that when we execute the script to launch the container, the script does not include this containerId? If I understand it correctly, we can solve this issue like this: 1. We need to add some entries to the enum Environment, such as ContainerId (String) (which can be converted back by using ConverterUtils.toContainerId(String containerId)), NM hostName (String), and NMPort (int). 2. The container launch script is written out at ContainerLaunch::call(), and the environment is also set there. At ContainerLaunch, we already have org.apache.hadoop.yarn.server.nodemanager.containermanager.container, so the containerId can simply be obtained. The NM hostName and NMPort can be obtained from NM_NodeId, which is in the NMContext. And ContainerLaunch is initialized from ContainerLauncher, which already has the NMContext. So we can make the change here: when we initialize the ContainerLaunch, we either pass the NMContext as a parameter, or simply give NM_NodeId, or just the NM hostName and NMPort; then we have all the information we need. Any other suggestions? Nodemanager should set some key information into the environment of every container that it launches. - Key: YARN-561 URL: https://issues.apache.org/jira/browse/YARN-561 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Xuan Gong Labels: usability Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
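As a rough illustration of step 2 above, the sketch below sets hypothetical variables into the launch environment and reads one back inside the container process; the variable names are placeholders, not the final constants (which would presumably live alongside the existing Environment enum):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: the variable names are hypothetical placeholders.
public class ContainerEnvSketch {

  // What the NM would do while writing the launch script in
  // ContainerLaunch::call(), given the info from the NMContext/NodeId.
  static void exportNmInfo(Map<String, String> env,
      String containerIdStr, String nmHost, int nmPort) {
    env.put("CONTAINER_ID", containerIdStr);
    env.put("NM_HOST", nmHost);
    env.put("NM_PORT", Integer.toString(nmPort));
  }

  // A launched process could then read the value back and, e.g., feed it
  // to ConverterUtils.toContainerId(...) to recover the ContainerId.
  public static void main(String[] args) {
    System.out.println("containerId = " + System.getenv("CONTAINER_ID"));

    // demo of the NM-side half, using dummy values
    Map<String, String> env = new HashMap<String, String>();
    exportNmInfo(env, "container_1365000000000_0001_01_000001", "host1", 45454);
    System.out.println(env);
  }
}
{code}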
[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630635#comment-13630635 ] Hitesh Shah commented on YARN-561: -- @Xuan, one thing to be careful of: certain env settings should only be set by the NodeManager when it launches the container, and not by an application. So you would need a notion of whitelisted environment variables that may be set only by the NM and cannot be overridden by the env in the CLC provided by the application. Nodemanager should set some key information into the environment of every container that it launches. - Key: YARN-561 URL: https://issues.apache.org/jira/browse/YARN-561 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Xuan Gong Labels: usability Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-412: --- Fix Version/s: (was: 2.0.4-alpha) 2.0.5-beta FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Fix For: 2.0.5-beta Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect as it should be checking hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234) In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129 application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
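For clarity, the two lookups quoted in the report, side by side (the change itself is this one line in FifoScheduler):

{code:java}
// Before: keys the lookup by "host:port", which never matches a
// hostname-keyed ResourceRequest, so node-locality is never detected.
application.getResourceRequest(priority, node.getRMNode().getNodeAddress());

// After: keys by hostname, matching what the CapacityScheduler does in
// LeafQueue.assignNodeLocalContainers.
application.getResourceRequest(priority, node.getHostName());
{code}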
[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality
[ https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630649#comment-13630649 ] Hudson commented on YARN-412: - Integrated in Hadoop-trunk-Commit #3610 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3610/]) YARN-412. Pushing to 2.0.5-beta only. (Revision 1467470) Result = SUCCESS acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1467470 Files : * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt FifoScheduler incorrectly checking for node locality Key: YARN-412 URL: https://issues.apache.org/jira/browse/YARN-412 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Roger Hoover Assignee: Roger Hoover Priority: Minor Labels: patch Fix For: 2.0.5-beta Attachments: YARN-412.patch In the FifoScheduler, the assignNodeLocalContainers method is checking if the data is local to a node by searching for the nodeAddress of the node in the set of outstanding requests for the app. This seems to be incorrect as it should be checking hostname instead. The offending line of code is 455: application.getResourceRequest(priority, node.getRMNode().getNodeAddress()); Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses are a concatenation of hostname and command port (e.g. host1.foo.com:1234) In the CapacityScheduler, it's done using hostname. See LeafQueue.assignNodeLocalContainers, line 1129 application.getResourceRequest(priority, node.getHostName()); Note that this bug does not affect the actual scheduling decisions made by the FifoScheduler because even though it incorrectly determines that a request is not local to the node, it will still schedule the request immediately because it's rack-local. However, this bug may be adversely affecting the reporting of job status by underreporting the number of tasks that were node local. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630652#comment-13630652 ] Xuan Gong commented on YARN-561: Just like ApplicationConstants, which includes some variables that can only be set in the AppMaster environment? At the beginning (from the code in ContainerLaunch::call()), the env originally comes from CLC.getEnvironment(); we can then set the ContainerId, Node_hostName, and Node_portNumber after that. Nodemanager should set some key information into the environment of every container that it launches. - Key: YARN-561 URL: https://issues.apache.org/jira/browse/YARN-561 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Xuan Gong Labels: usability Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-562) NM should reject containers allocated by previous RM
[ https://issues.apache.org/jira/browse/YARN-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-562: - Attachment: YARN-562.1.patch This patch does: 1. Add the RM's cluster timestamp in the NM to reject old containers. 2. Block container requests while the NM is resyncing with the RM. 3. Add test cases for both. NM should reject containers allocated by previous RM Key: YARN-562 URL: https://issues.apache.org/jira/browse/YARN-562 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-562.1.patch It's possible that after RM shutdown, before the AM goes down, the AM still calls startContainer on the NM with containers allocated by the previous RM. When the RM comes back, the NM doesn't know whether this container launch request comes from the previous RM or the current RM. We should reject containers allocated by the previous RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
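A minimal sketch of the timestamp idea from the patch summary, using simplified stand-in types (the real patch wires this through the container token and the NM-RM resync protocol; names here are illustrative):

{code:java}
// Simplified stand-in for the NM-side check; not the actual patch code.
public class ContainerOriginCheck {
  private volatile long currentRMClusterTimestamp;

  public ContainerOriginCheck(long rmClusterTimestamp) {
    this.currentRMClusterTimestamp = rmClusterTimestamp;
  }

  /** On startContainer: reject containers minted by a previous RM. */
  boolean isFromCurrentRM(long containerRMTimestamp) {
    return containerRMTimestamp == currentRMClusterTimestamp;
  }

  /** On NM resync with a restarted RM: adopt the new timestamp. */
  void onResync(long newRMClusterTimestamp) {
    this.currentRMClusterTimestamp = newRMClusterTimestamp;
  }
}
{code}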
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630672#comment-13630672 ] Xuan Gong commented on YARN-513: From the ApplicationMaster perspective: 1. The very first communication it has with the RM is registering itself with the RM, via AMRMClientImpl::registerApplicationMaster(), so we can add waiting logic here, to retry several times until it is accepted or the exception is finally thrown. From the Client perspective: 1. The very first communication it has with the RM is getNewApplication(), which is in YarnClientImpl::getNewApplication(request); we can add waiting logic here. In order to do that, we need to add several constants to YarnConfiguration, such as AM_RM_CONNECTION_RETRY_INTERVAL_SECS, AM_RM_CONNECT_WAIT_SECS, CLIENT_RM_CONNECTION_RETRY_INTERVAL_SECS and CLIENT_RM_CONNECTION_WAIT_SECS. Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
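The waiting logic might look roughly like the following; the configuration keys proposed above are not yet defined, so the retry interval and deadline appear as plain parameters in this sketch:

{code:java}
import java.io.IOException;
import java.net.ConnectException;

// Sketch of client-side retry-until-RM-is-back logic; purely illustrative.
public class RmRetrySketch {
  interface RmCall<T> { T run() throws IOException; }

  static <T> T callWithRetries(RmCall<T> call, long retryIntervalMs,
      long maxWaitMs) throws IOException {
    long deadline = System.currentTimeMillis() + maxWaitMs;
    while (true) {
      try {
        return call.run();
      } catch (ConnectException e) {
        // The RM may be restarting: wait and retry until the deadline.
        if (System.currentTimeMillis() >= deadline) {
          throw e; // give up and surface the original failure
        }
        try {
          Thread.sleep(retryIntervalMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while waiting for the RM", ie);
        }
      }
    }
  }
}
{code}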
[jira] [Commented] (YARN-513) Verify all clients will wait for RM to restart
[ https://issues.apache.org/jira/browse/YARN-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630677#comment-13630677 ] Bikas Saha commented on YARN-513: - What about other interactions with the RM, such as allocate() or finishApplicationMaster()? Verify all clients will wait for RM to restart -- Key: YARN-513 URL: https://issues.apache.org/jira/browse/YARN-513 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Xuan Gong When the RM is restarting, the NM, AM and Clients should wait for some time for the RM to come back up. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-562) NM should reject containers allocated by previous RM
[ https://issues.apache.org/jira/browse/YARN-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630678#comment-13630678 ] Hadoop QA commented on YARN-562: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578512/YARN-562.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/730//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/730//console This message is automatically generated. NM should reject containers allocated by previous RM Key: YARN-562 URL: https://issues.apache.org/jira/browse/YARN-562 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-562.1.patch It's possible that after RM shutdown, before the AM goes down, the AM still calls startContainer on the NM with containers allocated by the previous RM. When the RM comes back, the NM doesn't know whether this container launch request comes from the previous RM or the current RM. We should reject containers allocated by the previous RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-561) Nodemanager should set some key information into the environment of every container that it launches.
[ https://issues.apache.org/jira/browse/YARN-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630681#comment-13630681 ] Hitesh Shah commented on YARN-561: -- Take a look at ContainerLaunch#sanitizeEnv() and how it handles non-modifiable environment variables. The above mentioned env variables should also fall into the non-modifiable category. Nodemanager should set some key information into the environment of every container that it launches. - Key: YARN-561 URL: https://issues.apache.org/jira/browse/YARN-561 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Xuan Gong Labels: usability Information such as containerId, nodemanager hostname, nodemanager port is not set in the environment when any container is launched. For an AM, the RM does all of this for it but for a container launched by an application, all of the above need to be set by the ApplicationMaster. At the minimum, container id would be a useful piece of information. If the container wishes to talk to its local NM, the nodemanager related information would also come in handy. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
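In the spirit of sanitizeEnv(), the whitelisting could amount to applying the NM-owned values after the application-supplied CLC environment, so they can never be overridden; a sketch with hypothetical variable names:

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of NM-enforced environment variables; the names are hypothetical.
public class EnvWhitelistSketch {
  static final List<String> NM_ONLY_VARS =
      Arrays.asList("CONTAINER_ID", "NM_HOST", "NM_PORT");

  static Map<String, String> buildContainerEnv(Map<String, String> clcEnv,
      Map<String, String> nmValues) {
    Map<String, String> env = new HashMap<String, String>(clcEnv);
    for (String var : NM_ONLY_VARS) {
      // Applied last: silently overrides anything the application set.
      env.put(var, nmValues.get(var));
    }
    return env;
  }
}
{code}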
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630734#comment-13630734 ] Sandy Ryza commented on YARN-45: Carlo, I'm glad that this is being proposed. Have you considered including how long the grace period is in the response? Scheduler feedback to AM to release containers -- Key: YARN-45 URL: https://issues.apache.org/jira/browse/YARN-45 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Chris Douglas Assignee: Carlo Curino Attachments: YARN-45.patch, YARN-45.patch The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed- or reserved- to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630741#comment-13630741 ] Omkar Vinit Joshi commented on YARN-547: Canceling the patch: it fixed the existing problems but removed parallelization (based on the number of containers, not resources). Making sure this parallelization still exists: * Removing invalid transitions for INIT and LOCALIZED, while not modifying the DOWNLOADING state transition. * Making sure that now in PublicLocalizer as well we acquire the lock before downloading. This will fix the broken signaling. Multiple containers will still try to download, but the download will start/be enqueued only if ** we can acquire the lock on the LocalizedResource, and ** the LocalizedResource is still in the DOWNLOADING state. New resource localization is tried even when Localized Resource is in DOWNLOADING state --- Key: YARN-547 URL: https://issues.apache.org/jira/browse/YARN-547 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch At present when multiple containers try to request a localized resource 1) If the resource is not present then first it is created and Resource Localization starts ( LocalizedResource is in DOWNLOADING state) 2) Now if in this state multiple ResourceRequestEvents come in then ResourceLocalizationEvents are fired for all of them. Most of the time this does not result in a duplicate resource download, but there is a race condition present. Location : ResourceLocalizationService.addResource .. addition of the request into attempts in case an event already exists. The root cause for this is the presence of FetchResourceTransition on receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
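The acquire-then-check guard described above, as a simplified stand-in for LocalizedResource (the real class has its own locking; this only shows the ordering of the two checks):

{code:java}
import java.util.concurrent.Semaphore;

// Simplified stand-in for LocalizedResource, showing only the guard.
public class LocalizedResourceStub {
  enum State { INIT, DOWNLOADING, LOCALIZED, FAILED }

  private final Semaphore lock = new Semaphore(1);
  private volatile State state = State.DOWNLOADING;

  /** Returns true only if the caller should actually start the download. */
  boolean tryAcquireForDownload() {
    if (!lock.tryAcquire()) {
      return false; // another localizer already owns the download
    }
    if (state != State.DOWNLOADING) {
      lock.release(); // raced with completion/failure: back off
      return false;
    }
    return true;
  }

  void finishDownload(State terminal) {
    state = terminal;
    lock.release();
  }
}
{code}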
[jira] [Commented] (YARN-503) DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false
[ https://issues.apache.org/jira/browse/YARN-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630749#comment-13630749 ] Siddharth Seth commented on YARN-503: - bq. I don't like sleeps either. 1s is an eternity in this case because the initial renew and cancel timer tasks fire immediately on mocked objects, so it should run in a few ms. I assume you are suggesting using notify in a mock'ed answer method? Multiple timers are expected to fire in some cases, so it would probably require something like a CountDownLatch, which will get tricky to keep swapping in a new one by re-adding mocked responses with the new latch. Let me know if you feel it's worth it to change it. Was actually suggesting doing the post-sleep verify in a check-sleep loop, instead of just sleeping. Passing this step indicates the required execution has completed. Would prefer keeping a sleep out of the tests if we can. Otherwise a longer sleep for sure. bq. It's ok if another task is executing, because it's just trying to abort any pending task. Since there's only one possible pending task per token at any given time, the wrong task can't be cancelled. Did I miss an edge case? I think there's an edge case. Sequence 1. [t1] timerTask is a RenewalTask 2. [t1] timer kicks in and starts executing 3. [t2] scheduleCancelled gets called in a parallel thread [via AppRemovalTask] 4. [t2] scheduleCancelled.abortScheduled called - synchronized but does nothing useful since the current task is already running. 5. [t2] scheduleCancelled runs to completion and creates a cancelTask 6. [t1] completes execution - and calls scheduleTask(new TokenRenewTask(), renewIn) - which effectively destroys the scheduled cancelTask bq. Are you suggesting to move managedToken.add(appId) into the loop in addApplication? I was trying to encapsulate the implementation details of adding/removing the appId within ManagedApp. Is it ok to leave it as-is? I thought it'd be cleaner leaving this outside of ManagedApp - ManagedApp should not be managing ManagedTokens. IAC, don't feel strongly about this; whatever you decide. DelegationTokens will be renewed forever if multiple jobs share tokens and the first one sets JOB_CANCEL_DELEGATION_TOKEN to false -- Key: YARN-503 URL: https://issues.apache.org/jira/browse/YARN-503 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 0.23.3, 3.0.0, 2.0.0-alpha Reporter: Siddharth Seth Assignee: Daryn Sharp Attachments: YARN-503.patch, YARN-503.patch The first Job/App to register a token is the one which DelegationTokenRenewer associates with a specific Token. An attempt to remove/cancel these shared tokens by subsequent jobs doesn't work - since the JobId will not match. As a result, even if subsequent jobs have MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN set to true - tokens will not be cancelled when those jobs complete. Tokens will eventually be removed from the RM / JT when the service that issued them considers them to have expired or via an explicit cancelDelegationTokens call (not implemented yet in 23). A side effect of this is that the same delegation token will end up being renewed multiple times (a separate TimerTask for each job which uses the token). DelegationTokenRenewer could maintain a reference count/list of jobIds for shared tokens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
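The check-sleep loop suggested here is essentially a bounded poll on the condition under verification; a small sketch (the helper name and poll interval are illustrative):

{code:java}
// Poll the condition being verified instead of relying on one fixed sleep.
public class TestWaitUtil {
  interface Condition { boolean met(); }

  static void waitFor(Condition condition, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.met()) {
      if (System.currentTimeMillis() >= deadline) {
        throw new AssertionError("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(10); // short poll; total wait is bounded by timeoutMs
    }
  }
}
{code}

In the fast path the mocked timers fire immediately, so the loop exits after a few milliseconds; only a genuinely broken run pays the full timeout.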
[jira] [Updated] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-547: --- Attachment: yarn-547-20130412.patch New resource localization is tried even when Localized Resource is in DOWNLOADING state --- Key: YARN-547 URL: https://issues.apache.org/jira/browse/YARN-547 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch, yarn-547-20130412.patch At present when multiple containers try to request a localized resource 1) If the resource is not present then first it is created and Resource Localization starts ( LocalizedResource is in DOWNLOADING state) 2) Now if in this state multiple ResourceRequestEvents come in then ResourceLocalizationEvents are fired for all of them. Most of the time this does not result in a duplicate resource download, but there is a race condition present. Location : ResourceLocalizationService.addResource .. addition of the request into attempts in case an event already exists. The root cause for this is the presence of FetchResourceTransition on receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-476) ProcfsBasedProcessTree info message confuses users
[ https://issues.apache.org/jira/browse/YARN-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-476: Attachment: YARN-476.patch ProcfsBasedProcessTree info message confuses users -- Key: YARN-476 URL: https://issues.apache.org/jira/browse/YARN-476 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.6 Reporter: Jason Lowe Attachments: YARN-476.patch ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following: {noformat} 2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim. 2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim. 2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim. {noformat} As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs. We should either make this DEBUG or remove it entirely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-476) ProcfsBasedProcessTree info message confuses users
[ https://issues.apache.org/jira/browse/YARN-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630759#comment-13630759 ] Sandy Ryza commented on YARN-476: - Attached patch that removes the log statement entirely. ProcfsBasedProcessTree info message confuses users -- Key: YARN-476 URL: https://issues.apache.org/jira/browse/YARN-476 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.6 Reporter: Jason Lowe Attachments: YARN-476.patch ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following: {noformat} 2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim. 2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim. 2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim. {noformat} As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs. We should either make this DEBUG or remove it entirely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-476) ProcfsBasedProcessTree info message confuses users
[ https://issues.apache.org/jira/browse/YARN-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-476: Priority: Minor (was: Major) ProcfsBasedProcessTree info message confuses users -- Key: YARN-476 URL: https://issues.apache.org/jira/browse/YARN-476 Project: Hadoop YARN Issue Type: Bug Affects Versions: 0.23.6 Reporter: Jason Lowe Priority: Minor Attachments: YARN-476.patch ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following: {noformat} 2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim. 2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim. 2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim. {noformat} As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs. We should either make this DEBUG or remove it entirely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-547) New resource localization is tried even when Localized Resource is in DOWNLOADING state
[ https://issues.apache.org/jira/browse/YARN-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630766#comment-13630766 ] Hadoop QA commented on YARN-547: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578544/yarn-547-20130412.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalResourcesTrackerImpl org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalizedResource {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/731//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/731//console This message is automatically generated. New resource localization is tried even when Localized Resource is in DOWNLOADING state --- Key: YARN-547 URL: https://issues.apache.org/jira/browse/YARN-547 Project: Hadoop YARN Issue Type: Sub-task Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: yarn-547-20130411.1.patch, yarn-547-20130411.patch, yarn-547-20130412.patch At present when multiple containers try to request a localized resource 1) If the resource is not present then first it is created and Resource Localization starts ( LocalizedResource is in DOWNLOADING state) 2) Now if in this state multiple ResourceRequestEvents come in then ResourceLocalizationEvents are fired for all of them. Most of the time this does not result in a duplicate resource download, but there is a race condition present. Location : ResourceLocalizationService.addResource .. addition of the request into attempts in case an event already exists. The root cause for this is the presence of FetchResourceTransition on receiving ResourceRequestEvent in DOWNLOADING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630769#comment-13630769 ] Bikas Saha commented on YARN-45: I like the idea of the RM giving information to the AM about actions that it might take which will affect the AM. However, I am wary of having the action taken in different places. E.g. the KILL to the containers should come from the RM or the AM exclusively, but not from both. Otherwise we open ourselves up to race conditions, unnecessary kills and complex logic in the RM. Preemption is something that, IMO, the RM needs to do at the very last moment, when there is no other alternative for resources to be freed up. If we decide to preempt at time T1 and then actually preempt at time T2, then the cluster conditions may have changed between T1 and T2, which may invalidate the decisions taken at T1. New resources may have freed up that reduce the number of containers to be killed. This sub-optimality is directly proportional to the length of time between T1 and T2. So ideally we want to keep T1=T2. One can argue that things can change after the preemption which may have made the preemption unnecessary, so the above argument of T1=T2 is fallacious. However, preemption policies are usually based on deadlines, such as: the allocation of queue1 must be met within X seconds. So the RM does not have the luxury of waiting for X+1 seconds. The best it can do is to wait up to X seconds in the hope that things will work out, and at X redistribute resources to meet the deficit. At the same time, I can see that there is an argument that the AM knows best how to free up its resources. It will be good to remember that the AM has already informed the RM about the importance of all its containers when it made the requests at different priorities. So the RM knows the order of importance of the containers, and the RM also knows the amount of time each container has been allocated. Assuming container runtime as a proxy for container work done, this data can be used by the RM to preempt in a work-preserving manner without having to talk to the AM. Notifying the AM has the usefulness of allowing the AM to take actions that preserve work, such as checkpointing. However, IMO, the AM should only do checkpointing operations but not kill the containers. That should still happen at the RM, as the very last option at the last moment. If the situation changes in the grace period and the containers do not need to be killed, then there is no point in the AM killing them right now. This also lets us increase the grace period to a longer time, because checkpointing and preserving work usually means persisting data in a stable store and may be slow in practical scenarios. To summarize, I would propose an API in which the RM tells the AM about exactly which containers it might imminently preempt, with the contract being that the AM could take actions to preserve the work done in those containers. The AM can continue to run those containers until the RM actually preempts them, if needed. If we really think that the choice of containers needs to be made at the AM, then the AM needs to checkpoint those containers and inform the RM about the containers it has chosen. But the final decision to send the kill must remain with the RM. 
Scheduler feedback to AM to release containers -- Key: YARN-45 URL: https://issues.apache.org/jira/browse/YARN-45 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Chris Douglas Assignee: Carlo Curino Attachments: YARN-45.patch, YARN-45.patch The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed- or reserved- to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
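Purely as a thought experiment on the contract proposed above (none of these types exist in the API yet): the RM names the containers in jeopardy, and the AM checkpoints them without killing anything.

{code:java}
import java.util.Set;

// Hypothetical sketch of the proposed contract; no such API exists yet.
public class PreemptionWarningHandler {
  interface Checkpointer { void checkpoint(String containerId); }

  private final Checkpointer checkpointer;

  PreemptionWarningHandler(Checkpointer checkpointer) {
    this.checkpointer = checkpointer;
  }

  /** Invoked when an allocate() response carries a preemption warning. */
  void onPreemptionWarning(Set<String> containersInJeopardy) {
    for (String id : containersInJeopardy) {
      // Preserve work now; the kill, if still needed after the grace
      // period, is issued by the RM, never by the AM.
      checkpointer.checkpoint(id);
    }
  }
}
{code}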
[jira] [Commented] (YARN-476) ProcfsBasedProcessTree info message confuses users
[ https://issues.apache.org/jira/browse/YARN-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630771#comment-13630771 ] Hadoop QA commented on YARN-476: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578547/YARN-476.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/732//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/732//console This message is automatically generated. ProcfsBasedProcessTree info message confuses users -- Key: YARN-476 URL: https://issues.apache.org/jira/browse/YARN-476 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Jason Lowe Assignee: Sandy Ryza Priority: Minor Attachments: YARN-476.patch ProcfsBasedProcessTree has a habit of emitting not-so-helpful messages such as the following: {noformat} 2013-03-13 12:41:51,957 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28747 may have finished in the interim. 2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28978 may have finished in the interim. 2013-03-13 12:41:51,958 INFO [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: The process 28979 may have finished in the interim. {noformat} As described in MAPREDUCE-4570, this is something that naturally occurs in the process of monitoring processes via procfs. It's uninteresting at best and can confuse users who think it's a reason their job isn't running as expected when it appears in their logs. We should either make this DEBUG or remove it entirely. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-573) Shared data structures in Public Localizer and Private Localizer are not Thread safe.
Omkar Vinit Joshi created YARN-573: -- Summary: Shared data structures in Public Localizer and Private Localizer are not Thread safe. Key: YARN-573 URL: https://issues.apache.org/jira/browse/YARN-573 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi PublicLocalizer 1) pending accessed by addResource (part of event handling) and run method (as a part of PublicLocalizer.run() ). PrivateLocalizer 1) pending accessed by addResource (part of event handling) and findNextResource (i.remove()). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
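One possible direction for the fix, sketched with simplified stand-in types: back the shared "pending" structure with a concurrent collection, so the event-dispatcher thread (addResource) and the localizer's own thread can touch it safely.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch with stand-in generic types; not the actual localizer classes.
public class PendingResources<K, V> {
  private final Map<K, V> pending = new ConcurrentHashMap<K, V>();

  /** Called from the event-dispatcher thread (addResource). */
  void add(K key, V request) {
    pending.put(key, request);
  }

  /** Called from the localizer's own thread (run/findNextResource). */
  V take(K key) {
    return pending.remove(key); // atomic check-and-remove, no explicit lock
  }
}
{code}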
[jira] [Created] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer
Omkar Vinit Joshi created YARN-574: -- Summary: PrivateLocalizer does not support parallel resource download via ContainerLocalizer Key: YARN-574 URL: https://issues.apache.org/jira/browse/YARN-574 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi At present private resources will be downloaded in parallel only if multiple containers request the same resource; otherwise downloads are serial. The protocol between PrivateLocalizer and ContainerLocalizer supports multiple downloads; however, it is not used, and only one resource is sent for downloading at a time. I think we can increase / assure parallelism (even for a single container requesting resources) for private/application resources by making multiple downloads per ContainerLocalizer. Total parallelism before = number of threads allotted for PublicLocalizer [public resources] + number of containers [private and application resources]. Total parallelism after = number of threads allotted for PublicLocalizer [public resources] + number of containers * max downloads per container [private and application resources]. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
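A sketch of what multiple downloads per ContainerLocalizer could look like, with the pool size standing in for the assumed "max downloads per container" tunable:

{code:java}
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: the real code would stream results back over the
// localizer protocol rather than block on invokeAll().
public class ParallelDownloadSketch {
  static List<Future<String>> downloadAll(List<Callable<String>> downloads,
      int maxDownloadsPerContainer) throws InterruptedException {
    ExecutorService pool =
        Executors.newFixedThreadPool(maxDownloadsPerContainer);
    try {
      return pool.invokeAll(downloads); // runs up to N downloads at once
    } finally {
      pool.shutdown();
    }
  }
}
{code}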
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630893#comment-13630893 ] Chris Douglas commented on YARN-45: --- [~sandyr]: Yes, but the correct format/semantics for time are a complex discussion in themselves. To keep this easy to review and the discussion focused, we were going to file that separately. But I totally agree: for the AM to respond intelligently, the time before it's forced to give up the container is valuable input. [~bikash]: Agree almost completely. In YARN-569, the hysteresis you cite motivated several design points, including multiple dampers on actions taken by the preemption policy, out-of-band observation/enforcement, and no effort to fine-tune particular allocations. The role of preemption (to summarize what [~curino] discussed in detail in the prenominate JIRA) is to make coarse corrections around the core scheduler invariants (e.g., capacity, fairness). Rather than introducing new races or complexity, one could argue that preemption is a dual of allocation in an inconsistent environment. Your proposal matches case (1) in the above [comment|https://issues.apache.org/jira/browse/YARN-45?focusedCommentId=13628950page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628950], where the RM specifies the set of containers in jeopardy and a contract (as {{ResourceRequest}}) for avoiding the kills, should the AM have cause to pick different containers. Further, your observation that the RM has enough information in priorities, etc. to make an educated guess at those containers is spot-on. IIRC, the policy uses allocation order when selecting containers, but that should be a secondary key after priority. The disputed point, and I'm not sure we actually disagree, is the claim that the AM should never kill things in response to this message. To be fair, that can be implemented by just ignoring the requests, so it's orthogonal to this particular protocol, but it's certainly an important best practice to discuss to ensure we're capturing the right thing. Certainly there are many cases where ignoring the message is correct; most CDFs of map task execution time show that over 80% finish in less than a minute, so the AM has few reasons to pessimistically kill them. There are a few scenarios where this isn't optimal. Take the case of YARN-415, where the AM is billed cumulatively for cluster time. Assume an AM knows (a) the container will not finish (reinforcing [~sandyr]'s point about including time in the preemption message) and (b) the work done is not worth checkpointing. It can conclude that killing the container is in its best interest, because squatting on the resource could affect its ability to get containers in the future (or simply cost more). Moreover, for long-lived services and speculative container allocation/retention, the AM may actually be holding the container only as an optimization or for a future execution, so it could release it at low cost to itself. Finally, the time allowed before the RM starts killing containers can be extended if AMs typically return resources before the deadline. It's also a mechanism for the RM to advise the AM about constraints that prevent it from granting its pending requests. The AM currently kills reducers if it can't get containers to regenerate lost map output. If the scheduler values some containers more than others, the AM's response to starvation can be improved from random killing. 
This is a case where the current implementation acknowledges the fact that it already runs in an inconsistent environment. Scheduler feedback to AM to release containers -- Key: YARN-45 URL: https://issues.apache.org/jira/browse/YARN-45 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Chris Douglas Assignee: Carlo Curino Attachments: YARN-45.patch, YARN-45.patch The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed- or reserved- to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630898#comment-13630898 ] Carlo Curino commented on YARN-45: --

As you pointed out, any decision made in the RM needs to deal with an inconsistent and evolving view of the world, and the preemption actions suffer from an inherent and significant lag. In designing policies around this, one must embrace such chaos, operate conservatively, and try to affect only macroscopic properties (hence the many built-in dampers Chris mentioned). As for what to do with the preemption requests, I think our current implementation for the MapReduce AM/Task is quite aligned with your comments. Here's what we do:

1) Maps are typically short-lived, so it is often worth ignoring the preemption request and trying to make a run for it, as checkpointing and completion times are likely to be comparable, and re-execution costs are low.

2) For reducers, since their state is valuable and their runtimes are often longer, the AM asks the task to checkpoint. In our current implementation, once the state of the reducer has been saved to a checkpoint we exit, as continuing execution is non-trivial (in particular, managing the partial output of reducers). I can envision a future version that tries to continue running after having taken a checkpoint. Note that this (the task exiting) does not introduce any new race condition or complexity in either the RM or the AM, as both already handle failing/killed tasks, and the AM even has logic to kill its own reducers to free up space for maps. More importantly, this setup (in which containers exit as soon as they are done checkpointing) allows us to set rather generous wait-before-kill parameters, since the containers will be reclaimed as soon as the task is done checkpointing anyway. The alternative would have the RM pick a static policy for waiting, which risks being either too long (delaying the rebalancing too much) or too short (interrupting containers while they finish checkpointing, thus wasting work). I expect that no static solution would fare well for a broad range of AMs and job sizes.

3) When the preemption takes the form of a ResourceRequest, we pick reducers over maps (as having reducers running when the maps are killed would simply lead to wasted slot time). Looking forward to YARN's future, this is a key feature: other applications might have evolving priorities for containers which are not exposed to the RM, so we can't rely on the RM to guess which container is best to preempt, and delegating the choice to the AM could be invaluable.
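A rough sketch of this three-point policy follows; {{MrTaskAttempt}} and its methods are hypothetical stand-ins, not the actual MapReduce AM code:

{code:java}
import java.util.Comparator;
import java.util.List;

// Hypothetical handle on a running map or reduce attempt.
interface MrTaskAttempt {
  boolean isMap();
  void checkpointState(); // point 2: save valuable reducer state
  void exitCleanly();     // free the container before the kill deadline
}

class CheckpointingPreemptionPolicy {
  // Point 3: when the RM lets the AM choose (ResourceRequest form),
  // prefer reducers over maps, since reducers running over killed maps
  // would simply waste slot time.
  List<MrTaskAttempt> pickVictims(List<MrTaskAttempt> running, int needed) {
    running.sort(Comparator.comparing(MrTaskAttempt::isMap)); // reducers first
    return running.subList(0, Math.min(needed, running.size()));
  }

  void preempt(MrTaskAttempt t) {
    if (t.isMap()) {
      // Point 1: maps are short-lived; ignore the request and run for it.
      return;
    }
    // Point 2: checkpoint, then exit so the RM reclaims the container
    // well before any generous wait-before-kill deadline expires.
    t.checkpointState();
    t.exitCleanly();
  }
}
{code}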
Scheduler feedback to AM to release containers -- Key: YARN-45 URL: https://issues.apache.org/jira/browse/YARN-45 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Chris Douglas Assignee: Carlo Curino Attachments: YARN-45.patch, YARN-45.patch The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-575) ContainerManager APIs should be user accessible
Siddharth Seth created YARN-575: --- Summary: ContainerManager APIs should be user accessible Key: YARN-575 URL: https://issues.apache.org/jira/browse/YARN-575 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Priority: Critical Auth for ContainerManager is based on the containerId being accessed, since this is what is used to launch containers (there's likely another JIRA somewhere to change this to not be containerId-based). What this also means is that the API is effectively not usable with Kerberos credentials. Also, it should be possible to use this API with some generic tokens (RMDelegation?), instead of with container-specific tokens. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
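For illustration, containerId-based authorization amounts to something like the simplified sketch below (hypothetical names, not the actual NM code): the authenticated remote identity must literally be the container id carried in the container token, which no user's Kerberos principal can ever match.

{code:java}
// Simplified illustration of containerId-based auth; not the real
// ContainerManager implementation.
class ContainerAuthSketch {
  void authorizeRequest(String remoteIdentity, String containerIdInRequest) {
    // A user principal such as alice@EXAMPLE.COM never equals a
    // container id string, so direct user calls are rejected.
    if (!remoteIdentity.equals(containerIdInRequest)) {
      throw new SecurityException(
          "Unauthorized request for container " + containerIdInRequest);
    }
  }
}
{code}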
[jira] [Commented] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630925#comment-13630925 ] Alejandro Abdelnur commented on YARN-45: Comments on the patch:
* Reusing ResourceRequest means we have a bunch of properties that are not applicable to the preempt message. Wouldn't it be enough to just return the ContainerIds and a flag indicating whether the set is strict? The AM can reconstruct all the resource information if it needs to.
* Do we need the get*Count() methods? You can get the size from the set itself, or am I missing something?
Scheduler feedback to AM to release containers -- Key: YARN-45 URL: https://issues.apache.org/jira/browse/YARN-45 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Chris Douglas Assignee: Carlo Curino Attachments: YARN-45.patch, YARN-45.patch The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
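For concreteness, the leaner message shape suggested in the comment above might look like the sketch below (illustrative names only, not a committed API). One trade-off: in the non-strict case, dropping the {{ResourceRequest}} arguably loses the contract that tells the AM what an acceptable substitute container looks like.

{code:java}
import java.util.Set;

// Hypothetical minimal preempt message: just ids plus a strictness flag.
interface PreemptMessageSketch {
  Set<String> containerIds(); // containers the RM wants back
  boolean strict();           // true: exactly these; false: AM may substitute
}
{code}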
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630939#comment-13630939 ] nemon lou commented on YARN-276: [~tgraves] Here are my initial thoughts on checking the cluster-level AM resource percent in each leaf queue: a leaf queue's capacity is computed based on absoluteMaxCapacity. Considering we have 10 leaf queues, each configured with 100% absoluteMaxCapacity and 10% maxAMResourcePerQueuePercent, there is still a chance that all of the leaf queues' resources are taken up by AMs before any queue reaches its 10% maxAMResourcePerQueuePercent limit. Note that the cluster-level AM resource percent only applies in a leaf queue if no AM resource percent is configured for that leaf queue. As Thomas Graves mentioned, cluster-level checking would cause one queue to restrict another, so I will remove the cluster-level checking.
Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler can hang with most resources taken up by AMs, leaving not enough resources for tasks; all applications then hang. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not checked directly. Instead, this property is only used for maxActiveApplications, and maxActiveApplications is computed from minimumAllocation (not from what the AMs actually use). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
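A worked example of the over-commitment described in the comment above, with made-up numbers (a hypothetical 100 GB cluster; this is arithmetic for illustration, not scheduler code):

{code:java}
public class AmHeadroomExample {
  public static void main(String[] args) {
    double clusterMb = 100_000;       // hypothetical 100 GB cluster
    int leafQueues = 10;
    double absoluteMaxCapacity = 1.0; // each queue may reach 100% of cluster
    double maxAmPercent = 0.10;       // maxAMResourcePerQueuePercent

    // The per-queue AM limit is computed against absoluteMaxCapacity,
    // so each queue may devote 10% of the whole cluster to AMs:
    double perQueueAmLimitMb = clusterMb * absoluteMaxCapacity * maxAmPercent;

    // Ten such queues can hand the entire cluster to AMs before any one
    // of them reaches its own 10% limit, leaving nothing for tasks:
    System.out.printf("per queue: %.0f MB; all queues: %.0f MB of %.0f MB%n",
        perQueueAmLimitMb, leafQueues * perQueueAmLimitMb, clusterMb);
  }
}
{code}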
[jira] [Updated] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nemon lou updated YARN-276: --- Attachment: YARN-276.patch Uploading an interim patch. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity Scheduler can hang with most resources taken up by AMs, leaving not enough resources for tasks; all applications then hang. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent is not checked directly. Instead, this property is only used for maxActiveApplications, and maxActiveApplications is computed from minimumAllocation (not from what the AMs actually use). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira