[jira] [Created] (YARN-935) Correcting pom.xml to build applicationhistoryserver sub-project successfully
Zhijie Shen created YARN-935: Summary: Correcting pom.xml to build applicationhistoryserver sub-project successfully Key: YARN-935 URL: https://issues.apache.org/jira/browse/YARN-935 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Since the branch was created from branch-2, hadoop-yarn-server-applicationhistoryserver/pom.xml should use 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be built correctly because of a wrong dependency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-935) Correcting pom.xml to build applicationhistoryserver sub-project successfully
[ https://issues.apache.org/jira/browse/YARN-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-935: - Attachment: YARN-935.1.patch Changed the version in pom.xml to 2.2.0-SNAPSHOT, and made some minor touch-ups to match the pom.xml of the other server sub-projects. Correcting pom.xml to build applicationhistoryserver sub-project successfully - Key: YARN-935 URL: https://issues.apache.org/jira/browse/YARN-935 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-935.1.patch Since the branch was created from branch-2, hadoop-yarn-server-applicationhistoryserver/pom.xml should use 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be built correctly because of a wrong dependency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-696: Assignee: Trevor Lorimer Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Priority: Trivial Attachments: 0001-YARN-696.patch Within the YARN Resource Manager REST API, the GET call that returns all Applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, then multiple REST calls are needed (a maximum of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
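To make the YARN-696 proposal concrete, here is a small, hedged sketch of what a client-side call could look like if multiple states were accepted in one request. The RM host/port, the "states" parameter name, and the comma-separated value format are assumptions for illustration only; the issue above only proposes the capability and does not fix the syntax.
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MultiStateAppsQuery {
  public static void main(String[] args) throws Exception {
    // Hypothetical query: ask the RM REST API for apps in either of two states
    // in a single round trip instead of one call per state.
    URL url = new URL("http://rmhost:8088/ws/v1/cluster/apps?states=ACCEPTED,RUNNING");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);  // JSON list of apps in the requested states
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}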
[jira] [Created] (YARN-936) RMWebServices filtering apps by states uses RMAppState instead of YarnApplicationState.
Vinod Kumar Vavilapalli created YARN-936: Summary: RMWebServices filtering apps by states uses RMAppState instead of YarnApplicationState. Key: YARN-936 URL: https://issues.apache.org/jira/browse/YARN-936 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Realized this while reviewing YARN-696. YarnApplicationState is the end-user API and the one that users expect to pass as an argument to the REST API. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
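A minimal sketch of the direction YARN-936 points at: parse user-supplied state names against the public YarnApplicationState enum (the end-user API) rather than the internal RMAppState. The helper below is illustrative and is not the actual RMWebServices code.
{code}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

public class StateFilterSketch {
  // An unknown name (for example an internal-only RMAppState value such as
  // FINISHING) fails fast with IllegalArgumentException instead of being
  // silently accepted as a filter.
  static EnumSet<YarnApplicationState> parseStates(String... names) {
    EnumSet<YarnApplicationState> states = EnumSet.noneOf(YarnApplicationState.class);
    for (String name : names) {
      states.add(YarnApplicationState.valueOf(name.trim().toUpperCase()));
    }
    return states;
  }
}
{code}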
[jira] [Commented] (YARN-935) Correcting pom.xml to build applicationhistoryserver sub-project successfully
[ https://issues.apache.org/jira/browse/YARN-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712089#comment-13712089 ] Hadoop QA commented on YARN-935: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592927/YARN-935.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1516//console This message is automatically generated. Correcting pom.xml to build applicationhistoryserver sub-project successfully - Key: YARN-935 URL: https://issues.apache.org/jira/browse/YARN-935 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-935.1.patch The branch was created from branch-2, hadoop-yarn-server-applicationhistoryserver/pom.xml should use 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be built correctly because of wrong dependency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-933) After an AppAttempt_1 got failed [ removal and releasing of container is done , AppAttempt_2 is scheduled ] again relaunching of AppAttempt_1 throws Exception at RM .And cl
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated YARN-933: Description: Hostname enabled. AM max retries configured as 3 on both the client and RM side. Step 1: Install a cluster with NMs on 2 machines. Step 2: Ensure that a ping from the RM machine to the NM1 machine succeeds using the IP but fails using the hostname. Step 3: Execute a job. Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a connection loss happens. Observation : == After AppAttempt_1 has moved to the failed state, the release of AppAttempt_1's container and the application removal are successful. A new AppAttempt_2 is spawned. 1. Then a retry of AppAttempt_1 happens again. 2. The RM again tries to launch AppAttempt_1, and hence it fails with InvalidStateTransitonException. 3. The client exited after AppAttempt_1 finished [but the job is actually still running], while the configured number of app attempts is 3 and the remaining app attempts are all spawned and running. RMLogs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01.
Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662) Client Logs Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020] at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534) 2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation:
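For readers unfamiliar with the InvalidStateTransitonException in the log above: YARN state machines only accept events that have a registered transition for the current state. The hedged sketch below shows, with made-up states and events, how a FAILED-to-FAILED self-transition turns a late LAUNCH_FAILED into a no-op; it only illustrates the mechanism and is not the actual YARN-933 fix.
{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class IgnoreLateEventSketch {
  enum MyState { RUNNING, FAILED }
  enum MyEvent { LAUNCH_FAILED }

  // Without the second addTransition, a LAUNCH_FAILED arriving while already
  // FAILED would raise InvalidStateTransitonException, like the log above.
  private static final StateMachineFactory<IgnoreLateEventSketch, MyState, MyEvent, MyEvent>
      FACTORY =
        new StateMachineFactory<IgnoreLateEventSketch, MyState, MyEvent, MyEvent>(MyState.RUNNING)
          .addTransition(MyState.RUNNING, MyState.FAILED, MyEvent.LAUNCH_FAILED)
          .addTransition(MyState.FAILED, MyState.FAILED, MyEvent.LAUNCH_FAILED)
          .installTopology();

  private final StateMachine<MyState, MyEvent, MyEvent> stateMachine = FACTORY.make(this);

  public static void main(String[] args) throws Exception {
    IgnoreLateEventSketch sketch = new IgnoreLateEventSketch();
    sketch.stateMachine.doTransition(MyEvent.LAUNCH_FAILED, MyEvent.LAUNCH_FAILED); // -> FAILED
    sketch.stateMachine.doTransition(MyEvent.LAUNCH_FAILED, MyEvent.LAUNCH_FAILED); // now a no-op
    System.out.println(sketch.stateMachine.getCurrentState()); // FAILED
  }
}
{code}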
[jira] [Updated] (YARN-933) After an AppAttempt_1 got failed [ removal and releasing of container is done , AppAttempt_2 is scheduled ] again relaunching of AppAttempt_1 throws Exception at RM .And cl
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated YARN-933: Description: AM max retries configured as 3 on both the client and RM side. Step 1: Install a cluster with NMs on 2 machines. Step 2: Ensure that a ping from the RM machine to the NM1 machine succeeds using the IP but fails using the hostname. Step 3: Execute a job. Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a connection loss happens. Observation : == After AppAttempt_1 has moved to the failed state, the release of AppAttempt_1's container and the application removal are successful. A new AppAttempt_2 is spawned. 1. Then a retry of AppAttempt_1 happens again. 2. The RM again tries to launch AppAttempt_1, and hence it fails with InvalidStateTransitonException. 3. The client exited after AppAttempt_1 finished [but the job is actually still running], while the configured number of app attempts is 3 and the remaining app attempts are all spawned and running. RMLogs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01.
Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662) Client Logs Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020] at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534) 2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712197#comment-13712197 ] Devaraj K commented on YARN-873: The latest patch overall looks good to me. These are a few things I feel we can take care of: 1. The application -status command returns an exit code of 0 when the application doesn't exist. Can we return a non-zero exit code when the application doesn't exist? 2. In TestClientRMService.java, {code:xml}
+    try {
+      GetApplicationReportResponse applicationReport = rmService
+          .getApplicationReport(request);
+    } catch (ApplicationNotFoundException ex) {
+      getExpectedException = true;
+      Assert.assertEquals(ex.getMessage(),
+          "Application with id '" + request.getApplicationId()
+              + "' doesn't exist in RM.");
+    }
+    Assert.assertTrue(getExpectedException);
{code} Can we fail right after getApplicationReport using Assert.fail() instead of setting and checking a boolean flag? Also, the applicationReport variable is never used. 3. {code:xml} // If the RM doesn't have the application, provide the response with // application report as null and let the clients to handle. {code} Do we have another JIRA to fix the same for kill application? If not, can we file one? YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
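A small sketch of the test pattern Devaraj suggests in point 2: fail explicitly when no exception is thrown instead of tracking a boolean flag. The rmService and request parameters stand in for the objects used in the quoted hunk, and the expected message mirrors the one asserted there; this is a sketch, not the committed test code.
{code}
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationReportRequest;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.server.resourcemanager.ClientRMService;
import org.junit.Assert;

// Helper intended to be dropped into TestClientRMService.
void assertUnknownAppIsRejected(ClientRMService rmService,
    GetApplicationReportRequest request) throws Exception {
  try {
    rmService.getApplicationReport(request);
    Assert.fail("Expected ApplicationNotFoundException for an unknown application id");
  } catch (ApplicationNotFoundException ex) {
    Assert.assertEquals("Application with id '" + request.getApplicationId()
        + "' doesn't exist in RM.", ex.getMessage());
  }
}
{code}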
[jira] [Commented] (YARN-865) RM webservices can't query based on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712216#comment-13712216 ] Hudson commented on YARN-865: - SUCCESS: Integrated in Hadoop-Yarn-trunk #274 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/274/]) YARN-865. RM webservices can't query based on application Types. Contributed by Xuan Gong. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1504288) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm RM webservices can't query based on application Types - Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.1.0-beta Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch, YARN-865.4.patch, YARN-865.5.patch, YARN-865.6.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712215#comment-13712215 ] Hudson commented on YARN-922: - SUCCESS: Integrated in Hadoop-Yarn-trunk #274 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/274/]) YARN-922. Change FileSystemRMStateStore to use directories (Jian He via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1504261) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Fix For: 2.1.0-beta Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.3.patch, YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
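A hedged sketch of the directory-per-application idea behind YARN-922 (the path layout and names below are illustrative, not the store's actual layout): once an application and its attempts live under one directory, removing the application's state is a single recursive delete.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppStateDirSketch {
  // rootDir is the store's application root; appId is e.g. "application_123_0001".
  // Because attempt data is written as children of the app directory, one
  // recursive delete removes the application and all of its attempts.
  static void removeApp(FileSystem fs, Path rootDir, String appId) throws IOException {
    Path appDir = new Path(rootDir, appId);
    fs.delete(appDir, true /* recursive */);
  }
}
{code}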
[jira] [Created] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701
Arun C Murthy created YARN-937: -- Summary: Fix unmanaged AM in non-secure/secure setup post YARN-701 Key: YARN-937 URL: https://issues.apache.org/jira/browse/YARN-937 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Arun C Murthy Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.0-beta Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens will be used in both scenarios. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712507#comment-13712507 ] Hitesh Shah commented on YARN-873: -- [~xgong] Requiring the user to parse a string message to determine whether an application exists is more work than checking $?, which can be used to indicate various errors such as a connection issue, an invalid application id, or an app that does not exist in the RM. YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712514#comment-13712514 ] Hadoop QA commented on YARN-873: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592998/YARN-873.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1517//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1517//console This message is automatically generated. YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712523#comment-13712523 ] Bikas Saha commented on YARN-353: - bq. ZKRMStateStore#getNewZooKeeper need not be synchronized bq. fixed The code is derived from the ActiveStandbyElector code in hadoop-common. It was synchronized there for a race condition that showed up in testing. I would like to keep the synchronization as it was in the original patch. bq. the patch still seems to have NUM_RETRIES Why should NUM_RETRIES not be there? Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712528#comment-13712528 ] Bikas Saha commented on YARN-873: - Normally, having a return statement in a catch/finally block is not recommended. We could print the message and re-throw the exception, or simply not catch the exception. Also, this way the command line would exit with a non-zero exit code. {code}
+    } catch (ApplicationNotFoundException ex) {
+      sysout.println("Application with id '"
+          + applicationId + "' doesn't exist in RM.");
+      return;
+    }
{code} YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
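A hedged sketch of the alternative Bikas describes, with illustrative names rather than the actual ApplicationCLI code: print a friendly message but re-throw, so the caller can map the failure to a non-zero process exit code instead of silently returning.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

public class ReportOrFailSketch {
  // Re-throwing keeps the user-facing message while still letting the command
  // exit non-zero, unlike a bare return inside the catch block.
  static ApplicationReport reportOrFail(YarnClient client, ApplicationId appId)
      throws Exception {
    try {
      return client.getApplicationReport(appId);
    } catch (ApplicationNotFoundException e) {
      System.err.println("Application with id '" + appId + "' doesn't exist in RM.");
      throw e;
    }
  }
}
{code}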
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712533#comment-13712533 ] Karthik Kambatla commented on YARN-353: --- bq. Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3. bq. fixed bq. Why should NUM_RETRIES not be there? Was just noting that: the latest patch has the non-configurable NUM_RETRIES, it should exist but be configurable. If it is configurable, we should probably change the name of the variable. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
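A minimal sketch of what "configurable with a default" typically looks like with Hadoop's Configuration class; the property key and default below are made up for illustration and are not necessarily what YARN-353 ends up defining.
{code}
import org.apache.hadoop.conf.Configuration;

public class ZkRetrySketch {
  // Hypothetical key name; only the conf.getInt(key, default) pattern matters.
  static final String ZK_RETRIES_KEY = "yarn.resourcemanager.zk-state-store.num-retries";
  static final int DEFAULT_ZK_RETRIES = 3;

  static int zkRetries(Configuration conf) {
    return conf.getInt(ZK_RETRIES_KEY, DEFAULT_ZK_RETRIES);
  }
}
{code}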
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712546#comment-13712546 ] Karthik Kambatla commented on YARN-321: --- bq. Folks, it would be great if we have a consolidated document that describes the design and some details. +1 Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli Attachments: HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-938) Hadoop 2 Bench marking
Mayank Bansal created YARN-938: -- Summary: Hadoop 2 Bench marking Key: YARN-938 URL: https://issues.apache.org/jira/browse/YARN-938 Project: Hadoop YARN Issue Type: Task Reporter: Mayank Bansal Assignee: Mayank Bansal I am running the benchmarks on Hadoop 2 and will update the results soon. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-814: - Attachment: YARN-814.7.patch new patch fixed the warnings and added test case for stdout/stderr diagnostics Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
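A hedged reproduction sketch for the scenario described in YARN-814, not the patch itself: an environment value with shell-breaking characters is passed through the ContainerLaunchContext, so the generated launch script fails before the user's process starts and, without the fix, leaves nothing useful in the diagnostics.
{code}
import java.util.Collections;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.util.Records;

public class BadEnvSketch {
  static ContainerLaunchContext badEnvContext() {
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    // An unbalanced quote in the value breaks the syntax of the generated
    // launch script, so the container launch fails during environment setup.
    Map<String, String> env = Collections.singletonMap("BAD_VAR", "oops\")(");
    ctx.setEnvironment(env);
    return ctx;
  }
}
{code}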
[jira] [Updated] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-245: --- Summary: Node Manager can not handle duplicate responses (was: Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED) Node Manager can not handle duplicate responses --- Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch {code:xml} 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712607#comment-13712607 ] Hadoop QA commented on YARN-814: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593018/YARN-814.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1518//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1518//console This message is automatically generated. Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712661#comment-13712661 ] Omkar Vinit Joshi commented on YARN-713: [~maysamyabandeh] are you working on a patch? Otherwise, I will take it over; this is critical and needs to be fixed. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Maysam Yabandeh Priority: Critical Fix For: 2.1.0-beta Attachments: YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712665#comment-13712665 ] Arun C Murthy commented on YARN-701: I'm committing this to unblock the rest. ApplicationTokens should be used irrespective of kerberos - Key: YARN-701 URL: https://issues.apache.org/jira/browse/YARN-701 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, YARN-701-20130710.txt, YARN-701-20130712.txt, YARN-701-20130717.txt, yarn-ojoshi-resourcemanager-HW10351.local.log - Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-208) Yarn overrides diagnostic message set by AM
[ https://issues.apache.org/jira/browse/YARN-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-208. -- Resolution: Duplicate Thomas, closing this as a duplicate; please reopen if you see it again. Tx. Yarn overrides diagnostic message set by AM --- Key: YARN-208 URL: https://issues.apache.org/jira/browse/YARN-208 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Thomas Weise Diagnostics set in the AM just before exit are overridden by YARN: in the case of the FAILED state they are replaced with a different message, and for SUCCESS the field will be blank. The application's info should be retained. Per FinishApplicationMasterRequest, this can be managed by the ApplicationMaster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
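For context on the FinishApplicationMasterRequest remark above, here is a hedged sketch of the AM-side unregistration call whose diagnostics string YARN-208 expects to be preserved; the diagnostics message and tracking URL below are made up for illustration.
{code}
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class UnregisterSketch {
  // Whatever the AM passes as the diagnostics argument here is what it expects
  // the RM to retain rather than overwrite or blank out.
  static void finish(AMRMClient<AMRMClient.ContainerRequest> amRmClient) throws Exception {
    amRmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED,
        "processed 42 records", "http://am-host:8080/tracking");
  }
}
{code}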
[jira] [Resolved] (YARN-455) NM warns about stopping an unknown container under normal circumstances
[ https://issues.apache.org/jira/browse/YARN-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi resolved YARN-455. Resolution: Duplicate NM warns about stopping an unknown container under normal circumstances --- Key: YARN-455 URL: https://issues.apache.org/jira/browse/YARN-455 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi During normal operations the NM can log warnings to its audit log about unknown containers. For example: {noformat} 2013-03-06 21:04:55,327 WARN nodemanager.NMAuditLogger: USER=UnknownUser IP=xx OPERATION=Stop Container RequestTARGET=ContainerManagerImpl RESULT=FAILURE DESCRIPTION=Trying to stop unknown container! APPID=application_1359150825713_3947178 CONTAINERID=container_1359150825713_3947178_01_001266 {noformat} Looking closer at the audit log and the NM log shows that the container completed successfully and was forgotten by the NM before the stop request arrived. The NM should avoid warning in these situations since this is a normal race condition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-455) NM warns about stopping an unknown container under normal circumstances
[ https://issues.apache.org/jira/browse/YARN-455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712677#comment-13712677 ] Omkar Vinit Joshi commented on YARN-455: Closing this as duplicate . I am fixing it at YARN-903 NM warns about stopping an unknown container under normal circumstances --- Key: YARN-455 URL: https://issues.apache.org/jira/browse/YARN-455 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi During normal operations the NM can log warnings to its audit log about unknown containers. For example: {noformat} 2013-03-06 21:04:55,327 WARN nodemanager.NMAuditLogger: USER=UnknownUser IP=xx OPERATION=Stop Container RequestTARGET=ContainerManagerImpl RESULT=FAILURE DESCRIPTION=Trying to stop unknown container! APPID=application_1359150825713_3947178 CONTAINERID=container_1359150825713_3947178_01_001266 {noformat} Looking closer at the audit log and the NM log shows that the container completed successfully and was forgotten by the NM before the stop request arrived. The NM should avoid warning in these situations since this is a normal race condition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712688#comment-13712688 ] Vinod Kumar Vavilapalli commented on YARN-658: -- David, can you give us more information? RM, AM and NM logs will help a lot. Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-864) YARN NM leaking containers with CGroups
[ https://issues.apache.org/jira/browse/YARN-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-864. -- Resolution: Duplicate Given Jian's update, I'm closing this as duplicate of YARN-688. YARN NM leaking containers with CGroups --- Key: YARN-864 URL: https://issues.apache.org/jira/browse/YARN-864 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.5-alpha Environment: YARN 2.0.5-alpha with patches applied for YARN-799 and YARN-600. Reporter: Chris Riccomini Assignee: Jian He Attachments: rm-log, YARN-864.1.patch, YARN-864.2.patch Hey Guys, I'm running YARN 2.0.5-alpha with CGroups and stateful RM turned on, and I'm seeing containers getting leaked by the NMs. I'm not quite sure what's going on -- has anyone seen this before? I'm concerned that maybe it's a mis-understanding on my part about how YARN's lifecycle works. When I look in my AM logs for my app (not an MR app master), I see: 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Got an exit code of -100. This means that container container_1371141151815_0008_03_02 was killed by YARN, either due to being released by the application master or being 'lost' due to node failures etc. 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Released container container_1371141151815_0008_03_02 was assigned task ID 0. Requesting a new container for the task. The AM has been running steadily the whole time. Here's what the NM logs say: {noformat} 05:34:59,783 WARN AsyncDispatcher:109 - Interrupted Exception while stopping java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1143) at java.lang.Thread.join(Thread.java:1196) at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:107) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:209) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:336) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:61) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:619) 05:35:00,314 WARN ContainersMonitorImpl:463 - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0006_01_001598 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0008_03_02 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. 
java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) at org.apache.hadoop.util.Shell.run(Shell.java:129) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) at org.apache.hadoop.util.Shell.run(Shell.java:129) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242)
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712698#comment-13712698 ] Hudson commented on YARN-701: - SUCCESS: Integrated in Hadoop-trunk-Commit #4110 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4110/]) YARN-701. Use application tokens irrespective of secure or non-secure mode. Contributed by Vinod K V. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1504604) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAMAuthorization.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCResponseId.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java ApplicationTokens should be used irrespective of kerberos - Key: YARN-701 URL: https://issues.apache.org/jira/browse/YARN-701 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 2.1.0-beta 
Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, YARN-701-20130710.txt, YARN-701-20130712.txt, YARN-701-20130717.txt, yarn-ojoshi-resourcemanager-HW10351.local.log - Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712712#comment-13712712 ] Vinod Kumar Vavilapalli commented on YARN-938: -- Thanks for doing this Mayank! Hadoop 2 benchmarking -- Key: YARN-938 URL: https://issues.apache.org/jira/browse/YARN-938 Project: Hadoop YARN Issue Type: Task Reporter: Mayank Bansal Assignee: Mayank Bansal I am running the benchmarks on Hadoop 2 and will update the results soon. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-938: - Summary: Hadoop 2 benchmarking (was: Hadoop 2 Bench marking ) Hadoop 2 benchmarking -- Key: YARN-938 URL: https://issues.apache.org/jira/browse/YARN-938 Project: Hadoop YARN Issue Type: Task Reporter: Mayank Bansal Assignee: Mayank Bansal I am running the benchmarks on Hadoop 2 and will update the results soon. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-353: - Attachment: YARN-353.8.patch New patch made NUM_RETRIES configurable. Changed removeApplicationState to use multi api to remove both app state and attempts state at the same time. Also fixed the warnings. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
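A hedged sketch of the ZooKeeper multi idea mentioned in the update above (the znode paths are made up, and this is not the actual ZKRMStateStore code): delete the application znode and its attempt znodes in one atomic batch.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

public class ZkMultiDeleteSketch {
  // Children are deleted before their parent; the whole batch is applied
  // atomically on the server, so app and attempt state disappear together.
  static void removeApp(ZooKeeper zk, String appPath, List<String> attemptPaths)
      throws Exception {
    List<Op> ops = new ArrayList<Op>();
    for (String attemptPath : attemptPaths) {
      ops.add(Op.delete(attemptPath, -1));  // -1 matches any version
    }
    ops.add(Op.delete(appPath, -1));
    zk.multi(ops);
  }
}
{code}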
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712793#comment-13712793 ] Karthik Kambatla commented on YARN-353: --- Looks good. +1 pending Jenkins. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-919: --- Attachment: YARN-919-trunk-3.patch Thanks [~hitesh] and [~vinodkv] for the review. Updating the patch. Thanks, Mayank Setting default heap sizes in yarn env -- Key: YARN-919 URL: https://issues.apache.org/jira/browse/YARN-919 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Mayank Bansal Assignee: Mayank Bansal Priority: Minor Attachments: YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, YARN-919-trunk-3.patch Right now there are no defaults in the yarn env scripts for the resource manager and node manager, and if users want to override them, they have to go to the documentation, find the variables, and change the script. There is no straightforward way to change them in the script. Just updating the variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712796#comment-13712796 ] Zhijie Shen commented on YARN-321: -- bq. Running as service: By default, ApplicationHistoryService will be embedded inside ResourceManager but will be independent enough to run as a separate service for scaling purposes. IIUC, to be independent, ApplicationHistoryService should have its own event dispatcher, shouldn't it? Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli Attachments: HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712839#comment-13712839 ] Hadoop QA commented on YARN-919: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593050/YARN-919-trunk-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1519//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1519//console This message is automatically generated. Setting default heap sizes in yarn env -- Key: YARN-919 URL: https://issues.apache.org/jira/browse/YARN-919 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Mayank Bansal Assignee: Mayank Bansal Priority: Minor Attachments: YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, YARN-919-trunk-3.patch Right now there are no defaults in yarn env scripts for resource manager nad node manager and if user wants to override that, then user has to go to documentation and find the variables and change the script. There is no straight forward way to change it in script. Just updating the variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712882#comment-13712882 ] Hadoop QA commented on YARN-918: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592818/YARN-918-20130717.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1521//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1521//console This message is automatically generated. ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701 - Key: YARN-918 URL: https://issues.apache.org/jira/browse/YARN-918 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC pay load. This is an API change, so doing it as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-903) DistributedShell throwing Errors in logs after successful completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-903: --- Attachment: YARN-903-20130718.1.patch DistributedShell throwing Errors in logs after successfull completion - Key: YARN-903 URL: https://issues.apache.org/jira/browse/YARN-903 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.0.4-alpha Environment: Ununtu 11.10 Reporter: Abhishek Kapoor Assignee: Omkar Vinit Joshi Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log I have tried running DistributedShell and also used ApplicationMaster of the same for my test. The application is successfully running through logging some errors which would be useful to fix. Below are the logs from NodeManager and ApplicationMasterode Log Snippet for NodeManager = 2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1 2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586 2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of memory:10240, vCores:8 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) 2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_01 by user sunny 2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_01 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING 2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_01 to application application_1373184544832_0001 2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from INITING to RUNNING 2013-07-07 13:39:35,528 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_01 transitioned from NEW to LOCALIZING 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from INIT to 
DOWNLOADING 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1373184544832_0001_01_01 2013-07-07 13:39:35,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens. Credentials list: 2013-07-07 13:39:35,694 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user sunny 2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens to /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_01.tokens 2013-07-07 13:39:35,803 INFO
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successful completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712889#comment-13712889 ] Omkar Vinit Joshi commented on YARN-903: Attaching a simple test to verify this. DistributedShell throwing Errors in logs after successfull completion - Key: YARN-903 URL: https://issues.apache.org/jira/browse/YARN-903 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.0.4-alpha Environment: Ununtu 11.10 Reporter: Abhishek Kapoor Assignee: Omkar Vinit Joshi Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log I have tried running DistributedShell and also used ApplicationMaster of the same for my test. The application is successfully running through logging some errors which would be useful to fix. Below are the logs from NodeManager and ApplicationMasterode Log Snippet for NodeManager = 2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1 2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586 2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of memory:10240, vCores:8 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) 2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_01 by user sunny 2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_01 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING 2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_01 to application application_1373184544832_0001 2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from INITING to RUNNING 2013-07-07 13:39:35,528 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_01 transitioned from NEW to LOCALIZING 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource 
hdfs://localhost:9000/application/test.jar transitioned from INIT to DOWNLOADING 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1373184544832_0001_01_01 2013-07-07 13:39:35,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens. Credentials list: 2013-07-07 13:39:35,694 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user sunny 2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens to
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712902#comment-13712902 ] Hadoop QA commented on YARN-353: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593045/YARN-353.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1520//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1520//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1520//console This message is automatically generated. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution
[ https://issues.apache.org/jira/browse/YARN-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-880: -- Assignee: Omkar Vinit Joshi Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution Key: YARN-880 URL: https://issues.apache.org/jira/browse/YARN-880 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Nishan Shetty Assignee: Omkar Vinit Joshi Priority: Critical Scenario: = Cluster is installed with 2 NodeManagers Configuration: NM memory (yarn.nodemanager.resource.memory-mb): 8 GB map and reduce memory: 8 GB AppMaster memory: 2 GB If a map task is reserved on the same NodeManager where the AppMaster of the same job is running, then job execution hangs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712931#comment-13712931 ] Karthik Kambatla commented on YARN-353: --- For the findbugs warning around NUM_RETRIES, we should probably make it non-static numRetries. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
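The warning being discussed is the usual findbugs complaint about writing a static field from instance initialization code; switching to an instance field read from the configuration avoids it. A rough sketch of the shape of that change (the field and configuration key names are illustrative assumptions, not necessarily what the patch uses):
{code}
import org.apache.hadoop.conf.Configuration;

public class RetryConfigSketch {
  // An instance field avoids the findbugs pattern of assigning a static field
  // from instance initialization code; each store instance reads its own value.
  private int numRetries;

  void initInternal(Configuration conf) {
    // The key and default below are illustrative, not the actual property name.
    this.numRetries = conf.getInt("yarn.resourcemanager.zk-store.num-retries", 3);
  }
}
{code}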
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successful completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712932#comment-13712932 ] Hadoop QA commented on YARN-903: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593065/YARN-903-20130718.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1522//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1522//console This message is automatically generated. DistributedShell throwing Errors in logs after successfull completion - Key: YARN-903 URL: https://issues.apache.org/jira/browse/YARN-903 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.0.4-alpha Environment: Ununtu 11.10 Reporter: Abhishek Kapoor Assignee: Omkar Vinit Joshi Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log I have tried running DistributedShell and also used ApplicationMaster of the same for my test. The application is successfully running through logging some errors which would be useful to fix. Below are the logs from NodeManager and ApplicationMasterode Log Snippet for NodeManager = 2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. 
of attempts is 1 2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586 2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of memory:10240, vCores:8 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) 2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_01 by user sunny 2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_01 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING 2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_01 to application application_1373184544832_0001 2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712968#comment-13712968 ] Xuan Gong commented on YARN-873: bq.Having a return statement in a catch/finally block is not recommended normally. We could print the message and re-throw the exception or simply not catch the exception. Also, this way the cmd line would exit with non-zero exit code. I still prefer printing the message and then exiting, rather than printing the message and re-throwing the exception (or not catching it at all) in this scenario. If we re-throw the exception or don't catch it, there is no difference from simply throwing YarnException at YARNClient.getApplicationReport(unknownAppId). If users get the exception, that means they need to check and debug whether anything is wrong. In this case, if users pass an unknown application_id, they will get the message, and that is the expected behavior. YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that the app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
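To make the trade-off concrete, the behavior being argued for is roughly the sketch below: the CLI catches the not-found exception, prints a message, and maps it to a non-zero exit code that scripts can test. The exit-code values and method shape are illustrative assumptions, not the committed patch:
{code}
import java.io.PrintStream;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

public class AppReportCliSketch {
  static final int EXIT_OK = 0;
  static final int EXIT_APP_NOT_FOUND = 1;   // illustrative value, not the real CLI contract

  // Print the report or a not-found message, and return an exit code instead of
  // letting the exception escape to the user.
  static int printApplicationReport(YarnClient client, ApplicationId appId,
      PrintStream out) throws Exception {
    try {
      ApplicationReport report = client.getApplicationReport(appId);
      out.println(report);
      return EXIT_OK;
    } catch (ApplicationNotFoundException e) {
      out.println("Application with id '" + appId + "' doesn't exist in RM.");
      return EXIT_APP_NOT_FOUND;
    }
  }
}
{code}
A script-based tool can then branch on the exit status without having to parse the printed message, which is the point raised in the following comment.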
[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712969#comment-13712969 ] Vinod Kumar Vavilapalli commented on YARN-918: -- Checking.. This passes on my local machine. Jenkins is complaining about port issues. Will retrigger it and at the same time run all tests locally.. ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701 - Key: YARN-918 URL: https://issues.apache.org/jira/browse/YARN-918 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC pay load. This is an API change, so doing it as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712985#comment-13712985 ] Xuan Gong commented on YARN-873: bq.Requiring to parse a string message to determine whether an application exists or not is more work as compared to checking $? which can be used to indicate various errors such as connection issue/invalid application id/app does not exist in RM. Yes, but here we indicate different errors based on the different exceptions that we catch, such as ApplicationNotFoundException. YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712994#comment-13712994 ] Hitesh Shah commented on YARN-873: -- [~xgong] When you say different errors, are you implying different error messages or different exit codes? For anyone building a script-based tool on this api, the latter would be preferred. YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution
[ https://issues.apache.org/jira/browse/YARN-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713006#comment-13713006 ] Omkar Vinit Joshi commented on YARN-880: Can you please provide the information below to help debug the issue? * RM / NM / AM logs (please enable debug). * The yarn-site.xml and mapred-site.xml files used. * Which scheduler are you using? Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution Key: YARN-880 URL: https://issues.apache.org/jira/browse/YARN-880 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Nishan Shetty Assignee: Omkar Vinit Joshi Priority: Critical Scenario: = Cluster is installed with 2 NodeManagers Configuration: NM memory (yarn.nodemanager.resource.memory-mb): 8 GB map and reduce memory: 8 GB AppMaster memory: 2 GB If a map task is reserved on the same NodeManager where the AppMaster of the same job is running, then job execution hangs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713206#comment-13713206 ] Omkar Vinit Joshi commented on YARN-744: [~bikassaha] yes... there is similar but different bug though..so [~mayank_bansal] is fixing it. There we are computing the response and then updating RMNodeImpl asynchronously. If this approach is correct then we can do the similar thing after YARN-245 is in. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
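The race described in the issue comes from synchronizing on the lastResponse object itself and then replacing that object in the map, which hands a second thread a fresh lock to acquire. A minimal sketch of the alternative being suggested, keying the lock on the attempt so it is never swapped out (class and method names are illustrative, not the actual ApplicationMasterService code):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

public class AllocateLockSketch {
  // One stable lock object per attempt: only the response stored inside it is
  // ever replaced, never the lock itself, so two allocate() calls for the same
  // attempt cannot end up holding different locks for the same critical section.
  static class ResponseLock {
    AllocateResponse lastResponse;   // last answer sent to this attempt
  }

  // Populated when the attempt registers (registration code omitted here).
  private final ConcurrentMap<ApplicationAttemptId, ResponseLock> locks =
      new ConcurrentHashMap<ApplicationAttemptId, ResponseLock>();

  AllocateResponse allocate(ApplicationAttemptId attemptId, int responseId) {
    ResponseLock lock = locks.get(attemptId);  // assumed non-null for a registered attempt
    synchronized (lock) {
      if (lock.lastResponse != null
          && responseId == lock.lastResponse.getResponseId()) {
        return lock.lastResponse;              // duplicate request: resend the last answer
      }
      AllocateResponse response = doSchedulingWork(attemptId);  // hypothetical helper
      lock.lastResponse = response;            // safe: the per-attempt lock is still held
      return response;
    }
  }

  private AllocateResponse doSchedulingWork(ApplicationAttemptId attemptId) {
    throw new UnsupportedOperationException("illustrative placeholder");
  }
}
{code}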
[jira] [Updated] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-919: - Attachment: YARN.919.4.patch [~mayank_bansal] Thanks for addressing the review comments. Looking at bin/yarn in a bit more detail, I noticed a couple of other gotchas that can affect users when setting the heap size. To that end, I have attached a patch with a bit more verbose documentation. Let me know what you think. Setting default heap sizes in yarn env -- Key: YARN-919 URL: https://issues.apache.org/jira/browse/YARN-919 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Mayank Bansal Assignee: Mayank Bansal Priority: Minor Attachments: YARN.919.4.patch, YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, YARN-919-trunk-3.patch Right now there are no defaults in the yarn env scripts for the resource manager and node manager, and if users want to override them, they have to go to the documentation, find the variables, and change the script. There is no straightforward way to change them in the script. Just updating the variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-897: --- Target Version/s: 2.1.0-beta CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.4-alpha Reporter: Djellel Eddine Difallah Priority: Blocker Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures the queue with least UsedCapacity to receive resources next. On containerAssignment we correctly update the order, but we miss to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
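The underlying pitfall is that a TreeSet only orders elements at insertion time, so mutating a child queue's usedCapacity while it sits in the set silently corrupts the ordering. The fix pattern implied by the description is to remove, update, and re-insert the queue on every capacity change, completions included. A small self-contained illustration of that pattern (plain classes, not the CapacityScheduler ones):
{code}
import java.util.Comparator;
import java.util.TreeSet;

public class QueueSortSketch {
  static class Queue {
    final String name;
    float usedCapacity;
    Queue(String name) { this.name = name; }
  }

  // Order by used capacity, tie-broken by name so distinct queues never collapse.
  private final TreeSet<Queue> childQueues = new TreeSet<Queue>(new Comparator<Queue>() {
    @Override
    public int compare(Queue a, Queue b) {
      int c = Float.compare(a.usedCapacity, b.usedCapacity);
      return c != 0 ? c : a.name.compareTo(b.name);
    }
  });

  // Mutating usedCapacity in place would leave the TreeSet silently mis-ordered;
  // remove, update, re-add keeps the least-used queue at the head for both
  // container assignments and container completions.
  void updateUsedCapacity(Queue q, float newUsedCapacity) {
    childQueues.remove(q);
    q.usedCapacity = newUsedCapacity;
    childQueues.add(q);
  }

  Queue leastUsedQueue() {
    return childQueues.first();
  }
}
{code}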
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-897: --- Priority: Blocker (was: Major) CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Priority: Blocker Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures the queue with least UsedCapacity to receive resources next. On containerAssignment we correctly update the order, but we miss to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-403) Node Manager throws java.io.IOException: Verification of the hashReply failed
[ https://issues.apache.org/jira/browse/YARN-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713234#comment-13713234 ] Devaraj K commented on YARN-403: Sorry Omkar, I don't have logs for this now, I have faced it once during long runs when the load was high on the cluster. During that time the fetch request from the Reducer got failed due to this error. We could wait for sometime If I get any further info I will update, or if any others face this they could also help to check this further. Node Manager throws java.io.IOException: Verification of the hashReply failed - Key: YARN-403 URL: https://issues.apache.org/jira/browse/YARN-403 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.2-alpha, 0.23.6 Reporter: Devaraj K Assignee: Omkar Vinit Joshi {code:xml} 2013-02-09 22:59:47,490 WARN org.apache.hadoop.mapred.ShuffleHandler: Shuffle failure java.io.IOException: Verification of the hashReply failed at org.apache.hadoop.mapreduce.security.SecureShuffleUtils.verifyReply(SecureShuffleUtils.java:98) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.verifyRequest(ShuffleHandler.java:436) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:383) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713280#comment-13713280 ] Omkar Vinit Joshi commented on YARN-245: Thanks [~mayank_bansal] for the patch. I agree that checking heartbeat ids will test this issue... a few comments:
{code}
+    conf.setBoolean(YarnConfiguration.LOG_AGGREGATION_ENABLED, true);
{code}
why are we doing this?
{code}
+    NodeStatus nodeStatus = request.getNodeStatus();
+    nodeStatus.setResponseId(heartBeatID++);
{code}
required? can be removed?
* There is one issue at present with NodeStatusUpdaterImpl.java: imagine we get such a heartbeat; then we will not wait but try again. Check the finally {} block, which won't get executed, and we will keep pinging the RM until we get a correct response with the expected response-id. Should we wait or request immediately? Thoughts?
{code}
+    Thread.sleep(1000l);
{code}
can we make it 1000?
* The test will need a timeout. However, I see there are certain tests without a timeout... if adding a timeout then use a slightly larger value... :)
{code}
+    if (nodeStatus.getKeepAliveApplications() != null
+        && nodeStatus.getKeepAliveApplications().size() > 0) {
+      for (ApplicationId appId : nodeStatus.getKeepAliveApplications()) {
+        List<Long> list = keepAliveRequests.get(appId);
+        if (list == null) {
+          list = new LinkedList<Long>();
+          keepAliveRequests.put(appId, list);
+        }
+        list.add(System.currentTimeMillis());
+      }
+    }
{code}
{code}
+    if (heartBeatID == 2) {
+      LOG.info("Sending FINISH_APP for application: [" + appId + "]");
+      this.context.getApplications().put(appId, mock(Application.class));
+      nhResponse.addAllApplicationsToCleanup(Collections.singletonList(appId));
+    }
{code}
{code}
+    rt.context.getApplications().remove(rt.appId);
{code}
{code}
+  private Map<ApplicationId, List<Long>> keepAliveRequests =
+      new HashMap<ApplicationId, List<Long>>();
+  private ApplicationId appId = BuilderUtils.newApplicationId(1, 1);
{code}
do we need this? Can we remove all the application-related stuff? Since we are now checking only heartbeat ids, we can remove this. Thoughts?
Node Manager can not handle duplicate responses --- Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, YARN-245-trunk-3.patch {code:xml} 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
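The failure above is the NodeManager re-applying a heartbeat response it has already handled, such as a second FINISH_APPLICATION for an application that is already FINISHED. The direction discussed in the review, comparing response IDs before acting, can be sketched as follows (the interface and method names are illustrative stand-ins, not the actual NodeStatusUpdaterImpl code):
{code}
public class HeartbeatLoopSketch {
  // Illustrative stand-in for the RM's heartbeat response.
  interface HeartbeatResponse {
    int getResponseId();
  }

  private int lastHeartBeatId = 0;

  // Act on a response only when its id moves forward; a response carrying the id
  // we already handled is a duplicate and must not be re-applied, otherwise
  // cleanup events can be delivered twice to an already-finished application.
  void onHeartbeatResponse(HeartbeatResponse response) {
    if (response.getResponseId() == lastHeartBeatId) {
      return;                        // duplicate response from the RM: ignore it
    }
    lastHeartBeatId = response.getResponseId();
    applyCleanupEvents(response);    // hypothetical helper doing the real work
  }

  private void applyCleanupEvents(HeartbeatResponse response) {
    // application/container cleanup would be dispatched here
  }
}
{code}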
[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713282#comment-13713282 ] Omkar Vinit Joshi commented on YARN-245: ignore thread comment ...realized it is L not 1 :D Node Manager can not handle duplicate responses --- Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, YARN-245-trunk-3.patch {code:xml} 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster
[ https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713285#comment-13713285 ] Omkar Vinit Joshi commented on YARN-685: [~raviprak] can you please tell me what is value-1 and value-2?? I think first one is nodes..what is second? also what do you mean here? {code} For 23, Reduce: 2 1 32 2 1 4 {code} Capacity Scheduler is not distributing the reducers tasks across the cluster Key: YARN-685 URL: https://issues.apache.org/jira/browse/YARN-685 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.4-alpha Reporter: Devaraj K If we have reducers whose total memory required to complete is less than the total cluster memory, it is not assigning the reducers to all the nodes uniformly(~uniformly). Also at that time there are no other jobs or job tasks running in the cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-853) maximum-am-resource-percent doesn't work after refreshQueues command
[ https://issues.apache.org/jira/browse/YARN-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713311#comment-13713311 ] Hadoop QA commented on YARN-853: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593117/YARN-853-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1528//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1528//console This message is automatically generated. maximum-am-resource-percent doesn't work after refreshQueues command Key: YARN-853 URL: https://issues.apache.org/jira/browse/YARN-853 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.1.0-beta, 2.0.5-alpha Reporter: Devaraj K Assignee: Devaraj K Attachments: YARN-853-1.patch, YARN-853-2.patch, YARN-853-3.patch, YARN-853.patch If we update yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.queue-path.maximum-am-resource-percent configuration and then do the refreshNodes, it uses the new config value to calculate Max Active Applications and Max Active Application Per User. If we add new node after issuing 'rmadmin -refreshQueues' command, it uses the old maximum-am-resource-percent config value to calculate Max Active Applications and Max Active Application Per User. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
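For reference, the limit affected by this bug is derived from the configured maximum-am-resource-percent and the current cluster size, roughly as in the sketch below; the problem is that a value cached at the last refreshQueues is reused when nodes are later added, instead of recomputing with the refreshed percent. The arithmetic here illustrates the idea only and is not the exact CapacityScheduler formula:
{code}
public class AmLimitSketch {
  // Roughly: how many applications may have an active AM given the cluster size,
  // the minimum allocation and maximum-am-resource-percent. The real computation
  // lives in the CapacityScheduler; this only shows why a stale percent matters.
  static int maxActiveApplications(long clusterMemoryMb, long minAllocationMb,
      float maxAmResourcePercent, float absoluteMaxCapacity) {
    double slots = (double) clusterMemoryMb / minAllocationMb;
    return Math.max(1,
        (int) Math.ceil(slots * maxAmResourcePercent * absoluteMaxCapacity));
  }

  public static void main(String[] args) {
    // With a stale 10% instead of a refreshed 20%, only half as many AMs are allowed.
    System.out.println(maxActiveApplications(80000, 1024, 0.10f, 1.0f)); // 8
    System.out.println(maxActiveApplications(80000, 1024, 0.20f, 1.0f)); // 16
  }
}
{code}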
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713315#comment-13713315 ] Hadoop QA commented on YARN-875: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593101/YARN-875.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1529//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1529//console This message is automatically generated. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch, YARN-875.3.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
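The hang happens because an exception thrown from a user CallbackHandler method kills the callback thread, after which the AM never receives another callback. The possible solution mentioned in the description, catching Throwable around the callback and routing it to onError(), looks roughly like this (the dispatch step is a simplification, not the actual AMRMClientAsyncImpl code):
{code}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class CallbackGuardSketch {
  // Simplified dispatch step: any Throwable escaping the user's callback is
  // reported through onError() instead of silently killing the handler thread.
  static void dispatch(AMRMClientAsync.CallbackHandler handler,
      AllocateResponse response) {
    try {
      handler.onContainersCompleted(response.getCompletedContainersStatuses());
      handler.onContainersAllocated(response.getAllocatedContainers());
    } catch (Throwable t) {
      handler.onError(t);   // surface the failure to the application
    }
  }
}
{code}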
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713322#comment-13713322 ] Maysam Yabandeh commented on YARN-713: -- [~ojoshi], but the patch has been ready since June 14! Anyway, feel free to take over. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Maysam Yabandeh Priority: Critical Fix For: 2.1.0-beta Attachments: YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira