[jira] [Created] (YARN-935) Correcting pom.xml to build applicationhistoryserver sub-project successfully
Zhijie Shen created YARN-935: Summary: Correcting pom.xml to build applicationhistoryserver sub-project successfully Key: YARN-935 URL: https://issues.apache.org/jira/browse/YARN-935 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Since the branch was created from branch-2, hadoop-yarn-server-applicationhistoryserver/pom.xml should use 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be built correctly because of a wrong dependency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-935) Correcting pom.xml to build applicationhistoryserver sub-project successfully
[ https://issues.apache.org/jira/browse/YARN-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-935: - Attachment: YARN-935.1.patch Changed the version in pom.xml to 2.2.0-SNAPSHOT, and made some minor touch-ups to match the pom.xml of the other server sub-projects. Correcting pom.xml to build applicationhistoryserver sub-project successfully - Key: YARN-935 URL: https://issues.apache.org/jira/browse/YARN-935 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-935.1.patch Since the branch was created from branch-2, hadoop-yarn-server-applicationhistoryserver/pom.xml should use 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be built correctly because of a wrong dependency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-696) Enable multiple states to be specified in Resource Manager apps REST call
[ https://issues.apache.org/jira/browse/YARN-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-696: Assignee: Trevor Lorimer Enable multiple states to be specified in Resource Manager apps REST call Key: YARN-696 URL: https://issues.apache.org/jira/browse/YARN-696 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Trevor Lorimer Assignee: Trevor Lorimer Priority: Trivial Attachments: 0001-YARN-696.patch Within the YARN Resource Manager REST API, the GET call that returns all Applications can be filtered by a single State query parameter (http://<rm http address:port>/ws/v1/cluster/apps). There are 8 possible states (New, Submitted, Accepted, Running, Finishing, Finished, Failed, Killed). If no state parameter is specified, all states are returned; however, if a subset of states is required, then multiple REST calls are needed (a maximum of 7). The proposal is to be able to specify multiple states in a single REST call. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
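To make the YARN-696 proposal concrete, here is a small, hedged sketch of what a client-side call could look like if multiple states were accepted in one request. The RM host/port, the "states" parameter name, and the comma-separated value format are assumptions for illustration only; the issue above only proposes the capability and does not fix the syntax.
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MultiStateAppsQuery {
  public static void main(String[] args) throws Exception {
    // Hypothetical query: ask the RM REST API for apps in either of two states
    // in a single round trip instead of one call per state.
    URL url = new URL("http://rmhost:8088/ws/v1/cluster/apps?states=ACCEPTED,RUNNING");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);  // JSON list of apps in the requested states
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}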
[jira] [Created] (YARN-936) RMWebServices filtering apps by states uses RMAppState instead of YarnApplicationState.
Vinod Kumar Vavilapalli created YARN-936: Summary: RMWebServices filtering apps by states uses RMAppState instead of YarnApplicationState. Key: YARN-936 URL: https://issues.apache.org/jira/browse/YARN-936 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Realized this while reviewing YARN-696. YarnApplicationState is the end-user API and the one that users expect to pass as an argument to the REST API. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
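A minimal sketch of the direction YARN-936 points at: parse user-supplied state names against the public YarnApplicationState enum (the end-user API) rather than the internal RMAppState. The helper below is illustrative and is not the actual RMWebServices code.
{code}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

public class StateFilterSketch {
  // An unknown name (for example an internal-only RMAppState value such as
  // FINISHING) fails fast with IllegalArgumentException instead of being
  // silently accepted as a filter.
  static EnumSet<YarnApplicationState> parseStates(String... names) {
    EnumSet<YarnApplicationState> states = EnumSet.noneOf(YarnApplicationState.class);
    for (String name : names) {
      states.add(YarnApplicationState.valueOf(name.trim().toUpperCase()));
    }
    return states;
  }
}
{code}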
[jira] [Commented] (YARN-935) Correcting pom.xml to build applicationhistoryserver sub-project successfully
[ https://issues.apache.org/jira/browse/YARN-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712089#comment-13712089 ] Hadoop QA commented on YARN-935: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592927/YARN-935.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1516//console This message is automatically generated. Correcting pom.xml to build applicationhistoryserver sub-project successfully - Key: YARN-935 URL: https://issues.apache.org/jira/browse/YARN-935 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-935.1.patch The branch was created from branch-2, hadoop-yarn-server-applicationhistoryserver/pom.xml should use 2.2.0-SNAPSHOT, not 3.0.0-SNAPSHOT. Otherwise, the sub-project cannot be built correctly because of wrong dependency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-933) After an AppAttempt_1 got failed [ removal and releasing of container is done , AppAttempt_2 is scheduled ] again relaunching of AppAttempt_1 throws Exception at RM .And cl
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated YARN-933: Description: Hostname enabled. AM max retries configured as 3 on both the client and RM side. Step 1: Install a cluster with NMs on 2 machines. Step 2: Ensure that a ping from the RM machine to the NM1 machine succeeds using the IP but fails using the hostname. Step 3: Execute a job. Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a connection loss happens. Observation : == After AppAttempt_1 has moved to the failed state, the release of AppAttempt_1's container and the application removal are successful. A new AppAttempt_2 is spawned. 1. Then a retry of AppAttempt_1 happens again. 2. The RM again tries to launch AppAttempt_1, and hence it fails with InvalidStateTransitonException. 3. The client exited after AppAttempt_1 finished [but the job is actually still running], while the configured number of app attempts is 3 and the remaining app attempts are all spawned and running. RMLogs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01.
Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662) Client Logs Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020] at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534) 2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation:
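For readers unfamiliar with the InvalidStateTransitonException in the log above: YARN state machines only accept events that have a registered transition for the current state. The hedged sketch below shows, with made-up states and events, how a FAILED-to-FAILED self-transition turns a late LAUNCH_FAILED into a no-op; it only illustrates the mechanism and is not the actual YARN-933 fix.
{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class IgnoreLateEventSketch {
  enum MyState { RUNNING, FAILED }
  enum MyEvent { LAUNCH_FAILED }

  // Without the second addTransition, a LAUNCH_FAILED arriving while already
  // FAILED would raise InvalidStateTransitonException, like the log above.
  private static final StateMachineFactory<IgnoreLateEventSketch, MyState, MyEvent, MyEvent>
      FACTORY =
        new StateMachineFactory<IgnoreLateEventSketch, MyState, MyEvent, MyEvent>(MyState.RUNNING)
          .addTransition(MyState.RUNNING, MyState.FAILED, MyEvent.LAUNCH_FAILED)
          .addTransition(MyState.FAILED, MyState.FAILED, MyEvent.LAUNCH_FAILED)
          .installTopology();

  private final StateMachine<MyState, MyEvent, MyEvent> stateMachine = FACTORY.make(this);

  public static void main(String[] args) throws Exception {
    IgnoreLateEventSketch sketch = new IgnoreLateEventSketch();
    sketch.stateMachine.doTransition(MyEvent.LAUNCH_FAILED, MyEvent.LAUNCH_FAILED); // -> FAILED
    sketch.stateMachine.doTransition(MyEvent.LAUNCH_FAILED, MyEvent.LAUNCH_FAILED); // now a no-op
    System.out.println(sketch.stateMachine.getCurrentState()); // FAILED
  }
}
{code}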
[jira] [Updated] (YARN-933) After an AppAttempt_1 got failed [ removal and releasing of container is done , AppAttempt_2 is scheduled ] again relaunching of AppAttempt_1 throws Exception at RM .And cl
[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated YARN-933: Description: AM max retries configured as 3 on both the client and RM side. Step 1: Install a cluster with NMs on 2 machines. Step 2: Ensure that a ping from the RM machine to the NM1 machine succeeds using the IP but fails using the hostname. Step 3: Execute a job. Step 4: After the AM [ AppAttempt_1 ] is allocated to the NM1 machine, a connection loss happens. Observation : == After AppAttempt_1 has moved to the failed state, the release of AppAttempt_1's container and the application removal are successful. A new AppAttempt_2 is spawned. 1. Then a retry of AppAttempt_1 happens again. 2. The RM again tries to launch AppAttempt_1, and hence it fails with InvalidStateTransitonException. 3. The client exited after AppAttempt_1 finished [but the job is actually still running], while the configured number of app attempts is 3 and the remaining app attempts are all spawned and running. RMLogs: == 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_01 State change from SCHEDULED to ALLOCATED 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_01 Timed out after 600 secs 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_01 Container Transitioned from ACQUIRED to EXPIRED 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_02 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_01 is done. finalState=FAILED 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_02, 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_02 State change from SUBMITTED to SCHEDULED 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_01.
Got exception: java.lang.reflect.UndeclaredThrowableException 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662) Client Logs Caused by: org.apache.hadoop.net.ConnectTimeoutException: 2 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020] at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534) 2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712197#comment-13712197 ] Devaraj K commented on YARN-873: The latest patch overall looks good to me. These are a few things I feel we can take care of: 1. The application -status command returns an exit code of 0 when the application doesn't exist. Can we return a non-zero exit code when the application doesn't exist? 2. In TestClientRMService.java, {code:xml}
+    try {
+      GetApplicationReportResponse applicationReport = rmService
+          .getApplicationReport(request);
+    } catch (ApplicationNotFoundException ex) {
+      getExpectedException = true;
+      Assert.assertEquals(ex.getMessage(),
+          "Application with id '" + request.getApplicationId()
+              + "' doesn't exist in RM.");
+    }
+    Assert.assertTrue(getExpectedException);
{code} Can we fail right after getApplicationReport using Assert.fail() instead of setting and checking a boolean flag? Also, the applicationReport variable is never used. 3. {code:xml} // If the RM doesn't have the application, provide the response with // application report as null and let the clients to handle. {code} Do we have another JIRA to fix the same for kill application? If not, can we file one? YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
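A small sketch of the test pattern Devaraj suggests in point 2: fail explicitly when no exception is thrown instead of tracking a boolean flag. The rmService and request parameters stand in for the objects used in the quoted hunk, and the expected message mirrors the one asserted there; this is a sketch, not the committed test code.
{code}
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationReportRequest;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.server.resourcemanager.ClientRMService;
import org.junit.Assert;

// Helper intended to be dropped into TestClientRMService.
void assertUnknownAppIsRejected(ClientRMService rmService,
    GetApplicationReportRequest request) throws Exception {
  try {
    rmService.getApplicationReport(request);
    Assert.fail("Expected ApplicationNotFoundException for an unknown application id");
  } catch (ApplicationNotFoundException ex) {
    Assert.assertEquals("Application with id '" + request.getApplicationId()
        + "' doesn't exist in RM.", ex.getMessage());
  }
}
{code}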
[jira] [Commented] (YARN-865) RM webservices can't query based on application Types
[ https://issues.apache.org/jira/browse/YARN-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712216#comment-13712216 ] Hudson commented on YARN-865: - SUCCESS: Integrated in Hadoop-Yarn-trunk #274 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/274/]) YARN-865. RM webservices can't query based on application Types. Contributed by Xuan Gong. (hitesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1504288) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm RM webservices can't query based on application Types - Key: YARN-865 URL: https://issues.apache.org/jira/browse/YARN-865 Project: Hadoop YARN Issue Type: Improvement Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.1.0-beta Attachments: MR-5337.1.patch, YARN-865.1.patch, YARN-865.2.patch, YARN-865.3.patch, YARN-865.4.patch, YARN-865.5.patch, YARN-865.6.patch The resource manager web service api to get the list of apps doesn't have a query parameter for appTypes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-922) Change FileSystemRMStateStore to use directories
[ https://issues.apache.org/jira/browse/YARN-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712215#comment-13712215 ] Hudson commented on YARN-922: - SUCCESS: Integrated in Hadoop-Yarn-trunk #274 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/274/]) YARN-922. Change FileSystemRMStateStore to use directories (Jian He via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1504261) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestRMStateStore.java Change FileSystemRMStateStore to use directories Key: YARN-922 URL: https://issues.apache.org/jira/browse/YARN-922 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Fix For: 2.1.0-beta Attachments: YARN-922.1.patch, YARN-922.2.patch, YARN-922.3.patch, YARN-922.patch Store each app and its attempts in the same directory so that removing application state is only one operation -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
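A hedged sketch of the directory-per-application idea behind YARN-922 (the path layout and names below are illustrative, not the store's actual layout): once an application and its attempts live under one directory, removing the application's state is a single recursive delete.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppStateDirSketch {
  // rootDir is the store's application root; appId is e.g. "application_123_0001".
  // Because attempt data is written as children of the app directory, one
  // recursive delete removes the application and all of its attempts.
  static void removeApp(FileSystem fs, Path rootDir, String appId) throws IOException {
    Path appDir = new Path(rootDir, appId);
    fs.delete(appDir, true /* recursive */);
  }
}
{code}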
[jira] [Created] (YARN-937) Fix unmanaged AM in non-secure/secure setup post YARN-701
Arun C Murthy created YARN-937: -- Summary: Fix unmanaged AM in non-secure/secure setup post YARN-701 Key: YARN-937 URL: https://issues.apache.org/jira/browse/YARN-937 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Arun C Murthy Assignee: Alejandro Abdelnur Priority: Blocker Fix For: 2.1.0-beta Fix unmanaged AM in non-secure/secure setup post YARN-701 since app-tokens will be used in both scenarios. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712507#comment-13712507 ] Hitesh Shah commented on YARN-873: -- [~xgong] Requiring the user to parse a string message to determine whether an application exists is more work than checking $?, which can be used to indicate various errors such as a connection issue, an invalid application id, or an app that does not exist in the RM. YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712514#comment-13712514 ] Hadoop QA commented on YARN-873: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592998/YARN-873.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1517//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1517//console This message is automatically generated. YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712523#comment-13712523 ] Bikas Saha commented on YARN-353: - bq. ZKRMStateStore#getNewZooKeeper need not be synchronized bq. fixed The code is derived from the ActiveStandbyElector code in hadoop-common. It was synchronized there for a race condition that showed up in testing. I would like to keep the synchronization as it was in the original patch. bq. the patch still seems to have NUM_RETRIES Why should NUM_RETRIES not be there? Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712528#comment-13712528 ] Bikas Saha commented on YARN-873: - Normally, having a return statement in a catch/finally block is not recommended. We could print the message and re-throw the exception, or simply not catch the exception. Also, this way the command line would exit with a non-zero exit code. {code}
+    } catch (ApplicationNotFoundException ex) {
+      sysout.println("Application with id '"
+          + applicationId + "' doesn't exist in RM.");
+      return;
+    }
{code} YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
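A hedged sketch of the alternative Bikas describes, with illustrative names rather than the actual ApplicationCLI code: print a friendly message but re-throw, so the caller can map the failure to a non-zero process exit code instead of silently returning.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

public class ReportOrFailSketch {
  // Re-throwing keeps the user-facing message while still letting the command
  // exit non-zero, unlike a bare return inside the catch block.
  static ApplicationReport reportOrFail(YarnClient client, ApplicationId appId)
      throws Exception {
    try {
      return client.getApplicationReport(appId);
    } catch (ApplicationNotFoundException e) {
      System.err.println("Application with id '" + appId + "' doesn't exist in RM.");
      throw e;
    }
  }
}
{code}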
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712533#comment-13712533 ] Karthik Kambatla commented on YARN-353: --- bq. Make the ZKRMStateStore#NUM_RETRIES configurable with default set to 3. bq. fixed bq. Why should NUM_RETRIES not be there? Was just noting that: the latest patch has the non-configurable NUM_RETRIES, it should exist but be configurable. If it is configurable, we should probably change the name of the variable. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
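A minimal sketch of what "configurable with a default" typically looks like with Hadoop's Configuration class; the property key and default below are made up for illustration and are not necessarily what YARN-353 ends up defining.
{code}
import org.apache.hadoop.conf.Configuration;

public class ZkRetrySketch {
  // Hypothetical key name; only the conf.getInt(key, default) pattern matters.
  static final String ZK_RETRIES_KEY = "yarn.resourcemanager.zk-state-store.num-retries";
  static final int DEFAULT_ZK_RETRIES = 3;

  static int zkRetries(Configuration conf) {
    return conf.getInt(ZK_RETRIES_KEY, DEFAULT_ZK_RETRIES);
  }
}
{code}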
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712546#comment-13712546 ] Karthik Kambatla commented on YARN-321: --- bq. Folks, it would be great if we have a consolidated document that describes the design and some details. +1 Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli Attachments: HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-938) Hadoop 2 Bench marking
Mayank Bansal created YARN-938: -- Summary: Hadoop 2 Bench marking Key: YARN-938 URL: https://issues.apache.org/jira/browse/YARN-938 Project: Hadoop YARN Issue Type: Task Reporter: Mayank Bansal Assignee: Mayank Bansal I am running the benchmarks on Hadoop 2 and will update the results soon. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-814: - Attachment: YARN-814.7.patch new patch fixed the warnings and added test case for stdout/stderr diagnostics Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
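A hedged reproduction sketch for the scenario described in YARN-814, not the patch itself: an environment value with shell-breaking characters is passed through the ContainerLaunchContext, so the generated launch script fails before the user's process starts and, without the fix, leaves nothing useful in the diagnostics.
{code}
import java.util.Collections;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.util.Records;

public class BadEnvSketch {
  static ContainerLaunchContext badEnvContext() {
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    // An unbalanced quote in the value breaks the syntax of the generated
    // launch script, so the container launch fails during environment setup.
    Map<String, String> env = Collections.singletonMap("BAD_VAR", "oops\")(");
    ctx.setEnvironment(env);
    return ctx;
  }
}
{code}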
[jira] [Updated] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-245: --- Summary: Node Manager can not handle duplicate responses (was: Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED) Node Manager can not handle duplicate responses --- Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch {code:xml} 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712607#comment-13712607 ] Hadoop QA commented on YARN-814: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593018/YARN-814.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1518//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1518//console This message is automatically generated. Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.5.patch, YARN-814.6.patch, YARN-814.7.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712661#comment-13712661 ] Omkar Vinit Joshi commented on YARN-713: [~maysamyabandeh] are you working on a patch? Otherwise, I will take it over; this is critical and needs to be fixed. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Maysam Yabandeh Priority: Critical Fix For: 2.1.0-beta Attachments: YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712665#comment-13712665 ] Arun C Murthy commented on YARN-701: I'm committing this to unblock the rest. ApplicationTokens should be used irrespective of kerberos - Key: YARN-701 URL: https://issues.apache.org/jira/browse/YARN-701 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, YARN-701-20130710.txt, YARN-701-20130712.txt, YARN-701-20130717.txt, yarn-ojoshi-resourcemanager-HW10351.local.log - Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-208) Yarn overrides diagnostic message set by AM
[ https://issues.apache.org/jira/browse/YARN-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-208. -- Resolution: Duplicate Thomas, closing this as a duplicate; please reopen if you see it again. Tx. Yarn overrides diagnostic message set by AM --- Key: YARN-208 URL: https://issues.apache.org/jira/browse/YARN-208 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Thomas Weise Diagnostics set in the AM just before exit are overridden by YARN: in the case of the FAILED state they are replaced with a different message, and for SUCCESS the field will be blank. The application's info should be retained. Per FinishApplicationMasterRequest, this can be managed by the ApplicationMaster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
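For context on the FinishApplicationMasterRequest remark above, here is a hedged sketch of the AM-side unregistration call whose diagnostics string YARN-208 expects to be preserved; the diagnostics message and tracking URL below are made up for illustration.
{code}
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class UnregisterSketch {
  // Whatever the AM passes as the diagnostics argument here is what it expects
  // the RM to retain rather than overwrite or blank out.
  static void finish(AMRMClient<AMRMClient.ContainerRequest> amRmClient) throws Exception {
    amRmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED,
        "processed 42 records", "http://am-host:8080/tracking");
  }
}
{code}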
[jira] [Resolved] (YARN-455) NM warns about stopping an unknown container under normal circumstances
[ https://issues.apache.org/jira/browse/YARN-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi resolved YARN-455. Resolution: Duplicate NM warns about stopping an unknown container under normal circumstances --- Key: YARN-455 URL: https://issues.apache.org/jira/browse/YARN-455 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi During normal operations the NM can log warnings to its audit log about unknown containers. For example: {noformat} 2013-03-06 21:04:55,327 WARN nodemanager.NMAuditLogger: USER=UnknownUser IP=xx OPERATION=Stop Container RequestTARGET=ContainerManagerImpl RESULT=FAILURE DESCRIPTION=Trying to stop unknown container! APPID=application_1359150825713_3947178 CONTAINERID=container_1359150825713_3947178_01_001266 {noformat} Looking closer at the audit log and the NM log shows that the container completed successfully and was forgotten by the NM before the stop request arrived. The NM should avoid warning in these situations since this is a normal race condition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-455) NM warns about stopping an unknown container under normal circumstances
[ https://issues.apache.org/jira/browse/YARN-455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712677#comment-13712677 ] Omkar Vinit Joshi commented on YARN-455: Closing this as duplicate . I am fixing it at YARN-903 NM warns about stopping an unknown container under normal circumstances --- Key: YARN-455 URL: https://issues.apache.org/jira/browse/YARN-455 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.3-alpha, 0.23.6 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi During normal operations the NM can log warnings to its audit log about unknown containers. For example: {noformat} 2013-03-06 21:04:55,327 WARN nodemanager.NMAuditLogger: USER=UnknownUser IP=xx OPERATION=Stop Container RequestTARGET=ContainerManagerImpl RESULT=FAILURE DESCRIPTION=Trying to stop unknown container! APPID=application_1359150825713_3947178 CONTAINERID=container_1359150825713_3947178_01_001266 {noformat} Looking closer at the audit log and the NM log shows that the container completed successfully and was forgotten by the NM before the stop request arrived. The NM should avoid warning in these situations since this is a normal race condition. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712688#comment-13712688 ] Vinod Kumar Vavilapalli commented on YARN-658: -- David, can you give us more information? RM, AM and NM logs will help a lot. Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-864) YARN NM leaking containers with CGroups
[ https://issues.apache.org/jira/browse/YARN-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-864. -- Resolution: Duplicate Given Jian's update, I'm closing this as duplicate of YARN-688. YARN NM leaking containers with CGroups --- Key: YARN-864 URL: https://issues.apache.org/jira/browse/YARN-864 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.5-alpha Environment: YARN 2.0.5-alpha with patches applied for YARN-799 and YARN-600. Reporter: Chris Riccomini Assignee: Jian He Attachments: rm-log, YARN-864.1.patch, YARN-864.2.patch Hey Guys, I'm running YARN 2.0.5-alpha with CGroups and stateful RM turned on, and I'm seeing containers getting leaked by the NMs. I'm not quite sure what's going on -- has anyone seen this before? I'm concerned that maybe it's a mis-understanding on my part about how YARN's lifecycle works. When I look in my AM logs for my app (not an MR app master), I see: 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Got an exit code of -100. This means that container container_1371141151815_0008_03_02 was killed by YARN, either due to being released by the application master or being 'lost' due to node failures etc. 2013-06-19 05:34:22 AppMasterTaskManager [INFO] Released container container_1371141151815_0008_03_02 was assigned task ID 0. Requesting a new container for the task. The AM has been running steadily the whole time. Here's what the NM logs say: {noformat} 05:34:59,783 WARN AsyncDispatcher:109 - Interrupted Exception while stopping java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1143) at java.lang.Thread.join(Thread.java:1196) at org.apache.hadoop.yarn.event.AsyncDispatcher.stop(AsyncDispatcher.java:107) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:209) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:336) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.handle(NodeManager.java:61) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:619) 05:35:00,314 WARN ContainersMonitorImpl:463 - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0006_01_001598 05:35:00,434 WARN CgroupsLCEResourcesHandler:166 - Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_1371141151815_0008_03_02 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. 
java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) at org.apache.hadoop.util.Shell.run(Shell.java:129) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) 05:35:00,434 WARN ContainerLaunch:247 - Failed to launch container. java.io.IOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:205) at org.apache.hadoop.util.Shell.run(Shell.java:129) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:322) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:230) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242)
[jira] [Commented] (YARN-701) ApplicationTokens should be used irrespective of kerberos
[ https://issues.apache.org/jira/browse/YARN-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712698#comment-13712698 ] Hudson commented on YARN-701: - SUCCESS: Integrated in Hadoop-trunk-Commit #4110 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4110/]) YARN-701. Use application tokens irrespective of secure or non-secure mode. Contributed by Vinod K V. (acmurthy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1504604) * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher/src/test/java/org/apache/hadoop/yarn/applications/unmanagedamlauncher/TestUnmanagedAMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAMAuthorization.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCNodeUpdates.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCResponseId.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java ApplicationTokens should be used irrespective of kerberos - Key: YARN-701 URL: https://issues.apache.org/jira/browse/YARN-701 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 2.1.0-beta 
Attachments: YARN-701-20130520.txt, YARN-701-20130709.3.txt, YARN-701-20130710.txt, YARN-701-20130712.txt, YARN-701-20130717.txt, yarn-ojoshi-resourcemanager-HW10351.local.log - Single code path for secure and non-secure cases is useful for testing, coverage. - Having this in non-secure mode will help us avoid accidental bugs in AMs DDos'ing and bringing down RM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712712#comment-13712712 ] Vinod Kumar Vavilapalli commented on YARN-938: -- Thanks for doing this Mayank! Hadoop 2 benchmarking -- Key: YARN-938 URL: https://issues.apache.org/jira/browse/YARN-938 Project: Hadoop YARN Issue Type: Task Reporter: Mayank Bansal Assignee: Mayank Bansal I am running the benchmarks on Hadoop 2 and will update the results soon. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-938) Hadoop 2 benchmarking
[ https://issues.apache.org/jira/browse/YARN-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-938: - Summary: Hadoop 2 benchmarking (was: Hadoop 2 Bench marking ) Hadoop 2 benchmarking -- Key: YARN-938 URL: https://issues.apache.org/jira/browse/YARN-938 Project: Hadoop YARN Issue Type: Task Reporter: Mayank Bansal Assignee: Mayank Bansal I am running the benchmarks on Hadoop 2 and will update the results soon. Thanks, Mayank -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-353: - Attachment: YARN-353.8.patch New patch made NUM_RETRIES configurable. Changed removeApplicationState to use multi api to remove both app state and attempts state at the same time. Also fixed the warnings. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
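A hedged sketch of the ZooKeeper multi idea mentioned in the update above (the znode paths are made up, and this is not the actual ZKRMStateStore code): delete the application znode and its attempt znodes in one atomic batch.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

public class ZkMultiDeleteSketch {
  // Children are deleted before their parent; the whole batch is applied
  // atomically on the server, so app and attempt state disappear together.
  static void removeApp(ZooKeeper zk, String appPath, List<String> attemptPaths)
      throws Exception {
    List<Op> ops = new ArrayList<Op>();
    for (String attemptPath : attemptPaths) {
      ops.add(Op.delete(attemptPath, -1));  // -1 matches any version
    }
    ops.add(Op.delete(appPath, -1));
    zk.multi(ops);
  }
}
{code}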
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712793#comment-13712793 ] Karthik Kambatla commented on YARN-353: --- Looks good. +1 pending Jenkins. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-919: --- Attachment: YARN-919-trunk-3.patch Thanks [~hitesh] and [~vinodkv] for the review. Updating the patch. Thanks, Mayank Setting default heap sizes in yarn env -- Key: YARN-919 URL: https://issues.apache.org/jira/browse/YARN-919 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Mayank Bansal Assignee: Mayank Bansal Priority: Minor Attachments: YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, YARN-919-trunk-3.patch Right now there are no defaults in the yarn env scripts for the resource manager and node manager, and if users want to override them, they have to go to the documentation, find the variables, and change the script. There is no straightforward way to change them in the script. Just updating the variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712796#comment-13712796 ] Zhijie Shen commented on YARN-321: -- bq. Running as service: By default, ApplicationHistoryService will be embedded inside ResourceManager but will be independent enough to run as a separate service for scaling purposes. IIUC, to be independent, ApplicationHistoryService should have its own event dispatcher, shouldn't it? Generic application history service --- Key: YARN-321 URL: https://issues.apache.org/jira/browse/YARN-321 Project: Hadoop YARN Issue Type: Improvement Reporter: Luke Lu Assignee: Vinod Kumar Vavilapalli Attachments: HistoryStorageDemo.java The mapreduce job history server currently needs to be deployed as a trusted server in sync with the mapreduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) (where T is number of type of application, V is number of version of application) trusted servers is clearly not scalable. Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as json (or binary avro). I propose that we create only one trusted application history server, which can have a generic UI (display json as a tree of strings) as well. Specific application/version can deploy untrusted webapps (a la AMs) to query the application history server and interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712839#comment-13712839 ] Hadoop QA commented on YARN-919: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593050/YARN-919-trunk-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1519//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1519//console This message is automatically generated. Setting default heap sizes in yarn env -- Key: YARN-919 URL: https://issues.apache.org/jira/browse/YARN-919 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Mayank Bansal Assignee: Mayank Bansal Priority: Minor Attachments: YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, YARN-919-trunk-3.patch Right now there are no defaults in yarn env scripts for resource manager nad node manager and if user wants to override that, then user has to go to documentation and find the variables and change the script. There is no straight forward way to change it in script. Just updating the variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712882#comment-13712882 ] Hadoop QA commented on YARN-918: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12592818/YARN-918-20130717.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1521//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1521//console This message is automatically generated. ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701 - Key: YARN-918 URL: https://issues.apache.org/jira/browse/YARN-918 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC pay load. This is an API change, so doing it as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-903) DistributedShell throwing Errors in logs after successful completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-903: --- Attachment: YARN-903-20130718.1.patch DistributedShell throwing Errors in logs after successfull completion - Key: YARN-903 URL: https://issues.apache.org/jira/browse/YARN-903 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.0.4-alpha Environment: Ununtu 11.10 Reporter: Abhishek Kapoor Assignee: Omkar Vinit Joshi Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log I have tried running DistributedShell and also used ApplicationMaster of the same for my test. The application is successfully running through logging some errors which would be useful to fix. Below are the logs from NodeManager and ApplicationMasterode Log Snippet for NodeManager = 2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1 2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586 2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of memory:10240, vCores:8 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) 2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_01 by user sunny 2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_01 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING 2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_01 to application application_1373184544832_0001 2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from INITING to RUNNING 2013-07-07 13:39:35,528 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_01 transitioned from NEW to LOCALIZING 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/application/test.jar transitioned from INIT to 
DOWNLOADING 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1373184544832_0001_01_01 2013-07-07 13:39:35,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens. Credentials list: 2013-07-07 13:39:35,694 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user sunny 2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens to /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/usercache/sunny/appcache/application_1373184544832_0001/container_1373184544832_0001_01_01.tokens 2013-07-07 13:39:35,803 INFO
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successful completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712889#comment-13712889 ] Omkar Vinit Joshi commented on YARN-903: Attaching a simple test to verify this. DistributedShell throwing Errors in logs after successfull completion - Key: YARN-903 URL: https://issues.apache.org/jira/browse/YARN-903 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.0.4-alpha Environment: Ununtu 11.10 Reporter: Abhishek Kapoor Assignee: Omkar Vinit Joshi Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log I have tried running DistributedShell and also used ApplicationMaster of the same for my test. The application is successfully running through logging some errors which would be useful to fix. Below are the logs from NodeManager and ApplicationMasterode Log Snippet for NodeManager = 2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. of attempts is 1 2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586 2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of memory:10240, vCores:8 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) 2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_01 by user sunny 2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_01 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING 2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_01 to application application_1373184544832_0001 2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from INITING to RUNNING 2013-07-07 13:39:35,528 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1373184544832_0001_01_01 transitioned from NEW to LOCALIZING 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource 
hdfs://localhost:9000/application/test.jar transitioned from INIT to DOWNLOADING 2013-07-07 13:39:35,540 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1373184544832_0001_01_01 2013-07-07 13:39:35,675 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens. Credentials list: 2013-07-07 13:39:35,694 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user sunny 2013-07-07 13:39:35,803 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /home/sunny/Hadoop2/hadoopdata/nodemanagerdata/nmPrivate/container_1373184544832_0001_01_01.tokens to
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712902#comment-13712902 ] Hadoop QA commented on YARN-353: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593045/YARN-353.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1520//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/1520//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1520//console This message is automatically generated. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution
[ https://issues.apache.org/jira/browse/YARN-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-880: -- Assignee: Omkar Vinit Joshi Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution Key: YARN-880 URL: https://issues.apache.org/jira/browse/YARN-880 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Nishan Shetty Assignee: Omkar Vinit Joshi Priority: Critical Scenario: = Cluster is installed with 2 NodeManagers Configuration: NM memory (yarn.nodemanager.resource.memory-mb): 8 GB map and reduce memory: 8 GB AppMaster memory: 2 GB If a map task is reserved on the same NodeManager where the AppMaster of the same job is running, then job execution hangs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712931#comment-13712931 ] Karthik Kambatla commented on YARN-353: --- For the findbugs warning around NUM_RETRIES, we should probably make it non-static numRetries. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
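The warning being discussed is the usual findbugs complaint about writing a static field from instance initialization code; switching to an instance field read from the configuration avoids it. A rough sketch of the shape of that change (the field and configuration key names are illustrative assumptions, not necessarily what the patch uses):
{code}
import org.apache.hadoop.conf.Configuration;

public class RetryConfigSketch {
  // An instance field avoids the findbugs pattern of assigning a static field
  // from instance initialization code; each store instance reads its own value.
  private int numRetries;

  void initInternal(Configuration conf) {
    // The key and default below are illustrative, not the actual property name.
    this.numRetries = conf.getInt("yarn.resourcemanager.zk-store.num-retries", 3);
  }
}
{code}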
[jira] [Commented] (YARN-903) DistributedShell throwing Errors in logs after successful completion
[ https://issues.apache.org/jira/browse/YARN-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712932#comment-13712932 ] Hadoop QA commented on YARN-903: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593065/YARN-903-20130718.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1522//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1522//console This message is automatically generated. DistributedShell throwing Errors in logs after successfull completion - Key: YARN-903 URL: https://issues.apache.org/jira/browse/YARN-903 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 2.0.4-alpha Environment: Ununtu 11.10 Reporter: Abhishek Kapoor Assignee: Omkar Vinit Joshi Attachments: AppMaster.stderr, YARN-903-20130717.1.patch, YARN-903-20130718.1.patch, yarn-sunny-nodemanager-sunny-Inspiron.log I have tried running DistributedShell and also used ApplicationMaster of the same for my test. The application is successfully running through logging some errors which would be useful to fix. Below are the logs from NodeManager and ApplicationMasterode Log Snippet for NodeManager = 2013-07-07 13:39:18,787 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connecting to ResourceManager at localhost/127.0.0.1:9990. current no. 
of attempts is 1 2013-07-07 13:39:19,050 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -325382586 2013-07-07 13:39:19,052 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :1005046570 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as sunny-Inspiron:9993 with total resource of memory:10240, vCores:8 2013-07-07 13:39:19,053 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests 2013-07-07 13:39:35,256 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1373184544832_0001_01 (auth:SIMPLE) 2013-07-07 13:39:35,492 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1373184544832_0001_01_01 by user sunny 2013-07-07 13:39:35,507 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1373184544832_0001 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=sunny IP=127.0.0.1OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1373184544832_0001 CONTAINERID=container_1373184544832_0001_01_01 2013-07-07 13:39:35,511 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from NEW to INITING 2013-07-07 13:39:35,512 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1373184544832_0001_01_01 to application application_1373184544832_0001 2013-07-07 13:39:35,518 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1373184544832_0001 transitioned from
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712968#comment-13712968 ] Xuan Gong commented on YARN-873: bq.Having a return statement in a catch/finally block is not recommended normally. We could print the message and re-throw the exception or simply not catch the exception. Also, this way the cmd line would exit with non-zero exit code. I still prefer printing the message and then exiting, rather than printing the message and re-throwing the exception (or not catching it at all) in this scenario. If we re-throw the exception or don't catch it, there is no difference from simply throwing YarnException at YARNClient.getApplicationReport(unknownAppId). If users get the exception, that means they need to check and debug whether anything is wrong. In this case, if users pass an unknown application_id, they will get the message, and that is the expected behavior. YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that the app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
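To make the trade-off concrete, the behavior being argued for is roughly the sketch below: the CLI catches the not-found exception, prints a message, and maps it to a non-zero exit code that scripts can test. The exit-code values and method shape are illustrative assumptions, not the committed patch:
{code}
import java.io.PrintStream;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;

public class AppReportCliSketch {
  static final int EXIT_OK = 0;
  static final int EXIT_APP_NOT_FOUND = 1;   // illustrative value, not the real CLI contract

  // Print the report or a not-found message, and return an exit code instead of
  // letting the exception escape to the user.
  static int printApplicationReport(YarnClient client, ApplicationId appId,
      PrintStream out) throws Exception {
    try {
      ApplicationReport report = client.getApplicationReport(appId);
      out.println(report);
      return EXIT_OK;
    } catch (ApplicationNotFoundException e) {
      out.println("Application with id '" + appId + "' doesn't exist in RM.");
      return EXIT_APP_NOT_FOUND;
    }
  }
}
{code}
A script-based tool can then branch on the exit status without having to parse the printed message, which is the point raised in the following comment.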
[jira] [Commented] (YARN-918) ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701
[ https://issues.apache.org/jira/browse/YARN-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712969#comment-13712969 ] Vinod Kumar Vavilapalli commented on YARN-918: -- Checking.. This passes on my local machine. Jenkins is complaining about port issues. Will retrigger it and at the same time run all tests locally.. ApplicationMasterProtocol doesn't need ApplicationAttemptId in the payload after YARN-701 - Key: YARN-918 URL: https://issues.apache.org/jira/browse/YARN-918 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-918-20130715.txt, YARN-918-20130717.txt Once we use AMRMToken irrespective of kerberos after YARN-701, we don't need ApplicationAttemptId in the RPC pay load. This is an API change, so doing it as a blocker for 2.1.0-beta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712985#comment-13712985 ] Xuan Gong commented on YARN-873: bq.Requiring to parse a string message to determine whether an application exists or not is more work as compared to checking $? which can be used to indicate various errors such as connection issue/invalid application id/app does not exist in RM. Yes, but here we indicate different errors based on the different exceptions that we catch, such as ApplicationNotFoundException. YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-873) YARNClient.getApplicationReport(unknownAppId) returns a null report
[ https://issues.apache.org/jira/browse/YARN-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13712994#comment-13712994 ] Hitesh Shah commented on YARN-873: -- [~xgong] When you say different errors, are you implying different error messages or different exit codes? For anyone building a script-based tool on this api, the latter would be preferred. YARNClient.getApplicationReport(unknownAppId) returns a null report --- Key: YARN-873 URL: https://issues.apache.org/jira/browse/YARN-873 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-873.1.patch, YARN-873.2.patch, YARN-873.3.patch How can the client find out that app does not exist? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-880) Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution
[ https://issues.apache.org/jira/browse/YARN-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713006#comment-13713006 ] Omkar Vinit Joshi commented on YARN-880: Can you please provide the information below to help debug the issue? * RM / NM / AM logs (please enable debug). * The yarn-site.xml and mapred-site.xml files used. * Which scheduler are you using? Configuring map/reduce memory equal to nodemanager's memory, hangs the job execution Key: YARN-880 URL: https://issues.apache.org/jira/browse/YARN-880 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.1-alpha Reporter: Nishan Shetty Assignee: Omkar Vinit Joshi Priority: Critical Scenario: = Cluster is installed with 2 NodeManagers Configuration: NM memory (yarn.nodemanager.resource.memory-mb): 8 GB map and reduce memory: 8 GB AppMaster memory: 2 GB If a map task is reserved on the same NodeManager where the AppMaster of the same job is running, then job execution hangs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713206#comment-13713206 ] Omkar Vinit Joshi commented on YARN-744: [~bikassaha] yes... there is similar but different bug though..so [~mayank_bansal] is fixing it. There we are computing the response and then updating RMNodeImpl asynchronously. If this approach is correct then we can do the similar thing after YARN-245 is in. Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744.patch Looks like the lock taken in this is broken. It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
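The race described in the issue comes from synchronizing on the lastResponse object itself and then replacing that object in the map, which hands a second thread a fresh lock to acquire. A minimal sketch of the alternative being suggested, keying the lock on the attempt so it is never swapped out (class and method names are illustrative, not the actual ApplicationMasterService code):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

public class AllocateLockSketch {
  // One stable lock object per attempt: only the response stored inside it is
  // ever replaced, never the lock itself, so two allocate() calls for the same
  // attempt cannot end up holding different locks for the same critical section.
  static class ResponseLock {
    AllocateResponse lastResponse;   // last answer sent to this attempt
  }

  // Populated when the attempt registers (registration code omitted here).
  private final ConcurrentMap<ApplicationAttemptId, ResponseLock> locks =
      new ConcurrentHashMap<ApplicationAttemptId, ResponseLock>();

  AllocateResponse allocate(ApplicationAttemptId attemptId, int responseId) {
    ResponseLock lock = locks.get(attemptId);  // assumed non-null for a registered attempt
    synchronized (lock) {
      if (lock.lastResponse != null
          && responseId == lock.lastResponse.getResponseId()) {
        return lock.lastResponse;              // duplicate request: resend the last answer
      }
      AllocateResponse response = doSchedulingWork(attemptId);  // hypothetical helper
      lock.lastResponse = response;            // safe: the per-attempt lock is still held
      return response;
    }
  }

  private AllocateResponse doSchedulingWork(ApplicationAttemptId attemptId) {
    throw new UnsupportedOperationException("illustrative placeholder");
  }
}
{code}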
[jira] [Updated] (YARN-919) Setting default heap sizes in yarn env
[ https://issues.apache.org/jira/browse/YARN-919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-919: - Attachment: YARN.919.4.patch [~mayank_bansal] Thanks for addressing the review comments. Looking at bin/yarn in a bit more detail, I noticed a couple of other gotchas that can affect users when setting the heap size. To that end, I have attached a patch with a bit more verbose documentation. Let me know what you think. Setting default heap sizes in yarn env -- Key: YARN-919 URL: https://issues.apache.org/jira/browse/YARN-919 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Mayank Bansal Assignee: Mayank Bansal Priority: Minor Attachments: YARN.919.4.patch, YARN-919-trunk-1.patch, YARN-919-trunk-2.patch, YARN-919-trunk-3.patch Right now there are no defaults in the yarn env scripts for the resource manager and node manager, and if users want to override them, they have to go to the documentation, find the variables, and change the script. There is no straightforward way to change them in the script. Just updating the variables with defaults. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-897: --- Target Version/s: 2.1.0-beta CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.4-alpha Reporter: Djellel Eddine Difallah Priority: Blocker Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures the queue with least UsedCapacity to receive resources next. On containerAssignment we correctly update the order, but we miss to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
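The underlying pitfall is that a TreeSet only orders elements at insertion time, so mutating a child queue's usedCapacity while it sits in the set silently corrupts the ordering. The fix pattern implied by the description is to remove, update, and re-insert the queue on every capacity change, completions included. A small self-contained illustration of that pattern (plain classes, not the CapacityScheduler ones):
{code}
import java.util.Comparator;
import java.util.TreeSet;

public class QueueSortSketch {
  static class Queue {
    final String name;
    float usedCapacity;
    Queue(String name) { this.name = name; }
  }

  // Order by used capacity, tie-broken by name so distinct queues never collapse.
  private final TreeSet<Queue> childQueues = new TreeSet<Queue>(new Comparator<Queue>() {
    @Override
    public int compare(Queue a, Queue b) {
      int c = Float.compare(a.usedCapacity, b.usedCapacity);
      return c != 0 ? c : a.name.compareTo(b.name);
    }
  });

  // Mutating usedCapacity in place would leave the TreeSet silently mis-ordered;
  // remove, update, re-add keeps the least-used queue at the head for both
  // container assignments and container completions.
  void updateUsedCapacity(Queue q, float newUsedCapacity) {
    childQueues.remove(q);
    q.usedCapacity = newUsedCapacity;
    childQueues.add(q);
  }

  Queue leastUsedQueue() {
    return childQueues.first();
  }
}
{code}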
[jira] [Updated] (YARN-897) CapacityScheduler wrongly sorted queues
[ https://issues.apache.org/jira/browse/YARN-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-897: --- Priority: Blocker (was: Major) CapacityScheduler wrongly sorted queues --- Key: YARN-897 URL: https://issues.apache.org/jira/browse/YARN-897 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Djellel Eddine Difallah Priority: Blocker Attachments: TestBugParentQueue.java, YARN-897-1.patch, YARN-897-2.patch The childQueues of a ParentQueue are stored in a TreeSet where UsedCapacity defines the sort order. This ensures the queue with least UsedCapacity to receive resources next. On containerAssignment we correctly update the order, but we miss to do so on container completions. This corrupts the TreeSet structure, and under-capacity queues might starve for resources. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-403) Node Manager throws java.io.IOException: Verification of the hashReply failed
[ https://issues.apache.org/jira/browse/YARN-403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713234#comment-13713234 ] Devaraj K commented on YARN-403: Sorry Omkar, I don't have logs for this now, I have faced it once during long runs when the load was high on the cluster. During that time the fetch request from the Reducer got failed due to this error. We could wait for sometime If I get any further info I will update, or if any others face this they could also help to check this further. Node Manager throws java.io.IOException: Verification of the hashReply failed - Key: YARN-403 URL: https://issues.apache.org/jira/browse/YARN-403 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.0.2-alpha, 0.23.6 Reporter: Devaraj K Assignee: Omkar Vinit Joshi {code:xml} 2013-02-09 22:59:47,490 WARN org.apache.hadoop.mapred.ShuffleHandler: Shuffle failure java.io.IOException: Verification of the hashReply failed at org.apache.hadoop.mapreduce.security.SecureShuffleUtils.verifyReply(SecureShuffleUtils.java:98) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.verifyRequest(ShuffleHandler.java:436) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:383) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:754) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:545) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:540) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713280#comment-13713280 ] Omkar Vinit Joshi commented on YARN-245: Thanks [~mayank_bansal] for the patch. I agree that checking heartbeat ids will test this issue... a few comments:
{code}
+    conf.setBoolean(YarnConfiguration.LOG_AGGREGATION_ENABLED, true);
{code}
why are we doing this?
{code}
+    NodeStatus nodeStatus = request.getNodeStatus();
+    nodeStatus.setResponseId(heartBeatID++);
{code}
required? can be removed?
* There is one issue at present with NodeStatusUpdaterImpl.java: imagine we get such a heartbeat; then we will not wait but try again. Check the finally {} block, which won't get executed, and we will keep pinging the RM until we get a correct response with the expected response-id. Should we wait or request immediately? Thoughts?
{code}
+    Thread.sleep(1000l);
{code}
can we make it 1000?
* The test will need a timeout. However, I see there are certain tests without a timeout... if adding a timeout then use a slightly larger value... :)
{code}
+    if (nodeStatus.getKeepAliveApplications() != null
+        && nodeStatus.getKeepAliveApplications().size() > 0) {
+      for (ApplicationId appId : nodeStatus.getKeepAliveApplications()) {
+        List<Long> list = keepAliveRequests.get(appId);
+        if (list == null) {
+          list = new LinkedList<Long>();
+          keepAliveRequests.put(appId, list);
+        }
+        list.add(System.currentTimeMillis());
+      }
+    }
{code}
{code}
+    if (heartBeatID == 2) {
+      LOG.info("Sending FINISH_APP for application: [" + appId + "]");
+      this.context.getApplications().put(appId, mock(Application.class));
+      nhResponse.addAllApplicationsToCleanup(Collections.singletonList(appId));
+    }
{code}
{code}
+    rt.context.getApplications().remove(rt.appId);
{code}
{code}
+  private Map<ApplicationId, List<Long>> keepAliveRequests =
+      new HashMap<ApplicationId, List<Long>>();
+  private ApplicationId appId = BuilderUtils.newApplicationId(1, 1);
{code}
do we need this? Can we remove all the application-related stuff? Since we are now checking only heartbeat ids, we can remove this. Thoughts?
Node Manager can not handle duplicate responses --- Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, YARN-245-trunk-3.patch {code:xml} 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
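The failure above is the NodeManager re-applying a heartbeat response it has already handled, such as a second FINISH_APPLICATION for an application that is already FINISHED. The direction discussed in the review, comparing response IDs before acting, can be sketched as follows (the interface and method names are illustrative stand-ins, not the actual NodeStatusUpdaterImpl code):
{code}
public class HeartbeatLoopSketch {
  // Illustrative stand-in for the RM's heartbeat response.
  interface HeartbeatResponse {
    int getResponseId();
  }

  private int lastHeartBeatId = 0;

  // Act on a response only when its id moves forward; a response carrying the id
  // we already handled is a duplicate and must not be re-applied, otherwise
  // cleanup events can be delivered twice to an already-finished application.
  void onHeartbeatResponse(HeartbeatResponse response) {
    if (response.getResponseId() == lastHeartBeatId) {
      return;                        // duplicate response from the RM: ignore it
    }
    lastHeartBeatId = response.getResponseId();
    applyCleanupEvents(response);    // hypothetical helper doing the real work
  }

  private void applyCleanupEvents(HeartbeatResponse response) {
    // application/container cleanup would be dispatched here
  }
}
{code}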
[jira] [Commented] (YARN-245) Node Manager can not handle duplicate responses
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713282#comment-13713282 ] Omkar Vinit Joshi commented on YARN-245: ignore thread comment ...realized it is L not 1 :D Node Manager can not handle duplicate responses --- Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch, YARN-245-trunk-2.patch, YARN-245-trunk-3.patch {code:xml} 2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75) at java.lang.Thread.run(Thread.java:662) 2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster
[ https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713285#comment-13713285 ] Omkar Vinit Joshi commented on YARN-685: [~raviprak] can you please tell me what is value-1 and value-2?? I think first one is nodes..what is second? also what do you mean here? {code} For 23, Reduce: 2 1 32 2 1 4 {code} Capacity Scheduler is not distributing the reducers tasks across the cluster Key: YARN-685 URL: https://issues.apache.org/jira/browse/YARN-685 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.0.4-alpha Reporter: Devaraj K If we have reducers whose total memory required to complete is less than the total cluster memory, it is not assigning the reducers to all the nodes uniformly(~uniformly). Also at that time there are no other jobs or job tasks running in the cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-853) maximum-am-resource-percent doesn't work after refreshQueues command
[ https://issues.apache.org/jira/browse/YARN-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713311#comment-13713311 ] Hadoop QA commented on YARN-853: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593117/YARN-853-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1528//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1528//console This message is automatically generated. maximum-am-resource-percent doesn't work after refreshQueues command Key: YARN-853 URL: https://issues.apache.org/jira/browse/YARN-853 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.1.0-beta, 2.0.5-alpha Reporter: Devaraj K Assignee: Devaraj K Attachments: YARN-853-1.patch, YARN-853-2.patch, YARN-853-3.patch, YARN-853.patch If we update yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.queue-path.maximum-am-resource-percent configuration and then do the refreshNodes, it uses the new config value to calculate Max Active Applications and Max Active Application Per User. If we add new node after issuing 'rmadmin -refreshQueues' command, it uses the old maximum-am-resource-percent config value to calculate Max Active Applications and Max Active Application Per User. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
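For reference, the limit affected by this bug is derived from the configured maximum-am-resource-percent and the current cluster size, roughly as in the sketch below; the problem is that a value cached at the last refreshQueues is reused when nodes are later added, instead of recomputing with the refreshed percent. The arithmetic here illustrates the idea only and is not the exact CapacityScheduler formula:
{code}
public class AmLimitSketch {
  // Roughly: how many applications may have an active AM given the cluster size,
  // the minimum allocation and maximum-am-resource-percent. The real computation
  // lives in the CapacityScheduler; this only shows why a stale percent matters.
  static int maxActiveApplications(long clusterMemoryMb, long minAllocationMb,
      float maxAmResourcePercent, float absoluteMaxCapacity) {
    double slots = (double) clusterMemoryMb / minAllocationMb;
    return Math.max(1,
        (int) Math.ceil(slots * maxAmResourcePercent * absoluteMaxCapacity));
  }

  public static void main(String[] args) {
    // With a stale 10% instead of a refreshed 20%, only half as many AMs are allowed.
    System.out.println(maxActiveApplications(80000, 1024, 0.10f, 1.0f)); // 8
    System.out.println(maxActiveApplications(80000, 1024, 0.20f, 1.0f)); // 16
  }
}
{code}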
[jira] [Commented] (YARN-875) Application can hang if AMRMClientAsync callback thread has exception
[ https://issues.apache.org/jira/browse/YARN-875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713315#comment-13713315 ] Hadoop QA commented on YARN-875: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12593101/YARN-875.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1529//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1529//console This message is automatically generated. Application can hang if AMRMClientAsync callback thread has exception - Key: YARN-875 URL: https://issues.apache.org/jira/browse/YARN-875 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-875.1.patch, YARN-875.1.patch, YARN-875.2.patch, YARN-875.3.patch Currently that thread will die and then never callback. App can hang. Possible solution could be to catch Throwable in the callback and then call client.onError(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
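The hang happens because an exception thrown from a user CallbackHandler method kills the callback thread, after which the AM never receives another callback. The possible solution mentioned in the description, catching Throwable around the callback and routing it to onError(), looks roughly like this (the dispatch step is a simplification, not the actual AMRMClientAsyncImpl code):
{code}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class CallbackGuardSketch {
  // Simplified dispatch step: any Throwable escaping the user's callback is
  // reported through onError() instead of silently killing the handler thread.
  static void dispatch(AMRMClientAsync.CallbackHandler handler,
      AllocateResponse response) {
    try {
      handler.onContainersCompleted(response.getCompletedContainersStatuses());
      handler.onContainersAllocated(response.getAllocatedContainers());
    } catch (Throwable t) {
      handler.onError(t);   // surface the failure to the application
    }
  }
}
{code}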
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13713322#comment-13713322 ] Maysam Yabandeh commented on YARN-713: -- [~ojoshi], but the patch has been ready since June 14! Anyway, feel free to take over. ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Maysam Yabandeh Priority: Critical Fix For: 2.1.0-beta Attachments: YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira