[ https://issues.apache.org/jira/browse/YARN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313834#comment-14313834 ]
Hadoop QA commented on YARN-933:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12697671/0004-YARN-933.patch
  against trunk revision b73956f.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6573//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6573//console

This message is automatically generated.

> Potential InvalidStateTransitonException: Invalid event: LAUNCHED at FINAL_SAVING
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-933
>                 URL: https://issues.apache.org/jira/browse/YARN-933
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.5-alpha
>            Reporter: J.Andreina
>            Assignee: Rohith
>         Attachments: 0001-YARN-933.patch, 0001-YARN-933.patch, 0004-YARN-933.patch, YARN-933.3.patch, YARN-933.patch
>
>
> am max retries configured as 3 at client and RM side.
> Step 1: Install a cluster with NMs on 2 machines
> Step 2: Make ping by IP from the RM machine to the NM1 machine succeed, but make ping by hostname fail
> Step 3: Execute a job
> Step 4: After the AM [ AppAttempt_1 ] has been allocated to the NM1 machine, the connection is lost.
> Observation:
> ==========
> After AppAttempt_1 has moved to the failed state, the release of the container for AppAttempt_1 and the application removal are successful. A new AppAttempt_2 is spawned.
> 1. Then a retry for AppAttempt_1 happens again.
> 2. The RM side again tries to launch AppAttempt_1, and hence fails with InvalidStateTransitonException.
> 3. The client exited after AppAttempt_1 finished [but the job is actually still running], while the number of app attempts configured is 3 and the remaining app attempts are all spawned and running.
> RMLogs:
> ======
> 2013-07-17 16:22:51,013 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_000001 State change from SCHEDULED to ALLOCATED
> 2013-07-17 16:35:48,171 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 36 time(s); maxRetries=45
> 2013-07-17 16:36:07,091 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:container_1373952096466_0056_01_000001 Timed out after 600 secs
> 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1373952096466_0056_01_000001 Container Transitioned from ACQUIRED to EXPIRED
> 2013-07-17 16:36:07,093 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering appattempt_1373952096466_0056_000002
> 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1373952096466_0056_000001 is done. finalState=FAILED
> 2013-07-17 16:36:07,131 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1373952096466_0056 user: Rex leaf-queue of parent: root #applications: 35
> 2013-07-17 16:36:07,132 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Submission: appattempt_1373952096466_0056_000002,
> 2013-07-17 16:36:07,138 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1373952096466_0056_000002 State change from SUBMITTED to SCHEDULED
> 2013-07-17 16:36:30,179 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 38 time(s); maxRetries=45
> 2013-07-17 16:38:36,203 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: host-10-18-40-15/10.18.40.59:8048. Already tried 44 time(s); maxRetries=45
> 2013-07-17 16:38:56,207 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1373952096466_0056_000001. Got exception: java.lang.reflect.UndeclaredThrowableException
> 2013-07-17 16:38:56,207 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:630)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:495)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:476)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
>         at java.lang.Thread.run(Thread.java:662)
> Client Logs
> ========
> Caused by: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020]
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:573)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
> 2013-07-17 16:37:05,987 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:Rex (auth:SIMPLE) cause:org.apache.hadoop.net.ConnectTimeoutException: Call From HOST-10-18-91-55/10.18.40.57 to host-10-18-40-15:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=host-10-18-40-15/10.18.40.59:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
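
For readers following the RM log in the quoted report: the attempt is already FAILED (its container expired) when the launcher finally gives up and reports LAUNCH_FAILED for the same attempt. The sequence can be replayed in miniature with a table-driven state machine. The following is only a hedged sketch in Python, not the RM code and not necessarily what the attached patch does. The class names, the EXPIRE event name, and the STRICT and TOLERANT tables are hypothetical; the state names and the other event names are taken from the log.

```python
from enum import Enum, auto


class State(Enum):
    ALLOCATED = auto()
    LAUNCHED = auto()
    FAILED = auto()


class Event(Enum):
    LAUNCHED = auto()
    LAUNCH_FAILED = auto()
    EXPIRE = auto()  # hypothetical name for "AM container timed out"


class InvalidStateTransition(Exception):
    """Raised when no transition is registered for (state, event)."""


class AttemptStateMachine:
    """Toy stand-in for an app attempt; not the real RMAppAttemptImpl."""

    def __init__(self, transitions, initial=State.ALLOCATED):
        self.transitions = dict(transitions)
        self.state = initial

    def handle(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise InvalidStateTransition(
                f"Invalid event: {event.name} at {self.state.name}")
        self.state = self.transitions[key]
        return self.state


# Table with no entries for the terminal FAILED state.
STRICT = {
    (State.ALLOCATED, Event.LAUNCHED): State.LAUNCHED,
    (State.ALLOCATED, Event.LAUNCH_FAILED): State.FAILED,
    (State.ALLOCATED, Event.EXPIRE): State.FAILED,
}

# Assumption: one common remedy is to register late launcher events
# as no-ops at terminal states, so they are ignored instead of raising.
TOLERANT = {
    **STRICT,
    (State.FAILED, Event.LAUNCHED): State.FAILED,
    (State.FAILED, Event.LAUNCH_FAILED): State.FAILED,
}


def replay(transitions):
    """Replay the order of events seen in the report for attempt 1."""
    sm = AttemptStateMachine(transitions)
    sm.handle(Event.EXPIRE)                # container expires, attempt fails
    return sm.handle(Event.LAUNCH_FAILED)  # launcher gives up afterwards
```

With the STRICT table, replay raises "Invalid event: LAUNCH_FAILED at FAILED", matching the error in the RM log. With the TOLERANT table the late event is absorbed and the attempt stays FAILED.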