Oleg Zhurakousky created YARN-1847:
--------------------------------------
Summary: YARN application always exits with FAILED state
Key: YARN-1847
URL: https://issues.apache.org/jira/browse/YARN-1847
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Oleg Zhurakousky
Priority: Critical
The _RMAppAttemptImpl_ creates an instance of ExpiredTransition which always
sets the _finalAttemptState_ to FAILED.
{code}
private static final ExpiredTransition EXPIRED_TRANSITION =
    new ExpiredTransition();
. . .
public ExpiredTransition() {
  super(RMAppAttemptState.FAILED);
}
{code}
So, when my container finishes successfully, whatever the state (e.g.,
CONTAINER_FINISHED in my case), _RMAppAttemptImpl.transition(..)_ does a
switch on the _finalAttemptState_ and transitions to FAILED no matter what.
Here are the related logs for more info:
{code}
21:06:01,615 INFO AsyncDispatcher event handler container.Container:878 -
Container container_1395104684413_0001_01_000001 transitioned from RUNNING to
EXITED_WITH_SUCCESS
21:06:01,615 INFO AsyncDispatcher event handler launcher.ContainerLaunch:341 -
Cleaning up container container_1395104684413_0001_01_000001
21:06:01,644 INFO DeletionService #0 nodemanager.DefaultContainerExecutor:369
- Deleting absolute path :
/Users/oleg/HADOOP_DEV/yarn-tutorial/target/oz.hadoop.StandAloneWithMiniYarnCluster/oz.hadoop.StandAloneWithMiniYarnCluster-localDir-nm-0_0/usercache/oleg/appcache/application_1395104684413_0001/container_1395104684413_0001_01_000001
21:06:01,646 INFO AsyncDispatcher event handler nodemanager.NMAuditLogger:89 -
USER=oleg OPERATION=Container Finished - Succeeded
TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1395104684413_0001
CONTAINERID=container_1395104684413_0001_01_000001
21:06:01,649 INFO AsyncDispatcher event handler container.Container:878 -
Container container_1395104684413_0001_01_000001 transitioned from
EXITED_WITH_SUCCESS to DONE
21:06:01,649 INFO AsyncDispatcher event handler application.Application:339 -
Removing container_1395104684413_0001_01_000001 from application
application_1395104684413_0001
21:06:01,649 INFO AsyncDispatcher event handler
monitor.ContainersMonitorImpl:159 - ResourceCalculatorPlugin is unavailable on
this system.
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
is disabled.
21:06:01,649 INFO AsyncDispatcher event handler
containermanager.AuxServices:175 - Got event CONTAINER_STOP for appId
application_1395104684413_0001
21:06:02,143 INFO Node Status Updater nodemanager.NodeStatusUpdaterImpl:374 -
Removed completed container container_1395104684413_0001_01_000001
21:06:02,146 INFO ResourceManager Event Processor
rmcontainer.RMContainerImpl:220 - container_1395104684413_0001_01_000001
Container Transitioned from ACQUIRED to COMPLETED
21:06:02,146 INFO ResourceManager Event Processor fica.FiCaSchedulerApp:91 -
Completed container: container_1395104684413_0001_01_000001 in state: COMPLETED
event:FINISHED
21:06:02,146 INFO ResourceManager Event Processor
resourcemanager.RMAuditLogger:98 - USER=oleg OPERATION=AM Released Container
TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1395104684413_0001
CONTAINERID=container_1395104684413_0001_01_000001
21:06:02,146 INFO ResourceManager Event Processor fica.FiCaSchedulerNode:164 -
Released container container_1395104684413_0001_01_000001 of capacity
<memory:1024, vCores:1> on host 192.168.19.1:50787, which currently has 0
containers, <memory:0, vCores:0> used and <memory:4096, vCores:8> available,
release resources=true
21:06:02,146 INFO ResourceManager Event Processor fifo.FifoScheduler:790 -
Application appattempt_1395104684413_0001_000001 released container
container_1395104684413_0001_01_000001 on node: host: 192.168.19.1:50787
#containers=0 available=4096 used=0 with event: FINISHED
21:06:02,146 INFO AsyncDispatcher event handler attempt.RMAppAttemptImpl:960 -
Updating application attempt appattempt_1395104684413_0001_000001 with final
state: FAILED
{code}
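To make the pattern described above easier to see in isolation, here is a minimal, self-contained sketch. It is a simplified model and not the actual Hadoop source: the names ExpiredTransitionSketch, FinalTransitionSketch, AttemptState and finalStateFor are hypothetical, and the container exit status parameter is an assumption added only to show that the outcome of the container never reaches the decision.

```java
public class ExpiredTransitionSketch {

    // Hypothetical, reduced stand-in for RMAppAttemptState.
    public enum AttemptState { FINISHED, FAILED }

    // Hypothetical base transition: the final state is fixed at construction time.
    public static class FinalTransitionSketch {
        private final AttemptState finalAttemptState;

        public FinalTransitionSketch(AttemptState finalAttemptState) {
            this.finalAttemptState = finalAttemptState;
        }

        // The exit status is accepted but never consulted, which models the
        // behavior reported in this issue.
        public AttemptState transition(int containerExitStatus) {
            return finalAttemptState;
        }
    }

    // Mirrors the snippet in the report: the constructor hard-codes FAILED.
    public static class ExpiredTransition extends FinalTransitionSketch {
        public ExpiredTransition() {
            super(AttemptState.FAILED);
        }
    }

    private static final ExpiredTransition EXPIRED_TRANSITION =
        new ExpiredTransition();

    // Returns the final attempt state recorded for a given container exit status.
    public static AttemptState finalStateFor(int containerExitStatus) {
        return EXPIRED_TRANSITION.transition(containerExitStatus);
    }

    public static void main(String[] args) {
        // Exit status 0 stands for a container that exited successfully.
        System.out.println(finalStateFor(0));
    }
}
```

In this model, a successful exit status (0) and a failing one produce the same final state, because the state is chosen when the transition object is constructed rather than when the container completes.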