Oleg Zhurakousky created YARN-1847:
--------------------------------------

             Summary: YARN application always exits with FAILED state
                 Key: YARN-1847
                 URL: https://issues.apache.org/jira/browse/YARN-1847
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.3.0
            Reporter: Oleg Zhurakousky
            Priority: Critical


The _RMAppAttemptImpl_ creates an instance of ExpiredTransition which always 
sets the _finalAttemptState_ to FAILED.
{code}
private static final ExpiredTransition EXPIRED_TRANSITION =
    new ExpiredTransition();
. . .
public ExpiredTransition() {
  super(RMAppAttemptState.FAILED);
}
{code}
So when my container finishes successfully (the event is CONTAINER_FINISHED in 
my case), _RMAppAttemptImpl.transition(..)_ switches on the _finalAttemptState_ 
and transitions to FAILED regardless of how the container actually exited.
Here are the related logs for more info:
{code}
21:06:01,615  INFO AsyncDispatcher event handler container.Container:878 - Container container_1395104684413_0001_01_000001 transitioned from RUNNING to EXITED_WITH_SUCCESS
21:06:01,615  INFO AsyncDispatcher event handler launcher.ContainerLaunch:341 - Cleaning up container container_1395104684413_0001_01_000001
21:06:01,644  INFO DeletionService #0 nodemanager.DefaultContainerExecutor:369 - Deleting absolute path : /Users/oleg/HADOOP_DEV/yarn-tutorial/target/oz.hadoop.StandAloneWithMiniYarnCluster/oz.hadoop.StandAloneWithMiniYarnCluster-localDir-nm-0_0/usercache/oleg/appcache/application_1395104684413_0001/container_1395104684413_0001_01_000001
21:06:01,646  INFO AsyncDispatcher event handler nodemanager.NMAuditLogger:89 - USER=oleg       OPERATION=Container Finished - Succeeded        TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_1395104684413_0001    CONTAINERID=container_1395104684413_0001_01_000001
21:06:01,649  INFO AsyncDispatcher event handler container.Container:878 - Container container_1395104684413_0001_01_000001 transitioned from EXITED_WITH_SUCCESS to DONE
21:06:01,649  INFO AsyncDispatcher event handler application.Application:339 - Removing container_1395104684413_0001_01_000001 from application application_1395104684413_0001
21:06:01,649  INFO AsyncDispatcher event handler monitor.ContainersMonitorImpl:159 - ResourceCalculatorPlugin is unavailable on this system. org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is disabled.
21:06:01,649  INFO AsyncDispatcher event handler containermanager.AuxServices:175 - Got event CONTAINER_STOP for appId application_1395104684413_0001
21:06:02,143  INFO Node Status Updater nodemanager.NodeStatusUpdaterImpl:374 - Removed completed container container_1395104684413_0001_01_000001
21:06:02,146  INFO ResourceManager Event Processor rmcontainer.RMContainerImpl:220 - container_1395104684413_0001_01_000001 Container Transitioned from ACQUIRED to COMPLETED
21:06:02,146  INFO ResourceManager Event Processor fica.FiCaSchedulerApp:91 - Completed container: container_1395104684413_0001_01_000001 in state: COMPLETED event:FINISHED
21:06:02,146  INFO ResourceManager Event Processor resourcemanager.RMAuditLogger:98 - USER=oleg OPERATION=AM Released Container TARGET=SchedulerApp     RESULT=SUCCESS  APPID=application_1395104684413_0001    CONTAINERID=container_1395104684413_0001_01_000001
21:06:02,146  INFO ResourceManager Event Processor fica.FiCaSchedulerNode:164 - Released container container_1395104684413_0001_01_000001 of capacity <memory:1024, vCores:1> on host 192.168.19.1:50787, which currently has 0 containers, <memory:0, vCores:0> used and <memory:4096, vCores:8> available, release resources=true
21:06:02,146  INFO ResourceManager Event Processor fifo.FifoScheduler:790 - Application appattempt_1395104684413_0001_000001 released container container_1395104684413_0001_01_000001 on node: host: 192.168.19.1:50787 #containers=0 available=4096 used=0 with event: FINISHED
21:06:02,146  INFO AsyncDispatcher event handler attempt.RMAppAttemptImpl:960 - Updating application attempt appattempt_1395104684413_0001_000001 with final state: FAILED
{code}
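
The pattern described above can be illustrated with a minimal, self-contained 
sketch. Every type and name in it (ExpiredTransitionSketch, 
FixedFinalStateTransition, AttemptState, ContainerExit) is hypothetical and 
only models the shape of the reported behaviour; it is not the actual 
RMAppAttemptImpl code. It shows a transition whose final state is fixed at 
construction time, so the container's exit status cannot influence the outcome.

```java
// Simplified, hypothetical model of the pattern described in this report.
// None of these types are the real YARN classes.
final class ExpiredTransitionSketch {

    enum AttemptState { FINISHED, FAILED }

    enum ContainerExit { SUCCESS, FAILURE }

    // Mirrors the reported shape: the final state is fixed when the
    // transition object is constructed.
    static class FixedFinalStateTransition {
        private final AttemptState finalAttemptState;

        FixedFinalStateTransition(AttemptState finalAttemptState) {
            this.finalAttemptState = finalAttemptState;
        }

        // The container's exit status is accepted but never consulted,
        // so the result is always the state chosen at construction.
        AttemptState transition(ContainerExit exit) {
            return finalAttemptState;
        }
    }

    // Analogous to the static EXPIRED_TRANSITION instance quoted above.
    static final FixedFinalStateTransition EXPIRED_TRANSITION =
            new FixedFinalStateTransition(AttemptState.FAILED);

    public static void main(String[] args) {
        System.out.println(EXPIRED_TRANSITION.transition(ContainerExit.SUCCESS));
        System.out.println(EXPIRED_TRANSITION.transition(ContainerExit.FAILURE));
    }
}
```

In this model a successful and a failed container exit are indistinguishable 
once the transition runs, which matches the "final state: FAILED" line at the 
end of the log above.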



--
This message was sent by Atlassian JIRA
(v6.2#6252)
