kyungwan nam created YARN-9691:
----------------------------------

             Summary: canceling upgrade does not work if upgrade failed container is existing
                 Key: YARN-9691
                 URL: https://issues.apache.org/jira/browse/YARN-9691
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: kyungwan nam
            Assignee: kyungwan nam


If a container fails to upgrade during a YARN service upgrade, the container is
released and its component instance transitions to the FAILED_UPGRADE state.
After that, I expected to be able to roll back to the previous version using
cancel-upgrade, but it didn't work.
At that time, the AM log was as follows:

{code}
# failed to upgrade container_e62_1563179597798_0006_01_000008

2019-07-16 18:21:55,152 [IPC Server handler 0 on 39483] INFO  service.ClientAMService - Upgrade container container_e62_1563179597798_0006_01_000008
2019-07-16 18:21:55,153 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_000008] spec state state changed from NEEDS_UPGRADE -> UPGRADING
2019-07-16 18:21:55,154 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_000008] Transitioned from READY to UPGRADING on UPGRADE event
2019-07-16 18:21:55,154 [pool-5-thread-4] INFO  registry.YarnRegistryViewForProviders - [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_000008]: Deleting registry path /users/test/services/yarn-service/sleeptest/components/ctr-e62-1563179597798-0006-01-000008
2019-07-16 18:21:55,156 [pool-6-thread-6] INFO  provider.ProviderUtils - [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_000008] version 1.0.1 : Creating dir on hdfs: hdfs://test1.com:8020/user/test/.yarn/services/sleeptest/components/1.0.1/sleep/sleep-0
2019-07-16 18:21:55,157 [pool-6-thread-6] INFO  containerlaunch.ContainerLaunchService - reInitializing container container_e62_1563179597798_0006_01_000008 with version 1.0.1
2019-07-16 18:21:55,157 [pool-6-thread-6] INFO  containerlaunch.AbstractLauncher - yarn docker env var has been set {LANGUAGE=en_US.UTF-8, HADOOP_USER_NAME=test, YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_HOSTNAME=sleep-0.sleeptest.test.EXAMPLE.COM, WORK_DIR=$PWD, LC_ALL=en_US.UTF-8, YARN_CONTAINER_RUNTIME_TYPE=docker, YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=registry.test.com/test/sleep1:latest, LANG=en_US.UTF-8, YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK=bridge, YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true, LOG_DIR=<LOG_DIR>}
2019-07-16 18:21:55,158 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #7] INFO  impl.NMClientAsyncImpl - Processing Event EventType: REINITIALIZE_CONTAINER for Container container_e62_1563179597798_0006_01_000008
2019-07-16 18:21:55,167 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_000008] spec state state changed from UPGRADING -> RUNNING_BUT_UNREADY
2019-07-16 18:21:55,167 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_000008] retrieve status after 30
2019-07-16 18:21:55,167 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_000008] Transitioned from UPGRADING to REINITIALIZED on START event
2019-07-16 18:22:07,797 [pool-7-thread-1] INFO  monitor.ServiceMonitor - Readiness check failed for sleep-0: Probe Status, time="Tue Jul 16 18:22:07 KST 2019", outcome="failure", message="Failure in Default probe: IP presence", exception="java.io.IOException: sleep-0: IP is not available yet"
2019-07-16 18:22:37,797 [pool-7-thread-1] INFO  monitor.ServiceMonitor - Readiness check failed for sleep-0: Probe Status, time="Tue Jul 16 18:22:37 KST 2019", outcome="failure", message="Failure in Default probe: IP presence", exception="java.io.IOException: sleep-0: IP is not available yet"
2019-07-16 18:23:07,797 [pool-7-thread-1] INFO  monitor.ServiceMonitor - Readiness check failed for sleep-0: Probe Status, time="Tue Jul 16 18:23:07 KST 2019", outcome="failure", message="Failure in Default probe: IP presence", exception="java.io.IOException: sleep-0: IP is not available yet"
2019-07-16 18:23:08,225 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_000008] spec state state changed from RUNNING_BUT_UNREADY -> FAILED_UPGRADE

# request canceling upgrade

2019-07-16 18:28:22,713 [Component  dispatcher] INFO  service.ServiceManager - Upgrade container container_e62_1563179597798_0006_01_000004 true
2019-07-16 18:28:22,713 [Component  dispatcher] INFO  service.ServiceManager - Upgrade container container_e62_1563179597798_0006_01_000003 true
2019-07-16 18:28:22,713 [Component  dispatcher] INFO  service.ServiceManager - Upgrade container container_e62_1563179597798_0006_01_000008 true
2019-07-16 18:28:22,713 [Component  dispatcher] INFO  service.ServiceManager - [SERVICE] spec state changed from UPGRADING -> CANCEL_UPGRADING
2019-07-16 18:28:22,713 [Component  dispatcher] INFO  component.Component - [COMPONENT sleep]: need upgrade to 1.0.0
2019-07-16 18:28:22,713 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE sleep-0 : container_e62_1563179597798_0006_01_000008] spec state state changed from FAILED_UPGRADE -> NEEDS_UPGRADE
2019-07-16 18:28:22,713 [Component  dispatcher] INFO  component.Component - [COMPONENT sleep] Transitioned from UPGRADING to CANCEL_UPGRADING on CANCEL_UPGRADE event.
2019-07-16 18:28:22,713 [Component  dispatcher] INFO  component.Component - [COMPONENT sleep1]: need upgrade to 1.0.0
2019-07-16 18:28:22,714 [Component  dispatcher] INFO  component.Component - [COMPONENT sleep1] Transitioned from UPGRADING to CANCEL_UPGRADING on CANCEL_UPGRADE event.
2019-07-16 18:28:22,714 [Component  dispatcher] INFO  instance.ComponentInstance - container_e62_1563179597798_0006_01_000004 nothing to cancel
2019-07-16 18:28:22,714 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE sleep-2 : container_e62_1563179597798_0006_01_000004] spec state state changed from NEEDS_UPGRADE -> READY
2019-07-16 18:28:22,714 [Component  dispatcher] INFO  instance.ComponentInstance - container_e62_1563179597798_0006_01_000003 nothing to cancel
2019-07-16 18:28:22,714 [Component  dispatcher] INFO  instance.ComponentInstance - [COMPINSTANCE sleep-1 : container_e62_1563179597798_0006_01_000003] spec state state changed from NEEDS_UPGRADE -> READY
2019-07-16 18:28:22,714 [Component  dispatcher] ERROR service.ServiceScheduler - No component instance exists for container_e62_1563179597798_0006_01_000008

{code}
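Reading the log, the final ERROR suggests that the cancel path looks up every container recorded for the upgrade by its container ID, and that the released container 000008 is no longer registered under that ID, even though the component instance sleep-0 still exists and has just moved from FAILED_UPGRADE to NEEDS_UPGRADE. The sketch below is a hypothetical model of that sequence, not the actual ServiceScheduler code: the class CancelUpgradeSketch, the liveInstances map, and the register/releaseFailedUpgrade/cancelUpgrade methods are all assumptions made for illustration, and only the container IDs, instance names, and error text come from the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the cancel-upgrade lookup seen in the AM log.
// None of these names come from the YARN source; they only mirror the log.
public class CancelUpgradeSketch {

    // Hypothetical registry: container ID -> component instance name.
    private final Map<String, String> liveInstances = new HashMap<>();

    // Container IDs recorded when the upgrade started (cf. the
    // "Upgrade container ... true" lines logged at cancel time).
    private final List<String> containersToUpgrade = new ArrayList<>();

    void register(String containerId, String instanceName) {
        liveInstances.put(containerId, instanceName);
        containersToUpgrade.add(containerId);
    }

    // Assumption: releasing a container whose upgrade failed removes it from
    // the registry, but not from the list of containers to roll back.
    void releaseFailedUpgrade(String containerId) {
        liveInstances.remove(containerId);
    }

    // Returns the error messages produced while cancelling the upgrade.
    List<String> cancelUpgrade() {
        List<String> errors = new ArrayList<>();
        for (String containerId : containersToUpgrade) {
            if (liveInstances.get(containerId) == null) {
                errors.add("No component instance exists for " + containerId);
            }
        }
        return errors;
    }

    static List<String> simulate() {
        CancelUpgradeSketch scheduler = new CancelUpgradeSketch();
        scheduler.register("container_e62_1563179597798_0006_01_000004", "sleep-2");
        scheduler.register("container_e62_1563179597798_0006_01_000003", "sleep-1");
        scheduler.register("container_e62_1563179597798_0006_01_000008", "sleep-0");
        scheduler.releaseFailedUpgrade("container_e62_1563179597798_0006_01_000008");
        return scheduler.cancelUpgrade();
    }

    public static void main(String[] args) {
        // Prints one error line, for the released container 000008.
        simulate().forEach(System.out::println);
    }
}
```

Under these assumptions, running main prints a single error for container 000008, matching the last line of the log, while the two containers that are still registered produce no error.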



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)