[ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414543#comment-16414543 ]
Eric Yang commented on YARN-7973:
---------------------------------

[~shaneku...@gmail.com] Thank you for the example. Container relaunch is more or less working on my cluster using the example above. If an app is stopped and restarted, new containers are acquired. If a container fails, the same one is used for relaunch.

However, I encountered a problem when flexing containers from 2 to 3 and then decreasing back to 2. The flex command failed to reach the AM, with the following error message:

{code}
[hbase@eyang-5 hadoop-3.2.0-SNAPSHOT]$ ./bin/yarn app -flex z1 -component ping 2
2018-03-26 20:37:22,968 ERROR client.ApiServiceClient: Fail to flex application:
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
    at com.sun.jersey.api.client.Client.handle(Client.java:652)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionFlex(ApiServiceClient.java:417)
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:519)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:111)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at java.net.Socket.connect(Socket.java:538)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler$1$1.getOutputStream(URLConnectionClientHandler.java:238)
    at com.sun.jersey.api.client.CommittingOutputStream.commitStream(CommittingOutputStream.java:117)
    at com.sun.jersey.api.client.CommittingOutputStream.write(CommittingOutputStream.java:89)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    at java.io.BufferedWriter.flush(BufferedWriter.java:254)
    at com.sun.jersey.core.util.ReaderWriter.writeToAsString(ReaderWriter.java:191)
    at com.sun.jersey.core.provider.AbstractMessageReaderWriterProvider.writeToAsString(AbstractMessageReaderWriterProvider.java:128)
    at com.sun.jersey.core.impl.provider.entity.StringProvider.writeTo(StringProvider.java:88)
    at com.sun.jersey.core.impl.provider.entity.StringProvider.writeTo(StringProvider.java:58)
    at com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:217)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
    ... 9 more
{code}

There is no error in the AM logs. The most recent log entries are:

{code}
2018-03-26 20:43:32,061 [pool-5-thread-3] INFO instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1522094540915_0004_01_000014] IP = [172.26.111.20], host = ping-0.z1.hbase.ycluster, cancel container status retriever
2018-03-26 20:43:54,186 [pool-7-thread-1] INFO component.Component - [COMPONENT ping] state changed from FLEXING -> STABLE
2018-03-26 20:43:54,187 [pool-7-thread-1] INFO service.ServiceMaster - Service state changed from STARTED -> STABLE
2018-03-26 20:43:54,187 [pool-7-thread-1] INFO instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1522094540915_0004_01_000014] Transitioned from STARTED to READY on BECOME_READY event
{code}

The same commands work without this patch.

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container
> when it exited. The removal is now handled by the
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse
> the workdir from the previous attempt, and does not call {{cleanupContainer}}
> prior to {{launchContainer}}. The container ID is reused as well.
> As a result, the previous Docker container still exists, resulting in an error
> from Docker indicating that a container by that name already exists.
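
The name conflict described in the issue can be illustrated with a toy model. This is only a sketch under stated assumptions: `FakeDocker`, `NameConflictError`, and `relaunch` are hypothetical names invented for illustration, they are not YARN or Docker APIs, and the sketch does not claim to reflect how the attached patches address the problem. It only shows why reusing the container ID without removing the previous container leads to an "already exists" error.

```python
class NameConflictError(Exception):
    """Hypothetical error: the requested container name is already taken."""


class FakeDocker:
    """Toy stand-in for a Docker daemon; it only tracks container names."""

    def __init__(self):
        self.containers = set()

    def run(self, name):
        # An exited container still holds its name until it is removed.
        if name in self.containers:
            raise NameConflictError(f"container name {name!r} is already in use")
        self.containers.add(name)

    def rm(self, name):
        self.containers.discard(name)


def relaunch(docker, container_id, remove_previous):
    """Relaunch under the same container ID, optionally removing the old container first."""
    if remove_previous:
        docker.rm(container_id)
    docker.run(container_id)


def demo():
    container_id = "container_1522094540915_0004_01_000014"
    docker = FakeDocker()
    docker.run(container_id)  # first attempt; the container exits but is not removed
    try:
        relaunch(docker, container_id, remove_previous=False)
        conflict = False
    except NameConflictError:
        conflict = True
    # Removing the previous container first lets the same ID be reused.
    relaunch(docker, container_id, remove_previous=True)
    return conflict, container_id in docker.containers
```

Calling `demo()` exercises both paths: the relaunch without removal hits the name conflict, and the relaunch that removes the previous container first succeeds.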