[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414543#comment-16414543
 ] 

Eric Yang commented on YARN-7973:
---------------------------------

[~shaneku...@gmail.com] Thank you for the example.  Container relaunch is 
mostly working on my cluster with the example above.  If an app is stopped 
and restarted, new containers are acquired.  If a container fails, the same 
container is reused for the relaunch.  However, I ran into a problem when 
flexing the component from 2 containers to 3 and then back down to 2.  The 
flex command never reached the AM and failed with the following error:

{code}
[hbase@eyang-5 hadoop-3.2.0-SNAPSHOT]$ ./bin/yarn app -flex z1 -component ping 2
2018-03-26 20:37:22,968 ERROR client.ApiServiceClient: Fail to flex application: 
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
        at com.sun.jersey.api.client.Client.handle(Client.java:652)
        at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
        at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
        at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
        at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionFlex(ApiServiceClient.java:417)
        at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:519)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:111)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
        at sun.net.www.http.HttpClient.New(HttpClient.java:339)
        at sun.net.www.http.HttpClient.New(HttpClient.java:357)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler$1$1.getOutputStream(URLConnectionClientHandler.java:238)
        at com.sun.jersey.api.client.CommittingOutputStream.commitStream(CommittingOutputStream.java:117)
        at com.sun.jersey.api.client.CommittingOutputStream.write(CommittingOutputStream.java:89)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
        at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
        at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
        at java.io.BufferedWriter.flush(BufferedWriter.java:254)
        at com.sun.jersey.core.util.ReaderWriter.writeToAsString(ReaderWriter.java:191)
        at com.sun.jersey.core.provider.AbstractMessageReaderWriterProvider.writeToAsString(AbstractMessageReaderWriterProvider.java:128)
        at com.sun.jersey.core.impl.provider.entity.StringProvider.writeTo(StringProvider.java:88)
        at com.sun.jersey.core.impl.provider.entity.StringProvider.writeTo(StringProvider.java:58)
        at com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:217)
        at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
        ... 9 more
{code}

There are no errors in the AM logs.  The most recent entries are:

{code}
2018-03-26 20:43:32,061 [pool-5-thread-3] INFO  instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1522094540915_0004_01_000014] IP = [172.26.111.20], host = ping-0.z1.hbase.ycluster, cancel container status retriever
2018-03-26 20:43:54,186 [pool-7-thread-1] INFO  component.Component - [COMPONENT ping] state changed from FLEXING -> STABLE
2018-03-26 20:43:54,187 [pool-7-thread-1] INFO  service.ServiceMaster - Service state changed from STARTED -> STABLE
2018-03-26 20:43:54,187 [pool-7-thread-1] INFO  instance.ComponentInstance - [COMPINSTANCE ping-0 : container_1522094540915_0004_01_000014] Transitioned from STARTED to READY on BECOME_READY event
{code}

The same commands work without this patch.
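
For anyone trying to reproduce this, the flex-up-then-down sequence described above can be sketched as a small shell function.  The service name z1, the component name ping, and the -flex / -component options come from the command shown earlier; the YARN variable, the function name, and the explicit flex-up step to 3 are assumptions reconstructed from the description, not a captured session.

```shell
# Hypothetical reproduction helper (not part of Hadoop).
# YARN points at the yarn launcher; override it for a different install path.
YARN="${YARN:-./bin/yarn}"

flex_up_then_down() {
  # $1 = service name, $2 = component name,
  # $3 = original container count, $4 = temporarily increased count
  # Step 1: flex the component up.
  "$YARN" app -flex "$1" -component "$2" "$4" || return 1
  # Step 2: flex back down. This is the step that was reported to fail
  # with "Connection refused" when the patch was applied.
  "$YARN" app -flex "$1" -component "$2" "$3"
}

# Example invocation (commented out so that sourcing this file has no effect):
# flex_up_then_down z1 ping 2 3
```

Running the function with and without the patch applied should show whether the second flex request reaches the AM.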

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.
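
The name collision described in the issue can be illustrated outside of YARN with plain Docker commands.  The sketch below is only a hedged illustration of the failure mode: a second `docker run --name` with a name that is still in use is rejected by Docker, so a stale container has to be dealt with before the name can be reused.  The DOCKER variable and the function name are hypothetical, and removing the old container is just one possible way to free the name; it is not a description of what the attached patches do.

```shell
# Hypothetical helper (not part of YARN or container-executor).
# DOCKER points at the docker CLI; override it for testing.
DOCKER="${DOCKER:-docker}"

launch_with_cleanup() {
  # $1 = container name (on relaunch the same container ID, and so the
  #      same name, is reused), $2 = image to run
  # Remove any leftover container with this name; ignore the error that
  # some Docker versions return when no such container exists.
  "$DOCKER" rm -f "$1" >/dev/null 2>&1 || true
  # With the name free again, a new container can be started under it.
  "$DOCKER" run -d --name "$1" "$2"
}

# Example invocation (commented out so that sourcing this file has no effect):
# launch_with_cleanup container_1522094540915_0004_01_000014 busybox
```

Skipping the `rm` step and calling `run` twice with the same name reproduces the kind of "name already in use" conflict the issue describes.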



