[ https://issues.apache.org/jira/browse/YARN-5818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629691#comment-15629691 ]
Shane Kumpf commented on YARN-5818:
-----------------------------------

I did some initial testing here. Unfortunately, because docker uses a client/server model, client operations fail with an EOF error while the docker daemon is down for a restart/upgrade. Our use of {{docker wait}} to retrieve the container's exit code breaks down because that client operation fails during the restart/upgrade.

{code}
An error occurred trying to connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/c11692777816e44049d610c4ad358a24eefbff707cdbd85c24df3d153c80401e/wait: EOF
{code}

The docker community believes this is working as intended and does not plan to change this behavior. It appears we will have to handle retries in c-e (container-executor).

> Support the Docker Live Restore feature
> ---------------------------------------
>
>                 Key: YARN-5818
>                 URL: https://issues.apache.org/jira/browse/YARN-5818
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Shane Kumpf
>
> Docker 1.12.x introduced the docker [Live
> Restore|https://docs.docker.com/engine/admin/live-restore/] feature which
> allows docker containers to survive docker daemon restarts/upgrades. Support
> for this feature should be added to YARN to allow docker changes and upgrades
> to be less impactful to existing containers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
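
For illustration, the retry handling proposed in the comment above might look roughly like the following. This is a minimal Python sketch, not the container-executor code (which is written in C). The function names, the attempt limit, the delay, and the assumption that the connection error text shown above appears on stderr are all hypothetical choices made for the example.

```python
import subprocess
import time

# Assumption: a daemon outage is recognised by the error text quoted in the
# comment above, and that text is written to the client's stderr.
CONNECT_ERROR = "An error occurred trying to connect"


def run_docker(argv):
    """Run a docker client command and return (returncode, stdout, stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def wait_for_exit_code(container_id, runner=run_docker, max_attempts=10,
                       delay_seconds=5.0, sleep=time.sleep):
    """Return the container's exit code, retrying while the daemon is unreachable.

    max_attempts and delay_seconds are illustrative defaults, not values
    taken from YARN.
    """
    for attempt in range(1, max_attempts + 1):
        rc, out, err = runner(["docker", "wait", container_id])
        if rc == 0:
            # On success, `docker wait` prints the container's exit code.
            return int(out.strip())
        if CONNECT_ERROR not in err:
            # Any other failure is not a daemon restart; do not retry it.
            raise RuntimeError("docker wait failed: " + err.strip())
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise TimeoutError("docker daemon unreachable after %d attempts" % max_attempts)
```

The runner and sleep parameters exist only so the retry loop can be exercised without a docker daemon. Only the connection failure is retried, so unrelated errors such as an unknown container id still surface immediately.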