[ https://issues.apache.org/jira/browse/YARN-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469061#comment-16469061 ]

Eric Yang commented on YARN-7654:
---------------------------------

[~jlowe]  [~Jim_Brennan] I misread the last message in the discussion forum.  
The logs feature can redirect stdout and stderr streams correctly.  However, I 
am not thrilled about invoking an extra docker logs command to fetch logs and 
maintaining the liveness of that docker logs process.  In my view, this is more 
fragile because the docker logs command can receive an external signal that 
prevents the whole log from being sent to YARN, and the subsequent tailing will 
report duplicated information.  If we attach to the real stdout and stderr of 
the running program, we avoid the headache of additional process management and 
there is no duplicate information.
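A rough sketch of the contrast, not the patch itself (the placeholders and file 
names are illustrative):

{code}
# (a) Separate "docker logs" process: it must be kept alive, and if it is
#     killed and restarted, the re-tail can replay lines that were already
#     captured, duplicating output.
docker logs --follow <container_id> >>stdout.txt 2>>stderr.txt &

# (b) Foreground run: the container's stdout/stderr are the streams of the
#     docker run process itself, so a single redirection captures everything
#     exactly once with no extra process to babysit.
docker run --name <container_name> <image>:<tag> >>stdout.txt 2>>stderr.txt
{code}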

I don't believe a blocking call is the correct answer for determining the 
liveness of a docker container.  The blocking call that waits for docker to 
detach has several problems:

1.  Docker run can get stuck pulling docker images when a massive number of 
containers all start at the same time and the image is not cached locally.  
This happens a lot with repositories hosted on Docker Hub.
2.  The docker run CLI can also get stuck when the docker daemon hangs, and no 
exit code is returned.
3.  Some docker images are not built to run in detached mode.  Some developers 
might have built their system to require foreground mode, and these images will 
terminate when run in detached mode.
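A minimal illustration of the first two failure modes, assuming a cold image 
cache (the image name and timeout value are made up for the example):

{code}
# Problem 1: with a cold cache, "docker run -d" does not print the container
# id until the image pull finishes, so a caller blocking on the command can
# hang for as long as the pull takes (or indefinitely if the daemon hangs,
# problem 2).
docker run -d example.org/myorg/myapp:latest

# Wrapping the call in a timeout only converts the hang into a failure; the
# blocking behaviour itself is unchanged.
timeout 30 docker run -d example.org/myorg/myapp:latest \
  || echo "docker run did not return within 30s"
{code}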

When "docker run -d", and "docker logs" combination are employed, there is some 
progress are not logged.  i.e. the downloading progress, docker daemon error 
message.  The current patch would log any errors coming from docker run cli to 
provide more information for user who is troubleshooting the problems.
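As a sketch of the difference in what can be captured (the paths are 
illustrative):

{code}
# Detached mode: stdout carries only the container id; the pull progress and
# daemon error messages go to the docker CLI's stderr and are lost unless that
# stream is explicitly redirected somewhere.
docker run -d example.org/myorg/myapp:latest 2>>/tmp/container_01/docker-run.err

# Foreground mode: pull progress, daemon errors and the application's own
# output all flow through the docker run process, so they land in the
# container's log files in one step.
docker run example.org/myorg/myapp:latest \
  >>/tmp/container_01/stdout.txt 2>>/tmp/container_01/stderr.txt
{code}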

Regarding the racy problem, this is something a system administrator can tune.  
On a cluster that downloads all images from the internet via a slow link, it is 
perfectly reasonable to set the retry and timeout values to 30 minutes to wait 
for the download to complete.  In a highly automated system, such as a cloud 
vendor trying to spin up images in a fraction of a second for a massive number 
of users, the timeout might be set as short as 5 seconds.  If an image comes up 
in 6 seconds and misses the SLA, another container takes its place in the next 
5 seconds to provide a smooth user experience, and the 6-second container is 
recycled and rebuilt.  At massive scale, the race condition is easier to deal 
with than a blocking call that prevents the entire automated system from 
working.
I can update the code to make the retry count a configurable setting in the 
short term.
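As a sketch of what a configurable retry could look like (the variable names, 
values and the inspect-based liveness check are illustrative, not the patch's 
actual configuration keys):

{code}
# Poll for container liveness with an admin-tunable budget: many long retries
# for a cluster behind a slow internet link, a handful of short ones for a
# tight SLA.
RETRIES=10
INTERVAL_SECONDS=3

for attempt in $(seq 1 "$RETRIES"); do
  state=$(docker inspect --format '{{.State.Status}}' "$CONTAINER_NAME" 2>/dev/null)
  if [ "$state" = "running" ]; then
    echo "container is live after $attempt attempt(s)"
    break
  fi
  sleep "$INTERVAL_SECONDS"
done
{code}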

I am not discounting the possibility of supporting docker run -d and docker 
logs, but this requires more development and experiments to ensure all the 
mechanics are covered well.  The current approach has been in use in my 
environment for the past 6 months, and it works well.  For the 3.1.1 release, 
it would be safer to use the current approach, which gives us better coverage 
of the types of containers that can be supported.  Thoughts?

> Support ENTRY_POINT for docker container
> ----------------------------------------
>
>                 Key: YARN-7654
>                 URL: https://issues.apache.org/jira/browse/YARN-7654
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 3.1.0
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Blocker
>              Labels: Docker
>         Attachments: YARN-7654.001.patch, YARN-7654.002.patch, 
> YARN-7654.003.patch, YARN-7654.004.patch, YARN-7654.005.patch, 
> YARN-7654.006.patch, YARN-7654.007.patch, YARN-7654.008.patch, 
> YARN-7654.009.patch, YARN-7654.010.patch, YARN-7654.011.patch, 
> YARN-7654.012.patch, YARN-7654.013.patch, YARN-7654.014.patch, 
> YARN-7654.015.patch, YARN-7654.016.patch, YARN-7654.017.patch, 
> YARN-7654.018.patch, YARN-7654.019.patch, YARN-7654.020.patch, 
> YARN-7654.021.patch
>
>
> A docker image may have ENTRY_POINT predefined, but this is not supported in 
> the current implementation.  It would be nice if we could detect the existence 
> of {{launch_command}} and, based on this variable, launch the docker container 
> in different ways:
> h3. Launch command exists
> {code}
> docker run [image]:[version]
> docker exec [container_id] [launch_command]
> {code}
> h3. Use ENTRY_POINT
> {code}
> docker run [image]:[version]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
