[ 
https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014687#comment-17014687
 ] 

Eric Yang commented on YARN-9292:
---------------------------------

[~ebadger] {quote}the image wouldn't have been pulled to that node before the 
task is run, right? That's my concern here.{quote}

The concern about docker images being spread inconsistently across the cluster 
is a valid one.  There are two possibilities: the docker image exists on the AM 
node, or it doesn't.  
# If the image exists on the AM node, launching a container using the sha from 
the AM node will result in a warning or a failure on nodes that lack that 
image, depending on the application's anti-affinity policy.  The error message 
about failing to launch a docker container using the sha signature should give 
administrators some clues for fixing the docker images on the other nodes.  
# If the application requests an image that doesn't exist on the AM node, it 
will proceed with the latest tag.  The images used will be consistent if 
YARN-9184 is enabled.  If YARN-9184 is turned off, it will follow the same 
pattern as case 1.
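
As a rough illustration of the two cases above, here is a minimal Python 
sketch, not the code from the patch.  The function names, the digest map, and 
the sample sha value are hypothetical; it only shows how a reference pinned to 
the AM node's sha could be chosen when the image is present, falling back to 
the requested tag otherwise.

```python
from typing import Mapping


def strip_tag(image: str) -> str:
    """Drop a trailing ':tag' while leaving a registry port such as ':5000' alone."""
    slash = image.rfind("/")
    colon = image.rfind(":")
    # A colon only marks a tag when it appears after the last path separator.
    return image[:colon] if colon > slash else image


def resolve_image_reference(requested: str, am_node_digests: Mapping[str, str]) -> str:
    """Pick the image reference the remaining containers would be launched with.

    am_node_digests is a hypothetical map from image name to the sha found on
    the AM node; an absent key means the image is not present there.
    """
    digest = am_node_digests.get(requested)
    if digest is None:
        # Case 2: image not on the AM node, so keep using the requested tag.
        return requested
    # Case 1: pin every later container to the sha seen on the AM node.
    return f"{strip_tag(requested)}@{digest}"


# Placeholder sha value, for illustration only.
pinned = resolve_image_reference("centos:latest", {"centos:latest": "sha256:abc"})
unpinned = resolve_image_reference("centos:latest", {})
```

With a pinned reference, a node holding a different image under the same tag 
fails the launch instead of silently running a different image.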

{quote}The command you ran doesn't even work for my version of Docker.{quote}

I think my mouse cursor jumped when I copied and pasted the information.  I 
couldn't find where it changed the output.  Your syntax is the correct one to 
use.  Sorry for the confusion.

{quote}Reading around on the internet, it looks like Docker takes the manifest 
sha and then recalculates the digest with some other stuff added on (maybe the 
tag data?) to get a new digest. I'm worried that this could break if we 
randomly choose the last sha. For example, maybe centos:7 is installed 
everywhere, but centos:latest is only installed on this one node by accident. 
If we grab the centos:latest sha, it won't work on the rest of the nodes in the 
cluster because the sha won't match the tag of the image on those nodes, even 
though they have the same manifest hash. Or maybe it only does the check based 
on the manifest hash. I can't seem to reproduce this with my version of Docker, 
so I can't test out what actually happens.{quote}

When the list contains multiple entries, they all point to the same image; only 
the repository id is different.  At this time, using any of the repo digest ids 
has the same outcome.  I tested this carefully before going ahead with the 
implementation.
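
To illustrate picking one entry from such a list, the sketch below parses JSON 
shaped like the output of `docker image inspect` and returns the last 
RepoDigests entry.  It is an example under stated assumptions rather than the 
patch itself: the helper name and the sample repositories and sha are made up.

```python
import json
from typing import Optional


def pick_repo_digest(inspect_json: str) -> Optional[str]:
    """Return the last RepoDigests entry of the first inspected image, if any."""
    images = json.loads(inspect_json)
    if not images:
        return None
    # RepoDigests can be missing or null, e.g. for an image that was never pulled or pushed.
    digests = images[0].get("RepoDigests") or []
    return digests[-1] if digests else None


# Made-up sample: two repositories that refer to the same image sha.
sample = json.dumps([{
    "RepoDigests": [
        "centos@sha256:aaa",
        "registry.example.com/centos@sha256:aaa",
    ]
}])
chosen = pick_repo_digest(sample)
```

Under the observation above, either entry would resolve to the same image, so 
taking the last one is just a convenient choice.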

This patch has the most impact when a system admin does not use a docker 
registry to manage docker images and has inconsistent docker latest images 
sitting on nodes.  They may get an extra nudge when launching an application 
with inconsistent images and an anti-affinity policy defined.  The majority of 
users are not affected by this change.  If the AM picks an image older than the 
latest on the docker registry, the application's docker images remain uniform.  
There is a possibility that more of the same containers end up on the same 
node.  However, this should be fine when the user does not specify placement 
policy rules.

I think this problem has been dissected into pieces as small as possible.  I 
haven't come up with a more elegant solution that keeps the docker image 
consistent with the latest tag and supports setups both with and without a 
docker registry.  Let me know if any new ideas come to mind.

> Implement logic to keep docker image consistent in application that uses 
> :latest tag
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-9292
>                 URL: https://issues.apache.org/jira/browse/YARN-9292
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: YARN-9292.001.patch, YARN-9292.002.patch, 
> YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch, 
> YARN-9292.006.patch, YARN-9292.007.patch, YARN-9292.008.patch
>
>
> A docker image with the latest tag can run in a YARN cluster without any 
> validation in node managers. If an image with the latest tag changes while 
> containers are launching, it might produce inconsistent results between 
> nodes. This surfaced toward the end of development for YARN-9184 to keep the 
> docker image consistent within a job. One idea for keeping the :latest tag 
> consistent for a job is to use the docker image command to figure out the 
> image id and propagate that image id to the rest of the container requests. 
> There are some challenges to overcome:
>  # The latest tag does not exist on the node where the first container 
> starts. The first container will need to download the latest image and find 
> the image ID. This can introduce lag time for other containers to start.
>  # If the image id is used to start other containers, container-executor may 
> have problems checking whether the image comes from a trusted source. Both 
> the image name and ID must be supplied through the .cmd file to 
> container-executor. However, a hacker can supply an incorrect image id and 
> defeat container-executor security checks.
> If we can overcome those challenges, it may be possible to keep the docker 
> image consistent within one application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

