[ https://issues.apache.org/jira/browse/YARN-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014687#comment-17014687 ]
Eric Yang commented on YARN-9292:
---------------------------------

[~ebadger]

{quote}the image wouldn't have been pulled to that node before the task is run, right? That's my concern here.{quote}

The concern about Docker images being spread inconsistently across the cluster is a valid one. There are two possibilities: the Docker image either exists on the AM node or it doesn't.
# If the image exists on the AM node, launching a container using the sha from the AM node will result in a warning or a failure on a node whose image does not match, depending on the application's anti-affinity policy. The error message about failing to launch the Docker container using the sha signature should give administrators some clues for fixing the Docker images on the other nodes.
# If the request is for an image that doesn't exist on the AM node, it will proceed with the latest tag. The images used will be consistent if YARN-9184 is enabled. If YARN-9184 is turned off, it will follow the same pattern as 1.

{quote}The command you ran doesn't even work for my version of Docker.{quote}

I think my mouse cursor jumped when I copied and pasted the information; I couldn't find where it changed the output. Your syntax is the correct one to use. Sorry for the confusion.

{quote}Reading around on the internet, it looks like Docker takes the manifest sha and then recalculates the digest with some other stuff added on (maybe the tag data?) to get a new digest. I'm worried that this could break if we randomly choose the last sha. For example, maybe centos:7 is installed everywhere, but centos:latest is only installed on this one node by accident. If we grab the centos:latest sha, it won't work on the rest of the nodes in the cluster because the sha won't match the tag of the image on those nodes, even though they have the same manifest hash. Or maybe it only does the check based on the manifest hash.
I can't seem to reproduce this with my version of Docker, so I can't test out what actually happens.{quote}

When the list contains multiple entries, they all point to the same image; only the repository id differs. At this time, using any of the repo digest ids has the same outcome. I tested this carefully before going ahead with the implementation.

This patch has the most impact when the system admin does not use a Docker registry to manage Docker images and has inconsistent latest images sitting on the nodes. Those admins may get an extra nudge when launching an application with inconsistent images and an anti-affinity policy defined. The majority of users are not affected by this change. If the AM picks an image older than the latest one on the Docker registry, the application's Docker images remain uniform. It is possible that more of the same containers end up on the same node. However, this should be fine when the user does not specify placement policy rules.

I think this problem has been dissected into pieces as small as possible. I haven't come up with a more elegant solution that keeps the Docker image consistent with the latest tag and supports clusters both with and without a Docker registry. Let me know if any new ideas come to mind.

> Implement logic to keep docker image consistent in application that uses :latest tag
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-9292
>                 URL: https://issues.apache.org/jira/browse/YARN-9292
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: YARN-9292.001.patch, YARN-9292.002.patch,
> YARN-9292.003.patch, YARN-9292.004.patch, YARN-9292.005.patch,
> YARN-9292.006.patch, YARN-9292.007.patch, YARN-9292.008.patch
>
>
> A Docker image with the latest tag can run in a YARN cluster without any
> validation in node managers. If an image with the latest tag is changed during
> container launch, it might produce inconsistent results between nodes.
> This surfaced toward the end of development for YARN-9184, which keeps the
> docker image consistent within a job. One of the ideas to keep the :latest tag
> consistent for a job is to use the docker image command to figure out the
> image id and propagate that image id to the rest of the container requests.
> There are some challenges to overcome:
> # The latest tag does not exist on the node where the first container starts.
> The first container will need to download the latest image and find the image
> ID. This can introduce lag time before other containers start.
> # If the image id is used to start other containers, container-executor may
> have problems checking whether the image comes from a trusted source. Both the
> image name and the ID must be supplied through the .cmd file to
> container-executor. However, an attacker can supply an incorrect image id and
> defeat container-executor security checks.
> If we can overcome those challenges, it may be possible to keep the docker
> image consistent within one application.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
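
For illustration, the two cases discussed in the comment (the image exists on the AM node, or it does not) might be sketched roughly as below. This is a hypothetical Python sketch and not code from the YARN-9292 patches: the function names `split_tag` and `resolve_image` are invented here, the list of repo digests is assumed to have been collected already (for example from the `RepoDigests` field of `docker image inspect` output), and picking the last digest in the list is only an assumption based on the discussion of "the last sha".

```python
from typing import List, Optional, Tuple


def split_tag(image: str) -> Tuple[str, Optional[str]]:
    """Split 'repo[:tag]' into (repo, tag); tag is None when absent."""
    last_slash = image.rfind("/")
    last_colon = image.rfind(":")
    # A colon before the last slash belongs to a registry host:port, not a tag.
    if last_colon > last_slash:
        return image[:last_colon], image[last_colon + 1:]
    return image, None


def resolve_image(requested: str, am_repo_digests: List[str]) -> str:
    """Return the image reference to hand to the remaining container requests.

    requested       -- image name as given by the application (hypothetical input)
    am_repo_digests -- repo digests found for that image on the AM node,
                       empty when the image is not present there
    """
    # References already pinned by digest are passed through unchanged.
    if "@" in requested:
        return requested
    repo, tag = split_tag(requested)
    # Only untagged or :latest references are subject to pinning.
    if tag not in (None, "latest"):
        return requested
    if am_repo_digests:
        # Case 1: image exists on the AM node, so pin by digest.
        # Assumption: every entry refers to the same image, so take the last.
        return am_repo_digests[-1]
    # Case 2: image is not on the AM node, so proceed with the latest tag.
    return f"{repo}:latest"
```

With this sketch, a request for `centos:latest` on an AM node that holds the image resolves to a digest reference, while a request on a node without the image stays on `centos:latest`. Explicit tags such as `centos:7` are left untouched.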